Parag Mali - tag: identity-protection

Microsoft Defender for Identity: The Defensive AD Stack That Sees What BloodHound Maps

noreply@paragmali.com (Parag Mali) — Wed, 27 May 2026 00:00:00 GMT

**Microsoft Defender for Identity (MDI) is the cloud-backed, on-DC defensive sensor that watches for almost every offensive Active Directory primitive in the SpecterOps / Mimikatz / Certipy corpus** -- DCSync, DCShadow, Golden / Silver / Diamond ticket forgery, Kerberoasting, AS-REP roasting, NTLM relay, and AD CS abuse -- by parsing Kerberos, NTLM, LDAP, and DRSUAPI on the wire and running per-principal behavioural baselines in a multi-tenant cloud backend. The product began as the Israeli startup Aorato (acquired by Microsoft in November 2014), shipped on-prem as Microsoft ATA in 2015, moved to the cloud as Azure ATP in 2018, was renamed to MDI in 2020, folded into Microsoft Defender XDR at Ignite 2023, and reached its current MDE-integrated v3.x sensor in October 2025. The alert catalogue maps cleanly onto MITRE ATT&CK, and the residual blind spots are knowable: the Credential Guard wall, the Sapphire Ticket's cryptographic indistinguishability, the encrypted-channel DCSync class, the cross-forest under-instrumentation tail, and legitimate-principal compromise. The operator question in 2026 is not whether MDI detects the attack, but whether the sensor is deployed, the alert was triaged inside the batched-emission window, and the residuals are covered by KQL, Sigma rules, or out-of-band controls.

1. A Friday Afternoon at the Domain Controller

Friday, 14:33. A red-team contractor in conference room C runs Rubeus.exe asreproast on a corporate laptop she was issued an hour ago. A junior auditor on the fourth floor, working from a desk with read-only Active Directory access, runs bloodhound-python -c All for a routine quarterly review. A quiet service account on the SQL host in rack 14 runs mimikatz "lsadump::dcsync /domain:contoso.com /user:Administrator". The operator at the other end of that session is not on the payroll. Three different workstations. Three different intents. One domain controller on the receiving end of all three.

The Security Operations Center has not noticed any of them yet. The watcher on the domain controller, however, has. By 14:35 three named alerts are sitting in the Defender XDR queue, each tagged with a MITRE ATT&CK technique ID, each waiting for someone to triage. Suspected AS-REP Roasting attack (T1558.004) for the Rubeus invocation [@mslearn-mdi-alerts-xdr]. Security principal reconnaissance (LDAP) for the BloodHound enumeration [@mslearn-mdi-alerts-mdi-classic]. Suspected DCSync attack -- replication of directory services, External ID 2006, T1003.006, for the Mimikatz call [@mslearn-mdi-alerts-mdi-classic][@mitre-t1003-006]. The watcher is Microsoft Defender for Identity.

SOC operators inside Microsoft customers describe this with a stock phrase: "the watcher was already on the DC." The phrase shows up in incident-response runbooks, vendor training decks, and the Microsoft Defender for Identity Tech Community archive. It captures what is, architecturally, a strange thing -- the defender's sensor is co-located with the attacker's target, not perched outside it.

Domain controller (DC): A Windows Server hosting the Active Directory Domain Services role, responsible for processing Kerberos authentication, NTLM challenges, LDAP queries, and inter-DC directory replication (DRSUAPI) for a domain. Every named MDI runtime alert in this article fires on signal that originates on or transits a domain controller; the deployment model assumes one MDI sensor per DC, plus optional sensors on AD FS, AD CS, and Microsoft Entra Connect servers when those identity roles run on dedicated hosts.

Almost every offensive AD primitive a reader of the SpecterOps, Mimikatz, and Certipy corpus already knows has a runtime alert or a posture assessment shipped by Microsoft on that same DC. Almost is the load-bearing word. The alert fires only if three things are true: the sensor is deployed on the surface the attack touches, the audit subcategory the alert depends on is enabled, and the SOC opens the Defender XDR incident inside the batched-emission window the cloud backend uses to aggregate signal. This article is about all three conditions, the twelve-year arc that built the watcher, and the structural blind spots no future MDI release will close.

The watcher was not always on the domain controller. For the first decade of Active Directory, nothing on the DC saw what BloodHound today maps. To understand where the watcher came from -- and why its blind spots look the way they do -- we have to start with three founders in Herzliya and a Kerberos forgery presentation in Las Vegas.

2. Origins -- Aorato, the Israeli Startup That Became the Watcher

August 2014, Black Hat USA. Tal Be'ery and Michael Cherny take the stage with Alva Duckwall and Benjamin Delpy to present "Abusing Microsoft Kerberos: Sorry You Guys Don't Get It," a demonstration that a stolen krbtgt key lets an attacker mint Kerberos ticket-granting tickets that survive every password rotation in the standard remediation playbook [@blackhat-us14-briefings]. The audience is Active Directory operators who thought their password-reset runbook covered them. By the end of the talk it does not. The startup behind the research is Aorato, three years old, headquartered in Herzliya, Israel. Three months later, Microsoft buys it.

Ticket-granting ticket (TGT): The credential a Kerberos client receives from the Key Distribution Center (the KDC, which on a Windows network runs on every DC) after successful pre-authentication. The TGT is encrypted with the KDC's own long-term key -- on Active Directory, the password hash of the `krbtgt` account. Possession of the `krbtgt` hash therefore lets an attacker forge a valid TGT for any principal in the domain, since the KDC has no other way to distinguish a forged ticket from a real one. This forged-ticket class is what MITRE catalogues as T1558.001 Golden Ticket [@mitre-t1558-001].

The Aorato deal closed on November 13, 2014, announced on the Microsoft Official Blog by Takeshi Numoto, then Corporate Vice President of Cloud and Enterprise Marketing [@msblog-aorato]. The post named the central technology Microsoft was acquiring: Aorato's Organizational Security Graph, described as "a living, continuously-updated view of all of the people and machines accessing an organization's Windows Server Active Directory." Pre-acquisition Microsoft had Azure AD on the cloud side and per-DC event log auditing on the on-prem side, but no first-party behavioural-analytics product over Active Directory. Aorato's pre-acquisition product, the Directory Services Application Firewall, did exactly that -- it parsed Kerberos, NTLM, LDAP, and DRSUAPI on the wire and ran per-principal behavioural baselines against the parsed protocol stream. Microsoft wanted that capability inside Windows Server, and inside Office 365.

Aorato's three founders, per the Globes coverage of the acquisition in November 2014, were Idan Plotnik (CEO), Michael Dolinsky (VP R&D), and Ohad Plotnik (VP professional services). Tal Be'ery was VP of Research. A popular reading of the deal names "the Plotnik brothers and Tal Be'ery" as the co-founder trio, which compresses out Dolinsky's role -- the contemporaneous record names four people, not three [@globes-aorato-2014].

The product lineage that follows is twelve years long and runs through five names. Microsoft Advanced Threat Analytics (ATA) was announced as generally available on August 27, 2015 (build 1.4.2457, dated August 31, 2015) -- the on-prem productisation of Aorato's wire-side parser, packaged as a SPAN-mirror appliance ("ATA Gateway") plus an on-prem analytics server ("ATA Center") with its own MongoDB-style document store [@mstc-ata-ga][@atadocs-versions]. Azure ATP went GA on March 1, 2018 -- the cloud-side rewrite that kept the on-DC sensor but moved the analytics engine to a multi-tenant cloud backend [@mstc-azureatp-ga][@mstc-azureatp-intro]. Microsoft Defender for Identity was the September 22, 2020 rename announced at Ignite 2020, part of Microsoft's broader brand consolidation that also rebranded Office 365 ATP to Microsoft Defender for Office 365 and Microsoft Defender ATP to Microsoft Defender for Endpoint [@mssecblog-unified-xdr][@itpro-defender-rebrand][@infusedinnov-names]. The November 2023 Ignite keynote consolidated Microsoft 365 Defender into Microsoft Defender XDR [@virtreview-ignite2023][@handsontek-defender-rebrand]. In October 2025 the v3.x sensor GA folded MDI's on-DC sensor into the Microsoft Defender for Endpoint agent that organisations were already running on every server [@mslearn-mdi-whats-new][@modernsec-v3x][@jeffreyappel-v2v3]. The May 2026 release notes extended the v3.x sensor to cover AD FS, AD CS, and Microsoft Entra Connect identity roles directly when those roles run on a domain controller, and raised the per-workspace sensor cap from 350 to 1,000 [@mslearn-mdi-whats-new].

gantt
    title Microsoft Defender for Identity lineage, 2012-2026
    dateFormat YYYY-MM-DD
    axisFormat %Y
    section Aorato
    Aorato startup (DSAF product) :a1, 2012-01-01, 2014-11-13
    section Microsoft ATA
    ATA initial release SPAN-mirror Gateway :a2, 2015-08-27, 2016-05-01
    ATA 1.6-1.9 Lightweight Gateway :a3, 2016-05-01, 2018-03-01
    ATA Extended Support window :a4, 2018-03-01, 2026-01-31
    section Cloud rewrite
    Azure ATP GA :a5, 2018-03-01, 2020-09-22
    Microsoft Defender for Identity name :a6, 2020-09-22, 2023-11-15
    section Defender XDR era
    MDI inside Defender XDR (v2.x) :a7, 2023-11-15, 2025-10-01
    MDI v3.x MDE-integrated sensor :a8, 2025-10-01, 2026-05-27

Aorato's pitch in 2014 was that the Windows Security event log -- the thing every SIEM in the world was ingesting -- could not see the attacks an Active Directory operator most needed to catch. To believe that pitch you have to know exactly what the event log misses.

3. Why the Event Log Could Not See Golden Tickets

Present a Golden Ticket to a domain controller, and the LSA writes a successful event 4769 -- a Kerberos service ticket request. Present a legitimate ticket from the same principal, and the LSA writes a successful event 4769. Nothing in the event log's schema, anywhere in any field, distinguishes the two. The ticket is forged with the real krbtgt key, so the KDC's signature checks pass. The event log records that an authentication happened, not whether the ticket presented was genuine. This is the structural ceiling the SIEM industry could not work around for the first decade of its existence, and it is the gap Aorato was built to close [@mitre-t1558-001][@semperis-golden-ticket].

The bare-event-log model has three structural failure modes, each of which drove a generation of detection engineering. Forged-ticket invisibility is the first: the LSA logs that an auth happened, but every byte in the 4769 event matches the legitimate case. Per-DC silo is the second: a Kerberos auth against one DC and a follow-up auth against another DC five seconds later sit in two different Security.evtx files, on two different machines, with no aggregation layer to ask "did the same principal hit ten DCs in five minutes?" Manual-review throughput collapse is the third: a medium-sized forest emits thousands of 4624, 4768, 4769 events per minute per DC, and the human analyst hand-walking them never catches up.
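The per-DC silo problem is easiest to see by writing the aggregation the silo prevents. The sketch below assumes the logon events from every DC have already been collected into one list of dictionaries; the field names `principal`, `dc`, and `time` are hypothetical, chosen for illustration, and do not correspond to any event-log or MDI schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def principals_hitting_many_dcs(events, min_dcs=10, window=timedelta(minutes=5)):
    """Return principals seen on at least `min_dcs` distinct DCs inside one window.

    `events` is an iterable of dicts with hypothetical keys:
    'principal' (str), 'dc' (str), 'time' (datetime).
    """
    by_principal = defaultdict(list)
    for ev in events:
        by_principal[ev["principal"]].append((ev["time"], ev["dc"]))

    flagged = set()
    for principal, hits in by_principal.items():
        hits.sort()                      # order by time, then DC name
        start = 0
        per_dc = defaultdict(int)        # DC name -> hits currently in the window
        for end in range(len(hits)):
            per_dc[hits[end][1]] += 1
            # Shrink the window from the left until it spans at most `window`.
            while hits[end][0] - hits[start][0] > window:
                old_dc = hits[start][1]
                per_dc[old_dc] -= 1
                if per_dc[old_dc] == 0:
                    del per_dc[old_dc]
                start += 1
            if len(per_dc) >= min_dcs:
                flagged.add(principal)
                break
    return flagged
```

The logic is trivial once the events sit in one place. On bare per-DC event logs there is no such place, which is the point of the second failure mode.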

DCSync makes the first two failure modes vivid. Sean Metcalf's September 2015 ADSecurity writeup walks through running lsadump::dcsync /domain:contoso.com /user:Administrator from a workstation: the DC handles the DRSUAPI replication request, the LSA emits a 4662 event for the directory-service-object access, and the attacker walks away with the password hash [@adsec-dcsync]. (Metcalf's companion DerbyCon V talk, Red vs. Blue: Modern Active Directory Attacks & Defense (September 2015), is the canonical operator-grade introduction to the same material [@adsec-dump-ad].) The 4662 event is structurally indistinguishable from a legitimate replication request between two DCs. A SIEM rule that flagged 4662 events whose source IP was not a DC could catch it -- but only if the analyst maintained the IP allowlist (a single Microsoft Entra Connect server in the wrong subnet broke the rule), and only if 4662 was enabled at all (it was high-volume, and many SOCs disabled it to stay under the SIEM's GB/day licence).
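A minimal sketch of that allowlist rule, to make its fragility concrete. It assumes the SIEM has already normalised each event into a dictionary and enriched it with a `source` field holding the originating IP; `event_id`, `properties`, and `source` are hypothetical field names, not a real SIEM schema. The two GUIDs are the Active Directory control access rights DS-Replication-Get-Changes and DS-Replication-Get-Changes-All, which appear in the 4662 properties when replication is requested.

```python
# Control access right GUIDs that mark a 4662 event as a replication request.
REPLICATION_RIGHTS = {
    "1131f6aa-9c07-11d1-f79f-00c04fc2dcd2",  # DS-Replication-Get-Changes
    "1131f6ad-9c07-11d1-f79f-00c04fc2dcd2",  # DS-Replication-Get-Changes-All
}


def suspicious_replication(events, dc_allowlist):
    """Return 4662 replication events whose source is not an allowlisted DC.

    `events` is an iterable of dicts with hypothetical keys
    'event_id', 'properties', and 'source'.
    """
    allowed = {entry.lower() for entry in dc_allowlist}
    hits = []
    for ev in events:
        if ev.get("event_id") != 4662:
            continue
        props = ev.get("properties", "").lower()
        if not any(guid in props for guid in REPLICATION_RIGHTS):
            continue                      # some other directory-object access
        if ev.get("source", "").lower() not in allowed:
            hits.append(ev)
    return hits
```

The rule is only as good as `dc_allowlist`. A legitimate replicating host that is missing from the list produces the same hit as an attacker, and if 4662 auditing is off the function never sees an event at all.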

Key idea: The SIEM was not failing at Active Directory detection because the rules were wrong. It was failing because the event log -- the data source every SIEM relied on -- could not see what the SIEM needed it to see. Better rules over the same event log would not have closed the gap. Aorato's contribution was to find a different data source: the wire itself.

Aorato's three primitives, none of which the SIEM-plus-event-log model had, were: per-principal behavioural baselines so that a long-tail anomaly stood out without anybody writing a rule for it; on-DC network capture so that the ticket structure, the DRSUAPI opnum, and the LDAP search filter were available to detection logic; and a graph over the directory so that the path from compromised workstation to crown-jewel asset could be computed rather than inferred. ATA shipped the first two in 2015. The graph took longer.

4. Early Approaches -- ATA 1.x and the Generations That Tried Before

By the time Aorato shipped its first product, three prior generations of Active Directory detection had already tried and stalled. Each one could see something the previous generation could not. Each one had a structural ceiling an attacker primitive eventually pushed through. The seven generations that follow are the real spine of the article.

flowchart LR
    G1["Gen 1: bare per-DC event log audit"] --> G2["Gen 2: SIEM-centralised events with static rules"]
    G2 --> G3["Gen 3: first-generation UEBA over SIEM events"]
    G3 --> G4["Gen 4: Aorato DSAF and ATA 1.4-1.5 (SPAN mirror)"]
    G4 --> G5["Gen 5: ATA 1.6-1.9 (Lightweight Gateway + LMP)"]
    G5 --> G6["Gen 6: Azure ATP, MDI v1.x-v2.x (cloud analytics)"]
    G6 --> G7["Gen 7: MDI v3.x (MDE-integrated + Identity Explorer)"]

Generation 1 -- bare per-DC event log auditing (1999-2008) was already covered above. It was the only model that existed for the first decade of Active Directory, and its structural ceilings became Aorato's pitch deck.

Generation 2 -- SIEM-centralised event log ingestion with static correlation rules (2005-2014) is the era of ArcSight, Splunk, QRadar, and LogRhythm. Windows Event Forwarder agents on every DC streamed Security event log entries into a central index, and SOC operators wrote rule-based correlation searches in the vendor's query language. The model gave the SOC cross-DC correlation, a query language, and an audit trail that satisfied PCI-DSS Requirement 10. It did not give the SOC anything new about the data the LSA emitted. Mimikatz's lsadump::dcsync was committed to the public Mimikatz repository in August 2015 [@mimikatz-github][@adsec-dcsync]. Sean Metcalf's longer ADSecurity writeup of the technique followed in September 2015. At commit time, every SIEM in production was correlating DC event logs and not one was emitting a DCSync alert, because the 4662 event was structurally identical to a legitimate DC-to-DC replication.

Generation 3 -- first-generation UEBA on SIEM event data (2013-2017) was Securonix, Exabeam, and Splunk UBA. Per-principal behavioural baselines layered on top of the SIEM event index could catch novel TTPs without prior signatures -- a Kerberoasting variant whose SPN list had never been seen before could still trip "this account is requesting an unusual number of service tickets compared to its baseline." UEBA also closed Generation 2's per-principal context gap. It did not, however, see ticket structure: a Golden Ticket replayed against ten DCs produces ten successful auths that are behaviourally indistinguishable from the legitimate Domain Admin's pattern unless the attacker's source IP or geographic distribution breaks the baseline. This is the legitimate-principal-compromise non-detection class that survives every defensive generation into 2026.
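The "unusual number of service tickets compared to its baseline" idea can be sketched as a per-principal z-score test. This is a simplified illustration, not the algorithm any named UEBA product or MDI uses; the function name, the threshold of 3.0, and the seven-day minimum are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def is_anomalous(history, today, z_threshold=3.0, min_days=7):
    """Flag today's service-ticket count if it sits far above the baseline.

    `history` is a list of daily ticket-request counts for one principal.
    All parameter defaults are illustrative assumptions.
    """
    if len(history) < min_days:
        return False                 # not enough data to call anything unusual
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return today > mu            # perfectly flat baseline: any increase stands out
    return (today - mu) / sigma > z_threshold
```

The same sketch shows the ceiling described above. If an attacker replays a Golden Ticket at the volume and rhythm of the legitimate Domain Admin, `today` stays inside the baseline and the function returns `False`.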

Generation 4 -- on-wire protocol analytics via off-DC SPAN-mirror Gateway is where Aorato's product, and then Microsoft ATA 1.4 and 1.5, lived. A switch SPAN port mirrored DC traffic to a dedicated ATA Gateway appliance, which ran libpcap-equivalent capture and parsed Kerberos AS-REQ / TGS-REQ / AP-REQ, NTLM challenges, LDAP searches, and DRSUAPI replication calls. Parsed events streamed to the on-prem ATA Center, which ran detection logic and surfaced alerts in a web console [@mstc-ata-ga]. The wire-side parse closed Generation 1-3's biggest blind spot: ticket structure was finally visible. The SPAN-port operational tax killed the architecture in nine months. Many enterprises could not provision a SPAN mirror. Virtualised DCs on shared hypervisors had no equivalent of a physical SPAN. And the security review of "all DC traffic now mirrors to this appliance" was non-trivial.

Generation 5 -- ATA 1.6 Lightweight Gateway through ATA 1.9 (May 2016 to March 2020) moved the Gateway in-process onto the DC itself. ATA 1.6 (May 2016) introduced the Lightweight Gateway with dynamic resource management that capped the sensor's CPU and memory footprint and let the sensor consume events locally rather than via mirrored network traffic [@mslearn-ata-1-6]. ATA 1.7 (August 31, 2016) added Role-Based Access Control for the ATA Console, Windows Server Core support, and detection of reconnaissance through directory-services enumeration [@mssupport-ata-1-7][@atadocs-versions]. ATA 1.8 (June 30, 2017; announced July 26, 2017) shipped behavioural-brute-force detection, a Golden Ticket lifetime detector, and the abnormal-modification-of-sensitive-groups alert [@mslearn-ata-1-8][@mstc-ata-1-8][@ataversions-1-8-availability]. ATA 1.9 (March 21, 2018) shipped both the entity-profile lateral-movement-aware view and the Lateral movement paths to sensitive accounts report [@mslearn-ata-1-9][@atadocs-versions][@atadocs-lmp-usecase].

A widespread reading of the ATA timeline anchors LMP to ATA 1.7 in late 2017. The primary record contradicts this on both date and feature: ATA 1.7 shipped on August 31, 2016 per the Microsoft Support KB and the ATA-versions table, and the 1.7 release notes do not mention Lateral Movement Paths. Neither do the 1.8 release notes -- LMP first appears in ATA 1.9 (March 21, 2018), which introduced both the entity-profile lateral-movement view and the full Lateral movement paths to sensitive accounts report in the same release [@mssupport-ata-1-7][@atadocs-versions][@mslearn-ata-1-8][@mslearn-ata-1-9].

Note: A popular framing of the LMP timeline says "Microsoft adopted BloodHound-style graph attack paths in 2022." The primary sources contradict this. Graph-anchored attack-path evaluation in Microsoft's defensive stack originates in ATA 1.9 (March 2018), not in any 2022 adoption event. What did happen in 2022 was the start of the deprecation arc for the SAM-R-based discovery the LMP graph depended on, which culminated in Message Center notice MC1073068 in May 2025 when Microsoft disabled SAM-R-based local-administrators collection across MDI tenants [@handsontek-mc1073068]. The 2022 date that lingers in operator memory is the deprecation anchor, not the adoption anchor.

Lateral Movement Path (LMP): An attack chain through Active Directory in which a non-sensitive account whose credentials are exposed on one workstation can be used to authenticate to a second workstation where a sensitive account's credentials are cached, which in turn can be used to reach a third workstation, and so on until a Domain Admin or comparable target is reached. ATA 1.9's Lateral Movement Paths report was the first graph-anchored defensive surface that computed the chain in advance; the report was populated by SAM-R queries that enumerated each host's local-administrators group. Microsoft disabled the SAM-R-based collection in May 2025 (MC1073068), and the post-LMP graph layer migrated to the Defender XDR hunting graph plus the April 2026 Identity Explorer preview.
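Computing such a chain in advance is a shortest-path search over two relations: which accounts are local administrators on which hosts, and which accounts have credentials cached on which hosts. The sketch below is an illustration of that idea only, not ATA's or MDI's implementation; the `admin_on` and `cached_on` mappings and every account and host name in the usage example are hypothetical.

```python
from collections import deque


def lateral_movement_path(start, sensitive, admin_on, cached_on):
    """Breadth-first search for the shortest account -> host -> account chain.

    admin_on:  account -> set of hosts where that account is a local admin
    cached_on: host    -> set of accounts whose credentials are cached there
    Returns the chain as a list alternating accounts and hosts, or None.
    """
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        account, path = queue.popleft()
        if account in sensitive:
            return path
        for host in sorted(admin_on.get(account, ())):
            for nxt in sorted(cached_on.get(host, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [host, nxt]))
    return None


# Usage with made-up data: a helpdesk account two hops from a Domain Admin.
admin_on = {"helpdesk1": {"WS-07"}, "srvadmin": {"SRV-02"}}
cached_on = {"WS-07": {"srvadmin"}, "SRV-02": {"da-alice"}}
path = lateral_movement_path("helpdesk1", {"da-alice"}, admin_on, cached_on)
```

The `admin_on` relation is exactly what the SAM-R local-administrators enumeration supplied, which is why disabling that collection removed the input this kind of search depends on.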

The limitation that drove Generation 5 into Generation 6 was the on-prem ATA Center's release cadence. Benjamin Delpy and Vincent Le Toux disclosed DCShadow at BlueHat IL 2018 in January 2018 -- the technique of registering a rogue domain controller via nTDSDSA object creation plus SPN registration, then pushing arbitrary updates into AD via legitimate DRSUAPI replication that the event log records as ordinary inter-DC traffic [@dcshadow-com][@mitre-t1207]. ATA 1.9 shipped two months later, in March 2018, with no DCShadow detection. Azure ATP -- the cloud-side rewrite, also GA in March 2018 -- shipped paired alerts External ID 2028 (Suspected DCShadow attack -- domain controller promotion) and External ID 2029 (Suspected DCShadow attack -- domain controller replication request) in July 2018, five months after the disclosure [@mslearn-mdi-alerts-mdi-classic][@mslearn-mdi-whats-new-archive]. The on-prem release cadence could not have closed that five-month gap. The cloud rewrite was the structural answer.

5. The Breakthrough -- Azure ATP and the Inverted Data Path

If the wire was the right data layer, the cloud was the right place to run the analytics. That is the architectural decision Azure ATP committed to in March 2018, and it is what distinguishes the Microsoft defensive product from every prior generation. The on-DC sensor stayed on the DC. The analytics engine moved.

Four architectural shifts followed. First, the on-DC sensor became a thin parser. Sensors no longer hosted detection logic; they captured the Kerberos / NTLM / LDAP / DRSUAPI traffic, parsed it into a stream of structured events, and shipped the stream upstream. Second, the data path inverted. Generation 4 sent unparsed packets from the wire to the off-DC Gateway, which parsed them and stored them on-prem; Azure ATP sent parsed events from the on-DC sensor upstream to a multi-tenant cloud backend that ran detection logic and wrote alerts back into a tenant-specific workspace. Third, per-principal behavioural baselines accumulated centrally rather than per-DC, so a baseline survived DC reboots, sensor restarts, and migrations across data centres. Fourth, identity signal joined endpoint and email signal in the same incident queue once Azure ATP folded into Microsoft 365 Defender -- the cross-product correlation that no on-prem product had ever offered [@mstc-azureatp-ga][@mstc-azureatp-intro][@mslearn-xdr-overview].

Then came the brand-and-architecture history every operator has to know to read a 2026 runbook. The September 22, 2020 rename from Azure Advanced Threat Protection to Microsoft Defender for Identity was a brand consolidation, not an architecture change -- the same sensor, the same alerts, the same workspace [@mssecblog-unified-xdr]. The legacy portal.atp.azure.com standalone portal was retired on June 30, 2023 via Message Center notice MC567494, with all requests automatically redirected to security.microsoft.com [@handsontek-mc567494][@mslearn-mdi-portal]. The November 15, 2023 Ignite keynote renamed Microsoft 365 Defender to Microsoft Defender XDR (Message Center MC696570) [@handsontek-defender-rebrand][@virtreview-ignite2023]. Again a brand change, again not an architecture change: the sensors stayed on the DC, the analytics stayed in the cloud, and the KQL schema -- IdentityLogonEvents, IdentityQueryEvents, IdentityDirectoryEvents -- stayed the same [@mslearn-xdr-identitylogon][@mslearn-xdr-identityquery][@mslearn-xdr-identitydirectory].

The legacy portal.atp.azure.com URL is worth remembering because runbooks and SOAR rules from 2018 to 2023 frequently hard-coded it. Any rule that referenced the old portal needs an update; the redirect handles browser traffic but not API calls.

What the sensor actually feeds into the cloud backend, in 2026, is four data-input layers, ordered roughly by evidence strength. First, the Windows Security event log -- the audit subcategories that the MDI event-collection page lists as required, including Audit Credential Validation, Audit Kerberos Authentication Service, Audit Kerberos Service Ticket Operations, Audit Directory Service Access, and Audit Computer Account Management among others [@mslearn-mdi-event-collection]. These are public, documented, and easy to verify with auditpol /get /category:*. Second, on-DC network capture of Kerberos, NTLM, LDAP, and DRSUAPI -- well-documented because the sensor's network requirements are part of the public deployment guide. Third, Event Tracing for Windows providers that the sensor subscribes to in order to get signal the event log does not surface. Fourth, AD CS audit-log subscriptions added with the AD CS sensor release in August 2023 [@mstc-adcs-sensor][@dirteam-sander-aug2023].
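Verifying the first layer can be scripted. The sketch below parses the text that `auditpol /get /category:*` prints and reports which of the five subcategories named above are missing or set to `No Auditing`. It assumes English-language output with columns separated by two or more spaces, and it treats any setting other than `No Auditing` as enabled; a stricter check might require `Success and Failure`. The function name and the `REQUIRED` tuple are illustrative, and the tuple covers only the subcategories this article names, not the full list on the MDI event-collection page.

```python
import re

# Subcategories named in this article; the official MDI list is longer.
REQUIRED = (
    "Credential Validation",
    "Kerberos Authentication Service",
    "Kerberos Service Ticket Operations",
    "Directory Service Access",
    "Computer Account Management",
)


def missing_subcategories(auditpol_text, required=REQUIRED):
    """Return required subcategories that are absent or set to 'No Auditing'."""
    settings = {}
    for line in auditpol_text.splitlines():
        # Subcategory rows look like '  <name>    <setting>'; category
        # headings have no second column and are skipped.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return [
        name for name in required
        if settings.get(name, "No Auditing") == "No Auditing"
    ]
```

On a DC the input text could be captured with `subprocess.run(["auditpol", "/get", "/category:*"], capture_output=True, text=True).stdout` from an elevated prompt. An empty return list means every subcategory in `REQUIRED` has some auditing configured.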

Microsoft has never published the canonical list of Event Tracing for Windows providers that the MDI sensor subscribes to. Any specific list of providers a reader encounters traces back to community reverse-engineering: Synacktiv's *A primer on Microsoft Defender for Identity* by Guillaume Andre and Mickael Benassouli (November 2022) is the canonical operator-research primary [@synacktiv-primer-mdi][@synacktiv-primer-mdi-archive]. The methodological precedent is Olaf Hartong's *Microsoft Defender for Endpoint Internals* series, specifically the 0x02 entry on audit settings and telemetry, which documents the binary-side enumeration approach: run Matt Graeber's Get-TraceLoggingMetadata script against the sensor executable to enumerate the providers it registers, then use Sealighter to trace those providers to a file for further analysis [@falconforce-mde-0x02][@gist-tracelogging-metadata][@github-sealighter]. Hartong's 0x02 article reports "roughly 111 public and MDE-exclusive providers used" by MsSense.exe -- the MDI sensor binary is amenable to the same technique, and the provider mix differs (MDI subscribes heavily to LDAP, Kerberos, DRSUAPI, and SAM-R-class providers; MDE subscribes heavily to process, file, network, and image-load providers) but the methodology is shared [@falconforce-mde-0x03][@github-olafhartong]. Read any community-published MDI provider list as a snapshot of what the community has reverse-engineered, not as Microsoft-published ground truth.

The breakthrough was not better detection algorithms. The breakthrough was moving the analytics off the DC entirely, so the per-principal baselines could accumulate centrally and the detection set could ship on a cloud cadence instead of an on-prem one. That decision is why MDI shipped DCShadow detection within five months of disclosure -- a cadence the on-prem product could not have matched.

That is the move that turned a wire-side parse into a sustained detection program. The proof is the DCShadow timeline: five months from disclosure to detection, on a cadence the on-prem product could not have matched. Now we can ask the question every reader of the offensive-AD corpus actually wants answered. What does the watcher catch in 2026?

6. MDI in 2026 -- Sensors, Alerts, KQL, and the Graph in Transition

This is the article's bookmarking section. Four parts: what is on the DC, what alerts fire, what KQL the operator writes when the alerts miss, and where the graph layer that began as ATA 1.9's Lateral Movement Paths report actually lives in 2026.

6.1 Sensor topology in 2026

What is on a Windows Server 2022 (or 2025) domain controller running MDI in May 2026? Two sensor families, two target-server matrices, and a workspace cap.

The v2.x sensor is the legacy standalone agent: supported on Windows Server 2016 and later domain controllers, and on AD FS, AD CS issuing certificate authorities, and Microsoft Entra Connect servers that are not themselves domain controllers, per the v2.x prerequisites page [@mslearn-mdi-prereq-sensor-v2]. v2.x carries its own installer, its own update cadence, and its own packet capture stack (Npcap). It also requires a Directory Service Account (DSA) -- a gMSA configured during install whose forest-wide read rights let the sensor enumerate AD objects.

Directory Service Account (DSA): A group Managed Service Account configured during MDI v2.x sensor installation, granted forest-wide read permissions on Active Directory objects so the sensor can resolve principal identities, enumerate group memberships, and read schema attributes that the wire-side parse refers to by SID. The v3.x sensor replaces the DSA pattern with LocalSystem impersonation -- the sensor impersonates the local-system account of the domain controller it runs on, which has equivalent on-DC read rights without needing a separate gMSA per tenant [@mslearn-mdi-deploy-sensor-v3][@mslearn-mdi-action-accounts].

The v3.x sensor is the current path. It requires Windows Server 2019 or later with the March 2026 (or later) cumulative update installed, the Defender for Endpoint agent already deployed and onboarded, and -- critically -- there is no separate MDI installer at all. The MDI sensor capability ships as an extension of the MDE SENSE service. Self-imposed resource caps: CPU at most 30% of the host DC's CPU, memory at most 1.5 GB, with explicit Hyper-V Dynamic Memory and VMware reservation guidance that ensures the cap is honoured under contention [@mslearn-mdi-deploy-sensor-v3]. v3.x uses LocalSystem impersonation for AD reads rather than a gMSA-based DSA. The May 2026 release notes added direct v3.x support for AD FS, AD CS, and Microsoft Entra Connect identity roles when those roles run on a domain controller (which is the recommended deployment pattern for most mid-sized tenants) [@mslearn-mdi-whats-new].

The 30% CPU cap is honoured by the MDE SENSE service's scheduling, but Hyper-V Dynamic Memory and VMware ballooning can break the assumption -- if the hypervisor reclaims memory under contention the sensor cannot get its 1.5 GB and the local capture buffer drops events. Microsoft's deployment guide recommends a static memory reservation on virtualised DCs for that reason.

The four target server roles are domain controllers (every DC, including RODCs), AD FS federation servers (not Web Application Proxies), AD CS online issuing certificate authorities (not offline root CAs), and Microsoft Entra Connect servers (both active and staging). The May 2026 release notes also raised the per-workspace capacity ceiling from 350 sensors to 1,000 sensors per workspace [@mslearn-mdi-whats-new].

flowchart TD
    DC1["Domain Controller (WS2019+, v3.x sensor inside MDE SENSE)"]
    DC2["Domain Controller (WS2016, v2.x sensor)"]
    ADFS["AD FS server (v2.x sensor, non-DC)"]
    ADCS["AD CS issuing CA (v2.x or v3.x sensor)"]
    EC["Entra Connect server (v2.x sensor)"]
    CLOUD["MDI cloud backend (multi-tenant analytics, per-principal baselines)"]
    XDR["Microsoft Defender XDR (security.microsoft.com) Identity tables + alerts"]
    DC1 --> CLOUD
    DC2 --> CLOUD
    ADFS --> CLOUD
    ADCS --> CLOUD
    EC --> CLOUD
    CLOUD --> XDR

The deployment matrix below is the operator-grade reference -- which role gets which sensor, which audit subcategories the sensor depends on, and what posture data the role unlocks.

| Server role | Sensor version | Required audit subcategories | Posture coverage unlocked |
| --- | --- | --- | --- |
| Domain controller (WS 2019+) | v3.x (preferred) | Credential Validation; Kerberos AS; Kerberos TGS; Logon; DS Access; Computer Account Mgmt | Full Identity Security Posture (entity hygiene, dormant accounts, weak crypto) |
| Domain controller (WS 2016) | v2.x | Same as above | Same as above, minus v3.x-only enhancements |
| AD FS federation server | v2.x (or v3.x if also a DC) | AD FS audit logs (Application + Security) | Hybrid auth signal (Entra ID + on-prem) |
| AD CS issuing CA | v2.x (or v3.x if also a DC) | AD CS audit logs (certificate request and template events) | Nine ESC posture assessments (ESC1-Preview, ESC2, ESC3, ESC4, ESC6-Preview, ESC7, ESC8, ESC11, ESC15) [@mslearn-mdi-certificates-posture] |
| Entra Connect server | v2.x (or v3.x if also a DC) | Sync engine event log | Sync-engine attribute-flow signal |

Note: For new DC deployments on Windows Server 2019 or later, use v3.x: no separate installer, no gMSA, no NPCap, and the sensor ships its updates with the MDE agent. For AD FS, AD CS, or Entra Connect roles that run on dedicated Windows Server 2016 hosts, v2.x is the supported path until those hosts are upgraded. Mixed environments are normal during the migration window; the cloud backend handles both versions without operator intervention [@modernsec-v3x][@jeffreyappel-v2v3]. One known limitation as of May 2026: Windows Server 2025 domain controllers that currently run a v2.x sensor cannot be migrated to v3.x; Microsoft's What's New page is explicit that "migration of domain controllers with Windows Server 2025 from sensor v2.x to sensor v3.x is not supported" and the operator should continue on v2.x on those hosts until migration support ships [@mslearn-mdi-whats-new].
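The selection rules in the matrix and the note above can be written down as a small decision function. The sketch below is a minimal illustration, not Microsoft tooling: the `Host` record, its field names, and the `recommended_sensor` helper are hypothetical, and the function deliberately ignores the other v3.x prerequisites (the cumulative update level and MDE onboarding), which an operator would still have to check separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record for a server that should carry an MDI sensor."""
    name: str
    os_year: int                       # Windows Server release: 2016, 2019, 2022, 2025
    is_domain_controller: bool
    current_sensor: Optional[str] = None   # "v2.x", "v3.x", or None if no sensor yet


def recommended_sensor(host: Host) -> str:
    """Return the sensor generation the article's deployment matrix points to."""
    # v3.x is described only for domain controllers on Windows Server 2019 or later;
    # dedicated AD FS / AD CS / Entra Connect hosts and WS2016 DCs stay on v2.x.
    if not host.is_domain_controller or host.os_year < 2019:
        return "v2.x"
    # Known limitation quoted from the What's New page as of May 2026:
    # WS2025 DCs already running v2.x cannot be migrated to v3.x.
    if host.os_year == 2025 and host.current_sensor == "v2.x":
        return "v2.x"
    return "v3.x"


if __name__ == "__main__":
    fleet = [
        Host("dc01", 2022, True),
        Host("dc02", 2016, True),
        Host("adfs01", 2022, False),
        Host("dc03", 2025, True, current_sensor="v2.x"),
    ]
    for host in fleet:
        print(host.name, recommended_sensor(host))
```

Run against an exported server inventory, a function like this gives the architect a first-pass split between hosts that can move to v3.x now and hosts that stay on v2.x through the migration window.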

Sensor topology determines coverage. Coverage determines which alerts can fire.

6.2 The alert taxonomy mapped to MITRE ATT&CK

Every offensive Active Directory primitive a reader of the SpecterOps, Mimikatz, and Certipy corpus knows has a row in MDI's alert catalogue. The catalogue is the article's bookmarkable artifact, and the table below is the load-bearing data-density object. Four MITRE-aligned categories, the named alert for each primitive, and the ATT&CK technique ID the alert maps to.

Category	MDI alert (External ID / detector)	MITRE ATT&CK technique
Reconnaissance	Account enumeration reconnaissance (LDAP) -- External ID 2437	T1087 Account Discovery
Reconnaissance	Network-mapping reconnaissance (DNS)	T1018 Remote System Discovery
Reconnaissance	Security principal reconnaissance (LDAP)	T1069 Permission Groups Discovery
Reconnaissance	User and IP address reconnaissance (SMB)	T1018
Persistence and privilege escalation	Honeytoken activity (authentication / attribute / group)	T1098 Account Manipulation
Persistence and privilege escalation	Suspected Skeleton Key attack	T1556 (Modify Authentication Process)
Persistence and privilege escalation	Suspected Golden Ticket usage (encryption downgrade)	T1558.001 Golden Ticket [@mitre-t1558-001]
Persistence and privilege escalation	Suspected Golden Ticket usage (forged authorization data)	T1558.001
Persistence and privilege escalation	Suspected DCShadow attack (DC promotion) -- External ID 2028	T1207 Rogue Domain Controller [@mitre-t1207]
Persistence and privilege escalation	Suspected DCShadow attack (DC replication request) -- External ID 2029	T1207
Persistence and privilege escalation	Suspicious additions to sensitive groups	T1098 Account Manipulation
Credential access	Suspected DCSync attack (replication of directory services) -- External ID 2006	T1003.006 DCSync [@mitre-t1003-006]
Credential access	Suspected Brute Force attack (Kerberos, NTLM)	T1110 Brute Force
Credential access	Suspected AS-REP Roasting attack	T1558.004 AS-REP Roasting [@mitre-t1558-004]
Credential access	Suspected Kerberos SPN exposure / Kerberoasting	T1558.003 Kerberoasting [@mitre-t1558-003]
Credential access	Suspected over-pass-the-hash attack	T1550.002 Pass the Hash
Lateral movement	Suspected identity theft (pass-the-hash)	T1550.002
Lateral movement	Suspected identity theft (pass-the-ticket)	T1550.003 Pass the Ticket
Lateral movement	Remote code execution attempt	T1021 Remote Services
Lateral movement	Suspected NTLM relay attack (the ESC8 class)	T1187 Forced Authentication
Lateral movement	Suspected NTLM authentication tampering	T1557.001 LLMNR/NBT-NS Poisoning and SMB Relay

Both alert documentation surfaces -- the classic-format alert reference and the XDR-format alert reference -- are the canonical primaries for this catalogue [@mslearn-mdi-alerts-mdi-classic][@mslearn-mdi-alerts-xdr]. Reading either page in sequence is the single most useful afternoon a SOC operator new to MDI can spend.

The numeric External IDs (2006 for DCSync, 2028 and 2029 for DCShadow, 2437 for LDAP account enumeration, and so on) are a Microsoft-internal stability anchor that survives alert-name renames over time. Microsoft has renamed alerts -- "Suspected DCSync attack" was named differently in early Azure ATP -- but the External IDs do not change. Production SOAR rules should match on the External ID, not the alert name string.
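As a minimal sketch of what matching on the External ID looks like in automation code: the routing table below uses the four External IDs quoted in the catalogue above, while the `external_id` and `title` field names, the playbook names, and the `route_alert` helper are hypothetical and would need to be mapped onto whatever alert payload a given SOAR platform actually delivers.

```python
# Hypothetical routing table. The External IDs are the ones quoted in the
# alert catalogue above; the playbook names are placeholders for illustration.
PLAYBOOK_BY_EXTERNAL_ID = {
    2006: "dcsync-response",
    2028: "dcshadow-response",
    2029: "dcshadow-response",
    2437: "ldap-enumeration-review",
}

DEFAULT_PLAYBOOK = "manual-triage"


def route_alert(alert: dict) -> str:
    """Pick a playbook from the numeric External ID, never from the title."""
    try:
        external_id = int(alert.get("external_id"))
    except (TypeError, ValueError):
        # Missing or non-numeric ID: fall back to a human instead of guessing.
        return DEFAULT_PLAYBOOK
    return PLAYBOOK_BY_EXTERNAL_ID.get(external_id, DEFAULT_PLAYBOOK)


if __name__ == "__main__":
    current = {"external_id": 2006,
               "title": "Suspected DCSync attack (replication of directory services)"}
    renamed = {"external_id": "2006", "title": "Some future alert title"}
    print(route_alert(current), route_alert(renamed))
```

Because the title never enters the lookup, a renamed alert keeps routing to the same playbook, which is the property the External ID is there to provide.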

DCSync: An offensive primitive in which a principal that has been granted the *Replicating Directory Changes* and *Replicating Directory Changes All* extended rights uses the DRSUAPI replication interface (specifically `IDL_DRSGetNCChanges`) to request a full or partial replication of directory contents from a domain controller -- typically targeting the `unicodePwd` attribute on sensitive accounts like `krbtgt` and `Administrator`. The technique requires no code execution on the DC, no `Ntds.dit` copy, and no foothold on a domain-joined machine; network connectivity to a DC is enough. Mimikatz's `lsadump::dcsync` command, written by Benjamin Delpy and Vincent Le Toux, is the canonical implementation; MITRE catalogues the technique as T1003.006 [@mitre-t1003-006][@adsec-dcsync].

MITRE ATT&CK technique: A specific adversary behaviour catalogued in the MITRE ATT&CK framework, identified by a stable ID (for example T1003.006 for DCSync, T1558.001 for Golden Ticket, T1207 for Rogue Domain Controller). MITRE updates the framework periodically; the IDs themselves do not change, which is why detection-engineering tooling -- including MDI's per-alert MITRE mapping -- anchors to the IDs rather than the human-readable names.

Concrete mechanism, for one named alert. Suspected DCSync attack -- replication of directory services, External ID 2006, fires on the structural pattern that an IDL_DRSGetNCChanges request reached a domain controller from a source that is not itself a domain controller. The mechanism is the one place where MDI's wire-side capture pays for itself most visibly -- the 4662 event the LSA emits records the directory-service-object access but does not identify the source as not-a-DC; only the wire view sees the calling host's IP and resolves it against the directory's serverReference set.

sequenceDiagram
    autonumber
    participant Attacker as Attacker workstation (Mimikatz)
    participant DC as Domain Controller
    participant MDI as MDI v3.x sensor (on DC)
    participant Cloud as MDI cloud backend
    participant XDR as Defender XDR portal
    Attacker->>DC: IDL_DRSGetNCChanges (DRSUAPI replication request)
    DC->>DC: LSA writes event 4662 (DS object access)
    DC-->>Attacker: Replication response (unicodePwd, supplementalCredentials)
    MDI->>MDI: Wire parse: caller IP not in serverReference set
    MDI->>Cloud: Stream parsed event (caller, target object, attributes)
    Cloud->>Cloud: Correlate against known-DC IPs, fire detector
    Cloud->>XDR: Write alert External ID 2006 (T1003.006)
    XDR->>XDR: Surface in unified incident queue

The alert taxonomy makes the bookmarkable promise the rest of the article rests on. The trigger logic that fires each row, however, depends on signal the sensor can only acquire on the wire or in the event log -- and when the trigger logic misses, the operator's last-mile coverage is KQL.

6.3 The advanced-hunting schema and a worked KQL example

When the alert template misses, the hunter writes Kusto Query Language. Defender XDR exposes three identity-specific tables that the MDI sensor populates -- IdentityLogonEvents for authentication activity captured against on-prem AD, IdentityQueryEvents for queries performed against AD objects, and IdentityDirectoryEvents for events involving an on-prem domain controller including password changes, expirations, UPN changes, scheduled tasks, and PowerShell activity [@mslearn-xdr-identitylogon][@mslearn-xdr-identityquery][@mslearn-xdr-identitydirectory]. Cross-product context is available from the unified AlertInfo, AlertEvidence, and DeviceLogonEvents tables.

The worked example below is a structural DCSync detector: a backstop for replication requests that the built-in alert did not surface. The block is annotated rather than executed here; the query as written is what an operator would paste into the Defender XDR advanced-hunting console against the IdentityDirectoryEvents table.

// Structural DCSync detector -- DRSUAPI from non-DC IPs
// Run against the Defender XDR advanced-hunting IdentityDirectoryEvents table.
IdentityDirectoryEvents
| where Timestamp > ago(24h)                                  // tune window per triage cadence
| where ActionType == "DRSReplicate"                          // the DRSUAPI replication call; confirm the exact ActionType string in your tenant
| extend SourceIP = tostring(parse_json(AdditionalFields).SourceIPAddress)
| where SourceIP !in ("10.0.1.10", "10.0.1.11", "10.0.1.12")  // tenant DC IPs go here
| where AccountName !startswith "MSOL_"                       // Entra Connect Sync AD DS connector account FP class
| where AccountName !in ("ADConnectSync")                     // custom on-prem sync account FP class; substitute your own
| project Timestamp, AccountName, SourceIP, TargetDeviceName, AdditionalFields
| order by Timestamp desc

The output rows that survive the filters are the operator's investigation queue: DRSUAPI replication requests against a DC from a source that is not itself a DC, and not a recognised hybrid-identity sync principal. The two cleanup principals -- MSOL_* (the AD DS connector account that Microsoft Entra Connect Sync creates, with a stable MSOL_ prefix and a random suffix) and ADConnectSync (the example name used here for a custom on-prem Entra Connect service account; substitute the tenant's own) -- are the two most common false positives every MDI tenant sees. Adding them to the !startswith and !in clauses cuts the FP rate by an order of magnitude in most environments. The third FP class that operators tune for is legitimate vulnerability scanners triggering the LDAP / SMB reconnaissance alerts -- the scanner's authenticated enumeration looks behaviourally identical to a SharpHound collector unless the scanner's source IP is in an allowlist.

flowchart LR
    A["IdentityDirectoryEvents<br/>(DRSReplicate)"] --> B["Filter: source IP<br/>not in known_dc_ips"]
    B --> C["Filter: account<br/>not in sync allowlist"]
    C --> D["Suspect rows<br/>(operator triage)"]
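The hard-coded DC IP list and sync-account exclusions in the query are the parts that drift as the environment changes, so some teams generate the query text from inventory rather than editing it by hand. The sketch below is one hedged way to do that: `build_dcsync_query` and its parameters are hypothetical names, and the table, `ActionType` value, and field names are copied from the article's query rather than independently verified against the live schema.

```python
import ipaddress


def _kql_quote(value: str) -> str:
    """Render a Python string as a double-quoted KQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_dcsync_query(dc_ips, sync_prefixes=(), sync_accounts=(), window="24h"):
    """Assemble the structural DCSync hunting query from tenant inventory."""
    if not dc_ips:
        # An empty allowlist would flag every DC-to-DC replication as suspect.
        raise ValueError("at least one domain controller IP is required")
    for ip in dc_ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    ip_list = ", ".join(_kql_quote(ip) for ip in sorted(set(dc_ips)))
    lines = [
        "IdentityDirectoryEvents",
        f"| where Timestamp > ago({window})",
        '| where ActionType == "DRSReplicate"',  # value as used in the article's query
        "| extend SourceIP = tostring(parse_json(AdditionalFields).SourceIPAddress)",
        f"| where SourceIP !in ({ip_list})",
    ]
    for prefix in sync_prefixes:
        lines.append(f"| where AccountName !startswith {_kql_quote(prefix)}")
    if sync_accounts:
        accounts = ", ".join(_kql_quote(a) for a in sorted(set(sync_accounts)))
        lines.append(f"| where AccountName !in ({accounts})")
    lines.append("| project Timestamp, AccountName, SourceIP, TargetDeviceName, AdditionalFields")
    lines.append("| order by Timestamp desc")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_dcsync_query(
        ["10.0.1.10", "10.0.1.11", "10.0.1.12"],
        sync_prefixes=["MSOL_"],
        sync_accounts=["ADConnectSync"],
    ))
```

Refusing to build a query with an empty DC list is a deliberate choice: a detector that silently treats every replication partner as hostile is worse than one that fails loudly at generation time.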

Beyond the three identity tables there is one more surface worth naming. The April 2026 Identity Explorer Preview in the Defender XDR Identity page builds on the Microsoft Sentinel data lake -- Microsoft's 2026 cross-product cold-storage and analytics layer with up to 12 years of retention in Parquet format [@mslearn-sentinel-datalake][@mslearn-mdi-whats-new]. Identity Explorer uses the Defender XDR hunting graph to visualise identity attack paths as interactive graphs with predefined scenarios for lateral movement, privilege escalation, and credential-access risk [@mslearn-xdr-hunting-graph][@mslearn-xdr-investigate-users].

The query language is the operator's last-mile coverage layer. Everything in section 6 so far is what MDI gives you. KQL is what you do when MDI does not.

6.4 The graph layer in transition

The graph that began as ATA 1.9's Lateral Movement Paths report no longer exists in the form most operators remember. The history is a clean three-step arc and a transition still in progress.

ATA 1.9 (March 2018) shipped the Lateral movement paths to sensitive accounts report, built on SAM-R-based local-administrator discovery: the sensor remotely enumerated each member host's local-administrators group and computed the chain of "who can become whom" through cached credentials [@mslearn-ata-1-9][@atadocs-lmp-usecase]. That report carried through Azure ATP, through the Microsoft Defender for Identity rename, and through the Microsoft Defender XDR rebrand essentially unchanged for seven years.

In May 2025, Microsoft disabled the SAM-R-based discovery via Message Center notice MC1073068, citing alignment with the broader Windows NTLM-deprecation roadmap [@handsontek-mc1073068]. The message body is explicit: "Disabling this feature will impact the ability to map potential lateral movement paths (using SAM-R queries) because the data used to calculate potential lateral movement paths will no longer be collected by the Defender for Identity sensor." SAM-R as a remote-discovery primitive had become a security debt as much as a feature; the deprecation brought MDI's collection behaviour into line with Restricted SAM and Microsoft's NTLM-deprecation posture, but it left the LMP surface without its primary data source.

The replacement is in two pieces. The first is the unified attack-path exploration surface in Microsoft Defender XDR, driven primarily by Microsoft Defender for Cloud's Cloud Security Posture Management (CSPM) attack-path engine [@mslearn-defenderforcloud-attack-path], with MDI feeding identity signal into the same correlation. The second is the Identity Explorer Preview that launched in April 2026 on the Microsoft Sentinel data lake, specifically for identity attack paths -- visible from the Identity page in Defender XDR for tenants with a Sentinel data lake licence [@mslearn-mdi-whats-new][@mslearn-xdr-hunting-graph][@mslearn-xdr-investigate-users]. The honest framing in 2026 is that the post-SAM-R LMP coverage is not yet fully closed by either replacement -- the Defender XDR hunting graph is rich, the Identity Explorer is improving, but the seven-year-old SAM-R-derived LMP report had operator workflows around it that the new surfaces have not all reproduced.

MDI's graph layer is in transition. The cloud rewrite handed Microsoft the platform to ship a better graph than ATA ever could; in 2026 the build-out is still in progress. Section 9 will name this as one of the article's open problems. First, though, we have to look at the competitive market the watcher sits inside.

7. Competing Approaches -- the 2026 Identity-Detection Market

If MDI is the watcher on the DC, what is everybody else? Five named methods share the 2026 identity-threat detection market with MDI, each optimising for a different trade-off. The table below is the six-column shorthand; the prose that follows is the per-method analysis.

Vendor / project	On-DC sensor model	Data-input mix	Alert taxonomy	Graph model	Pricing model
MDI	On-DC sensor (v2.x standalone or v3.x MDE-integrated)	Wire + event log + ETW + AD CS audit	MITRE-aligned alert catalogue + nine ESC posture	Hunting graph + Identity Explorer Preview	Bundled with M365 E5 / E5 Security / F5 Security
CrowdStrike Falcon Identity Protection	Connector on/near DC + endpoint agent	Wire (via connector) + endpoint telemetry	ITDR-style alerts, less granular ATT&CK mapping	Identity attack-path view (inline enforcement)	Falcon ITDR module add-on
Semperis DSP + ADFR	Off-DC change-tracking agent	AD object-change events (LDAP / replication)	IoC and IoE runtime alerts plus drift / tamper alerts	Tier 0 exposure graph + rollback graph	Standalone licence per AD object
SpecterOps BloodHound Enterprise	Off-DC collector (SharpHound CE)	AD permissions graph + Azure / Okta / Mac extensions	Attack-path exposure findings	Pure graph (Cypher over Postgres / Neo4j)	Standalone SaaS licence
Microsoft Sentinel native UEBA	None on DC (consumes MDI + other sources)	Sentinel data lake (cross-product)	UEBA risk scores, anomaly events	None on identity graph directly	Sentinel ingestion + UEBA add-on
Sigma + SIEM (open source)	None on DC (event forwarder agents)	Windows event logs, ETW via OSQuery / Velociraptor	Custom rule library	None	Free (rule library); SIEM cost separate

CrowdStrike Falcon Identity Protection is the post-acquisition rename of Preempt Platform, the product line CrowdStrike bought when it completed the Preempt Security acquisition on September 30, 2020 [@businesswire-cs-preempt]. Architecturally distinct from MDI: rather than relying on an on-DC sensor that parses wire traffic and event logs, Falcon Identity Protection inspects authentication traffic via a connector deployed on or near each DC and correlates it with Falcon-agent telemetry already collected from every protected endpoint. Identity-policy enforcement is inline -- the product can require an MFA challenge or block an authentication at the point of decision rather than emit a post-hoc alert [@crowdstrike-falcon-id]. This is the only commercial product in the survey that does inline enforcement on AD Kerberos and NTLM authentications; it is also the only one that is not bundled with a Microsoft 365 licence.

Identity Threat Detection and Response (ITDR): The product category that combines runtime detection of identity-targeted attacks (Kerberos forgery, credential theft, lateral movement) with response capabilities (force MFA, disable user, revoke session). Gartner formalised the term in 2022. CrowdStrike Falcon Identity Protection and SentinelOne Singularity Identity are the largest ITDR-positioned products outside the Microsoft stack; MDI plus the Defender XDR remediation actions surface effectively functions as Microsoft's ITDR offering for tenants already inside the Microsoft 365 estate [@mslearn-mdi-remediation-actions].

Semperis Directory Services Protector (DSP) and the companion Active Directory Forest Recovery (ADFR) product are best known for change-tracking and recovery, layered over a runtime Indicators-of-Compromise and Indicators-of-Exposure detection set that overlaps with MDI's alert taxonomy on classes like DCSync, DCShadow, and Golden Ticket replay [@semperis-dsp][@semperis-adfr]. DSP tracks AD object changes in near-real-time, fires IoC and IoE alerts on the same primitives MDI watches, and offers post-attack rollback as its primary differentiator; ADFR handles malware-free forest recovery in minutes-to-hours rather than days-to-weeks. The pair is partly complementary, partly overlapping with MDI: DSP catches the post-attack drift (the unauthorised group membership change, the rogue ACL) and offers a rollback path MDI does not have; MDI's per-principal behavioural baselines and unified Defender XDR incident queue are the differentiator on the in-flight detection axis; ADFR handles "the worst day of your career" forest-recovery scenarios where rebuilding the directory is the only remediation. Many tenants run all three.

SpecterOps BloodHound Enterprise (BHE) is the commercial form of the BloodHound 2016 graph model that Andy Robbins, Rohan Vazarkar, and Will Schroeder published at DEF CON 24 [@defcon-six-degrees][@bloodhound-github-specterops][@neo4j-bh]. Pure graph attack-path exposure model: BHE maps the paths that exist (Tier Zero hygiene, principal-to-principal cross-domain trust paths, Entra to on-prem pivots) rather than alerts on attacks in flight [@specterops-bhe]. Complementary to MDI: BHE tells you the attack path exists in the directory, MDI tells you someone is walking it right now. The SpecterOps team's Certified Pre-Owned whitepaper (June 2021) by Will Schroeder and Lee Christensen is the source of the ESC1-ESC8 vocabulary that downstream MDI ADCS posture assessments map to [@specterops-cpo-pdf][@specterops-cpo-blog].

Microsoft Sentinel native UEBA is the SIEM-side behavioural-baselines product over the broader event corpus that Sentinel ingests. Sentinel UEBA uses machine learning to build dynamic behavioural profiles for users, hosts, IP addresses, applications, and other entities, with named data-source connectors including Defender for Identity [@mslearn-sentinel-ueba]. Sentinel UEBA is the "outside the identity tables" layer -- detection that needs to correlate identity signal with email, endpoint, network, and SaaS signal lives there rather than in the identity tables themselves. The Defender XDR-to-Sentinel connector unifies the surfaces [@mslearn-sentinel-defender-connector].

Open-source detection stacks -- Sigma rules deployed against Sentinel, Splunk, or Elastic, plus Velociraptor and Wazuh -- can match many of MDI's pattern-based alerts but cannot match MDI's per-principal behavioural baselines without significant in-house investment [@sigmahq-github]. The SigmaHQ rule corpus contains over 3,000 detection rules in a vendor-neutral SIEM format. Olaf Hartong's FalconForce team publishes the FalconFriday hunting-query repository (MDE-schema KQL queries for DLL injection, COM hijacking, LOLBins, LDAP anomalies, and SMB NULL sessions) -- the operator-side companion to community-built detection libraries [@github-falconfriday][@falconforce-blog].

MDI is the high-coverage, low-effort identity-threat detection product if you already have Microsoft 365 E5 or E5 Security. The third-party products in this market win on differentiation -- inline enforcement, change-tracking, exposure-graph mastery -- rather than baseline coverage. The interesting question for an architect in 2026 is not which to buy. The interesting question is what MDI, by design, cannot see at all.

8. Theoretical Limits -- the Five Structural Ceilings

There are attacks no version of MDI will ever detect. Not because Microsoft has not shipped the alert yet, and not because the engineering team has not gotten around to it. Because the alert is structurally impossible.

Five named ceilings, each anchored to a primary source. Together they are the residual blind-spot inventory every operator should be able to name from memory.

flowchart TD
    subgraph causes ["Attacker-side cause"]
        C1["OS does not expose<br/>the credential operation"]
        C2["Forged ticket is<br/>cryptographically identical"]
        C3["Wire traffic is wrapped<br/>in an encrypted channel"]
        C4["Attack pivots through<br/>a forest without a sensor"]
        C5["Attacker uses real DA<br/>real credentials"]
    end
    subgraph gaps ["Defender-side gap"]
        G1["Credential Guard wall"]
        G2["Sapphire Ticket class"]
        G3["Encrypted-channel DCSync"]
        G4["Cross-forest tail"]
        G5["Legitimate principal<br/>non-detection"]
    end
    C1 --> G1
    C2 --> G2
    C3 --> G3
    C4 --> G4
    C5 --> G5

Ceiling 1 -- the Credential Guard wall. Anything the operating system itself cannot see is invisible to MDI. The DCSync class is the canonical example with a twist: Credential Guard moves credential secrets out of the LSASS process into a virtualization-based isolated process so that they cannot be scraped from a compromised endpoint, but it does not prevent DRSUAPI-level secret extraction against the DC, because the DRSUAPI replication interface is supposed to return password hashes to legitimate replication partners. MDI catches DCSync by detecting the wire-side pattern (DRSUAPI from a non-DC source), not through Credential Guard's protection. Anything the OS does not expose in event log, wire traffic, or instrumented API -- a custom kernel driver that reads secrets through a side channel, a hypervisor-level credential extraction on a non-Secured-core host -- is, by construction, outside MDI's data layer.

Ceiling 2 -- forged-ticket cryptographic indistinguishability, the Sapphire Ticket. This is the most important ceiling, and the one whose permanence the rest of this section orbits.

Sapphire Ticket: A forged Kerberos Ticket Granting Ticket whose Privileged Attribute Certificate (PAC) is a verbatim copy of a legitimate principal's PAC, obtained via the S4U2self plus User-to-User PAC-copy flow against the target principal and then encrypted with the stolen `krbtgt` key. The technique was disclosed by Charlie Bromberg (Synacktiv / Shutdown) in October 2022 and documented on The Hacker Recipes wiki [@hackerrecipes-sapphire]. The defining property: every byte of the forged ticket's PAC matches the byte pattern of a ticket the genuine KDC would have issued for the legitimate principal, including the group SID set, the user ID, the logon time, and the authorisation-data fields. The classic Golden Ticket leaves PAC anomalies that MDI's *Suspected Golden Ticket usage (forged authorization data)* alert fires on; the Sapphire Ticket leaves no PAC anomaly because there is no anomaly to leave.

The S4U2self plus User-to-User PAC-copy technique is a Kerberos protocol flow Microsoft published as part of MS-SFU and MS-KILE, and it extracts a genuine PAC into a usable form without the attacker ever needing to authenticate as the target. The mechanical sequence is: S4U2self against the target produces a ticket containing the target's PAC; the U2U flow lets the attacker decrypt the embedded PAC blob; the attacker then mints a fresh TGT around that PAC with the stolen `krbtgt` key. The KDC's signature checks pass because the signing key is real, and the PAC's structural fields pass because they were lifted from a ticket the genuine KDC just issued. Only the original credential compromise that produced the `krbtgt` hash leaves a trail.

Key idea: Cryptographic indistinguishability is a permanent class. No future MDI release fixes the Sapphire Ticket without breaking Kerberos itself.

Rotate `krbtgt` twice on a defined cadence -- 90 days is common; some Tier Zero playbooks rotate every 30 days. The "twice" is non-optional: the KDC keeps the previous `krbtgt` key in the account's password history and still accepts tickets encrypted with it, so after a single rotation a stolen key remains usable until the second rotation pushes it out. Space the two resets at least one maximum ticket lifetime apart (10 hours by default, longer where `MaxRenewAge` renewals are in play) so that legitimately issued tickets expire rather than break. Combine with Authentication Policy Silos for Tier Zero service accounts, Tier Zero access reviews, and Privileged Access Workstations for any administrator who can read `krbtgt`. None of these closes the Sapphire Ticket; together they shrink the window in which a stolen key remains weaponisable. Sample PowerShell for the double rotation is in the Microsoft-published `Reset-KrbTgt` script in the GitHub samples repository [@msdefender-id-github].
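The timing side of the double rotation is simple date arithmetic, sketched below. The helper names are hypothetical, and the defaults are assumptions to adjust per domain: a 10-hour maximum ticket lifetime and a 5-minute clock-skew allowance (the stock Kerberos policy values), plus the 90-day cadence mentioned above. The sketch schedules the resets only; it does not perform them.

```python
from datetime import datetime, timedelta


def earliest_second_reset(first_reset: datetime,
                          max_ticket_lifetime: timedelta = timedelta(hours=10),
                          clock_skew: timedelta = timedelta(minutes=5)) -> datetime:
    """Earliest time the second krbtgt reset can run without breaking valid tickets.

    Tickets issued just before the first reset are encrypted with the old key;
    waiting one full ticket lifetime (plus skew) lets them expire naturally.
    """
    return first_reset + max_ticket_lifetime + clock_skew


def next_rotation_due(last_pair_completed: datetime, cadence_days: int = 90) -> datetime:
    """Date the next double rotation falls due under a fixed cadence."""
    return last_pair_completed + timedelta(days=cadence_days)


if __name__ == "__main__":
    first = datetime(2026, 5, 1, 9, 0)
    second = earliest_second_reset(first)
    print("first reset: ", first)
    print("second reset no earlier than:", second)
    print("next pair due:", next_rotation_due(second))
```

In an incident, where the key is known to be stolen, the waiting period is a trade-off rather than a rule: running the second reset immediately evicts the attacker sooner at the cost of forcing legitimate principals to re-authenticate.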

Ceiling 3 -- the encrypted-channel DCSync class. When DRSUAPI is wrapped in a transport the on-DC capture cannot decode -- DCSync over LDAPS via a SPN-bound impersonation chain, for instance -- the wire-side pattern recognition that powers the External ID 2006 alert degrades. The structural detector in Section 6.3 catches the unencrypted case; the encrypted case requires either a different observation surface (the DRSUAPI handler's own instrumentation) or behavioural baselining on the post-fact replication-log signal. MDI's coverage in this case is partial, not complete.

Ceiling 4 -- the cross-forest under-instrumentation tail. MDI sees the forests its sensors are deployed in. Pivot through an external trust to a forest without MDI coverage and the signal is incomplete -- the attacker's pre-pivot reconnaissance, the actual trust traversal, and any post-pivot actions on the trusting side that do not also touch an MDI-monitored forest will be invisible. This is a deployment property, not a product property: a tenant with MDI on every forest in its environment does not have this ceiling. A tenant whose acquisition portfolio includes three forests it does not yet monitor does.

Ceiling 5 -- legitimate-principal compromise non-detection. When the attacker uses a real Domain Admin's real credentials, every action is behaviourally indistinguishable from the legitimate principal unless timing, geolocation, or device fingerprint breaks the baseline. The 2025 and 2026 Suspected session cookie theft and related XDR-format alerts close part of this gap by adding behavioural side channels that the older Azure ATP alert catalogue did not cover [@mslearn-mdi-alerts-xdr]. The residual is permanent: a sufficiently disciplined attacker operating from the legitimate principal's normal workstation, during the legitimate principal's normal hours, doing things the legitimate principal might plausibly do, is, by construction, indistinguishable from the legitimate principal.

A sixth honourable mention sits adjacent to these five: out-of-band physical access -- a stolen Ntds.dit backup, an attacker-controlled DC's offline export, supply-chain firmware compromise on the DC hardware -- is outside the data layer MDI operates over. The hardware-trust-root community owns this class of mitigation, not the identity-threat detection community.

Note: The five structural ceilings are knowable, not surprises. A SOC that names them ahead of time has a better incident-response runbook than one that does not -- specifically, the runbook for "we just realised the attacker used a Sapphire Ticket" is fundamentally different from the runbook for "MDI fired and we ignored it." The first runbook starts with krbtgt rotation and Tier Zero hygiene review; the second starts with disciplinary review and SOAR-rule tuning. Knowing which runbook to pick depends on naming the ceiling correctly.

These five named residuals are why the rest of the article exists. If MDI caught everything, the operator playbook in Section 10 would be unnecessary. Because MDI does not, and because the gaps are knowable, the playbook in Section 10 is the difference between MDI as a licence line item and MDI as a working part of the SOC's day. But before the playbook, one last open-problem inventory: where is the research roadmap actually working?

9. Open Problems -- What the Research Roadmap Is Working On

Five open problems sit between the 2026 floor and a hypothetically perfect identity-threat detector. Each one has a current best partial result and a citation. None of them is closed.

Open Problem 1 -- the post-PKINIT NTLM-relay class beyond ESC8. Synacktiv's Understanding and evading Microsoft Defender for Identity PKINIT detection paper (Guillaume Andre, 2024) reverse-engineered MDI's PKINIT-class detection: MDI fingerprints offensive-tool-generated AS-REQ messages by the encryption types they advertise, which differ from the encryption-type list a legitimate Windows API PKINIT request generates [@synacktiv-pkinit-evasion][@synacktiv-pkinit-evasion-archive]. The companion Invoke-RunAsWithCert PowerShell tool generates AS-REQ messages via the Windows API itself, producing requests structurally identical to legitimate enterprise PKINIT authentication and bypassing the fingerprint-based detection [@synacktiv-runascert-gh][@deepwiki-runascert]. Aura Security's follow-on writeup confirms the technique against the current MDI version and walks through modifying Certipy to produce matching AS-REQ shapes [@aurainfosec-mdi-pkinit]. The partial mitigation in 2026 is the additional posture-side coverage in the nine MDI Certificates assessments, which closes some of the configurations the offensive tools target [@mslearn-mdi-certificates-posture]. The runtime detection arms race continues.

Open Problem 2 -- the graph-layer transition from SAM-R LMP to Identity Explorer. Section 6.4 covered the deprecation of SAM-R-based LMP discovery in May 2025 (MC1073068) and the two replacement surfaces: the Defender XDR attack-path exploration driven by Defender for Cloud's CSPM engine, and the April 2026 Identity Explorer Preview on the Sentinel data lake [@handsontek-mc1073068][@mslearn-defenderforcloud-attack-path][@mslearn-mdi-whats-new][@mslearn-xdr-hunting-graph]. The honest open question is whether either surface reproduces, in 2026, the operator workflows the seven-year-old SAM-R-derived LMP report had built up around itself. The Defender XDR hunting graph is richer than the LMP report ever was, but its data model is different; the Identity Explorer is closer in spirit but in Preview rather than GA.

Open Problem 3 -- the Sentinel data lake correlation and the Identity Explorer GA path. Microsoft Sentinel data lake, the cross-product cold-storage and analytics layer, went public preview in 2025 and ships with up to 12 years of retention in Parquet format, a clean separation of storage and compute, and KQL plus Jupyter notebook query surfaces [@mslearn-sentinel-datalake][@mstc-sentinel-datalake-preview]. Identity Explorer is the first identity-specific surface built on top of the data lake; it is in Preview as of April 2026 with no GA date published. The open problem is whether the data-lake-tier correlation can match the alert-tier MDI quality for long-running attacker dwell -- the class where months pass between Sapphire Ticket use and discovery -- without producing more noise than signal.

Open Problem 4 -- the MDI evasion research arms race. Synacktiv's two papers (the sensor primer by Andre and Benassouli in 2022; the PKINIT evasion paper by Andre in 2024) plus the operator notes on alert-timing exploitation that show up in adsecurity.org and SpecterOps content are the public record of the offensive-research community's targeting of MDI specifically [@synacktiv-primer-mdi][@synacktiv-primer-mdi-archive][@synacktiv-pkinit-evasion][@synacktiv-pkinit-evasion-archive]. FalconForce's reverse-engineering of the MDE sensor (via Olaf Hartong's MDE Internals series) is the methodological precedent for the same approach against MDI; the FalconForce blog and the FalconFriday hunting-query repository are the operator-facing primaries [@falconforce-mde-0x02][@falconforce-mde-0x03][@falconforce-blog][@github-falconfriday][@github-olafhartong]. The Charlie Bromberg Sapphire Ticket disclosure (October 2022) is the cryptographic-attack-class research that Section 8's second ceiling rests on [@hackerrecipes-sapphire]. The arms-race property is permanent; the defensive product team's job is to keep the detection-shipping cadence faster than the evasion-shipping cadence, which the cloud rewrite (see Section 5) made structurally possible.

Open Problem 5 -- the non-Windows directory coverage tail. MDI covers Active Directory and (via the Microsoft Entra Connect sensor) the on-prem-to-Entra-ID sync surface. Native Entra ID attacks (token theft against Entra ID itself, OAuth consent phishing, Conditional Access bypass) are covered by Defender for Cloud Apps and Entra ID Protection, not by MDI. The boundary between MDI's scope and the adjacent products is operationally meaningful: a SOC operator reading "MDI did not fire" on an Entra-ID-only attack should not conclude the attack went undetected -- another product likely did fire, in another part of the same Defender XDR portal. The unified incident queue stitches the alerts together; the operator's mental model has to know which sensor surface to look at when triaging.

The article does not claim "BloodHound CE forced MDI to add ADCS detections in 2024." The framing is parallel evolution: as BloodHound CE expanded ADCS attack-path coverage in 2024-2025, MDI extended its ADCS posture assessments and PKINIT-class runtime detections during the same window. The two product communities watch each other; neither one "forces" the other.

The roadmap is real, the build-out is in progress, and the operator decision in 2026 is not "wait for the perfect product." It is "deploy what works now, and cover the residuals with KQL."

10. The MDI Deployment and Triage Playbook

Four lanes, mapped to four operator personas: the architect who designs the sensor footprint, the SOC analyst who triages the alerts, the threat hunter who writes the KQL that fills the gaps, and everyone who needs to know what does not work.

Lane 1 -- sensor placement and prerequisite hygiene

Deploy the v3.x sensor on every domain controller running Windows Server 2019 or later, paired with the MDE agent. The deployment path is the Microsoft Defender portal's migration wizard or the standalone install via the MDE agent's onboarding flow [@mslearn-mdi-deploy-sensor-v3][@modernsec-v3x][@jeffreyappel-v2v3].

Deploy the v2.x sensor on every AD FS federation server, every AD CS online issuing certificate authority, and every Microsoft Entra Connect server (both active and staging), unless those roles already run on a domain controller covered by a v3.x sensor with the May 2026 identity-role extension enabled [@mslearn-mdi-prereq-sensor-v2][@mslearn-mdi-whats-new].

Configure the required Windows audit subcategories via the Group Policy Subcategory Settings path that the MDI event-collection page enumerates -- Audit Credential Validation, Audit Kerberos Authentication Service, Audit Kerberos Service Ticket Operations, Audit Logon, Audit Directory Service Access, Audit Computer Account Management, plus the additional subcategories for AD CS and AD FS roles. The v3.x sensor includes an Automatic Windows auditing configuration toggle that uses the Windows LSA audit-policy APIs to set the subcategories directly, eliminating the GPO step [@mslearn-mdi-event-collection].

Set the MDI Action Account in the Defender portal. The default is LocalSystem impersonation on the sensor host, which works for response actions targeting AD objects (force password reset, disable user). A gMSA-based Action Account is the alternative for tenants that want least-privilege response identities scoped per workspace [@mslearn-mdi-action-accounts][@mslearn-mdi-remediation-actions]. Avoid reusing the Action Account gMSA on servers other than domain controllers; the documented guidance is to keep that account scoped to DC-side actions only.

Verify the Microsoft Defender portal role assignments so that SOC analysts have the correct read-and-respond permissions on identity alerts. The Microsoft Defender for Identity enterprise application (ID 60ca1954-583c-4d1f-86de-39d835f3e452) is the consent surface for the response actions; tenants that have not granted consent will see "remediation action unavailable" on identity-targeted incidents [@mslearn-mdi-remediation-actions].

Lane 2 -- alert triage SLAs

The triage matrix maps alert category to response-time target and the named SOC role that owns triage. Numbers below are typical Tier 1 / Tier 2 SOC targets; tune to your environment's incident-response policy.

Alert category	Response-time target	Owning SOC role	Notes
DCSync, DCShadow, Golden Ticket	1 hour	Tier 2 (privileged-account-compromise specialist)	Treat as confirmed compromise pending evidence to the contrary
AS-REP Roasting, Kerberoasting	4 hours	Tier 2	Higher-FP class; verify offending principal pattern before escalation
NTLM relay (ESC8 class)	4 hours	Tier 2	ADCS-aware; coordinates with CA team
Reconnaissance (LDAP / SMB / DNS)	24 hours	Tier 1	Highest-FP class; allowlist legitimate scanners
Honeytoken activity	1 hour	Tier 1 plus Tier 2 escalation	Near-zero FP; any hit is investigation-worthy

Two false-positive cleanup patterns appear in nearly every tenant. The Azure AD Connect Cloud Sync service principal -- MSOL_ plus an 8-character random suffix -- legitimately performs DRSUAPI-like operations as part of the hybrid identity sync flow, and will fire DCSync-class alerts unless allowlisted. Legitimate vulnerability scanners (Tenable, Rapid7, Qualys) perform authenticated enumeration that triggers the LDAP and SMB reconnaissance alerts; scanner IPs go in an exclusion list per the Defender XDR portal's identity-alert tuning surface.

The MDI Action Accounts and Remediation Actions surface lets the responder disable a user, force a password reset, revoke an Entra ID session, or mark an account as compromised -- triggered manually from the alert flow or automatically via the Defender XDR automatic attack disruption engine, which requires 99 percent or higher detector precision before taking containment action [@mslearn-xdr-attack-disruption][@mslearn-mdi-remediation-actions][@mslearn-xdr-investigate-users]. Automatic attack disruption is opt-in per containment action; the conservative default leaves analyst confirmation in the loop for password-reset-class actions and automates disable-user only on the highest-precision detector classes.

Note: The cloud-side analytics pipeline aggregates signal across the per-principal baseline window before deciding to emit. Empirically the alert latency is minutes-cadence, not seconds-cadence. Incident response runbooks that assume sub-second alert arrival will be wrong; the operator clock starts when the alert hits the Defender XDR queue, which is itself minutes after the wire-side event. Plan for this in the SLA matrix above -- the "1 hour" target for DCSync starts from the alert timestamp, not the attack timestamp, and the attack itself may have happened five or ten minutes earlier. The Microsoft alerts-overview page is explicit that MDI is "not designed to serve as an auditing or logging solution that captures every single operation or activity on the servers where the sensor is installed; it only captures the data required for its detection and recommendation mechanisms" [@mslearn-mdi-alerts-overview].
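The clock arithmetic in the Note can be made concrete with a minimal Python sketch. The SLA hours are copied from the triage matrix above; the function names, the sample date, and the seven-minute emission delay are illustrative assumptions, not measured MDI behaviour.

```python
from datetime import datetime, timedelta, timezone

# Response-time targets in hours, copied from the Lane 2 triage matrix.
SLA_HOURS = {
    "DCSync": 1,
    "DCShadow": 1,
    "Golden Ticket": 1,
    "AS-REP Roasting": 4,
    "Kerberoasting": 4,
    "NTLM relay": 4,
    "Reconnaissance": 24,
    "Honeytoken activity": 1,
}


def triage_due(category: str, alert_time: datetime) -> datetime:
    """Deadline measured from the alert timestamp, not the attack timestamp."""
    return alert_time + timedelta(hours=SLA_HOURS[category])


def exposure_at_deadline(category: str, attack_time: datetime,
                         alert_time: datetime) -> timedelta:
    """Attacker head start if triage lands exactly at the SLA deadline."""
    return triage_due(category, alert_time) - attack_time


# Hypothetical values: the attack time and the emission delay are assumptions.
attack = datetime(2026, 5, 29, 14, 33, tzinfo=timezone.utc)
alert = attack + timedelta(minutes=7)

print("triage due:", triage_due("DCSync", alert).isoformat())
print("worst-case exposure:", exposure_at_deadline("DCSync", attack, alert))
```

The second figure is the one to put in the runbook: the attacker's head start is the SLA plus the emission delay, not the SLA alone.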

Lane 3 -- advanced-hunting queries that fill the gaps

Three structural detectors in KQL form, each one targeting a class the named alerts can miss. Each query names the table, the columns, and the threshold tuning the operator will need.

The structural DCSync detector runs against IdentityDirectoryEvents and catches the encrypted-channel case the External ID 2006 alert may miss:

IdentityDirectoryEvents
| where Timestamp > ago(24h)
| where ActionType == "DRSReplicate"
| extend SourceIP = tostring(parse_json(AdditionalFields).SourceIPAddress)
| where SourceIP !in ("10.0.1.10", "10.0.1.11", "10.0.1.12")   // tenant DC IPs
| where AccountName !startswith "MSOL_" and AccountName !in ("ADConnectSync")
| project Timestamp, AccountName, SourceIP, TargetDeviceName, AdditionalFields
| order by Timestamp desc

Threshold tuning: keep the time window short (24 hours) for daily triage. Cleanup principals (MSOL_*, ADConnectSync, plus any per-tenant sync identities) go in the !startswith and !in clauses. The query produces a clean queue of "DRSUAPI replication from a host that should not be doing DRSUAPI." False-positive class: legitimate Azure AD Connect Cloud Sync service principals; resolve by adding the principal to the allowlist.

The slow-burn Kerberoasting detector runs against IdentityLogonEvents and catches the rate-limited Kerberoast pattern that modern attackers use to stay below the MDI behavioural-baseline threshold:

IdentityLogonEvents
| where Timestamp > ago(7d)
| where Protocol == "Kerberos"
| where ActionType == "ServiceTicketRequest"
| extend EncType = tostring(parse_json(AdditionalFields).EncryptionType)
| where EncType in ("RC4-HMAC", "DES-CBC-MD5")
| summarize SpnCount = dcount(TargetSpn), SpnList = make_set(TargetSpn) by AccountName, bin(Timestamp, 1d)
| where SpnCount > 5     // tune per tenant baseline
| order by Timestamp desc, SpnCount desc

Threshold tuning: the SpnCount > 5 threshold is the load-bearing knob. Tenants with legitimate operational accounts that request many SPNs per day (privileged service accounts running scheduled tasks across many target hosts) will need a higher threshold and an allowlist. The seven-day window catches the slow-burn pattern that a one-hour window misses.
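To tune the threshold before changing the production query, the same aggregation can be replayed offline against exported rows. The following is a rough Python sketch of that per-account, per-day distinct-SPN count; the dictionary keys (`timestamp`, `account`, `spn`, `etype`) and the sample rows are hypothetical stand-ins for whatever the export actually contains.

```python
from collections import defaultdict
from datetime import datetime

# Encryption types treated as weak, mirroring the query's EncType filter.
WEAK_ETYPES = {"RC4-HMAC", "DES-CBC-MD5"}


def slow_burn_candidates(events, threshold=5):
    """Return {(account, day): sorted SPNs} where the number of distinct
    weak-etype SPNs requested in one day exceeds the threshold."""
    spns = defaultdict(set)
    for e in events:
        if e["etype"] in WEAK_ETYPES:
            spns[(e["account"], e["timestamp"].date())].add(e["spn"])
    return {key: sorted(vals) for key, vals in spns.items() if len(vals) > threshold}


# Hypothetical sample: one account requests six RC4 tickets spread over a day.
events = [
    {"timestamp": datetime(2026, 5, 29, hour), "account": "svc-report",
     "spn": f"MSSQLSvc/db{hour}.contoso.com", "etype": "RC4-HMAC"}
    for hour in range(6)
] + [
    {"timestamp": datetime(2026, 5, 29, 9), "account": "alice",
     "spn": "HTTP/intranet.contoso.com", "etype": "AES256-CTS-HMAC-SHA1-96"},
]

hits = slow_burn_candidates(events)
for (account, day), spn_list in hits.items():
    print(account, day, len(spn_list))
```

Re-running the function with different `threshold` values against a week of real exports shows how many accounts each setting would put in the analyst queue.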

The PKINIT-relay structural detector runs against IdentityLogonEvents and watches for AS-REQ with PA-PK-AS-REQ pre-auth coming from unexpected client subnets:

IdentityLogonEvents
| where Timestamp > ago(24h)
| where Protocol == "Kerberos"
| where ActionType == "InitialAuthentication"
| extend PreAuth = tostring(parse_json(AdditionalFields).PreAuthType)
| where PreAuth == "PA-PK-AS-REQ"
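| extend Octets = split(IPAddress, ".")
| extend ClientSubnet = strcat(tostring(Octets[0]), ".", tostring(Octets[1]), ".", tostring(Octets[2]))   // first three octets, so the value can match the /24 prefixes below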
| where ClientSubnet !in ("10.0.5", "10.0.6")    // legitimate smartcard subnets
| project Timestamp, AccountName, IPAddress, DeviceName, AdditionalFields
| order by Timestamp desc

Threshold tuning: PKINIT is legitimate when smartcard logon is in use. Identify the legitimate smartcard-issuing subnets and add them to the !in clause. The residual queue is PKINIT from unexpected sources -- the structural pattern behind both the post-ESC8 NTLM-relay class and the Synacktiv Invoke-RunAsWithCert evasion class.
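When the allowlist grows beyond a couple of /24 ranges, string prefixes become fragile ("10.0.5" is also a textual prefix of "10.0.50.1"). A small Python sketch of the same allowlist check using CIDR membership, suitable for post-processing exported rows, might look like this; the subnets are the placeholder ranges from the query above and the function name is an assumption.

```python
import ipaddress

# Placeholder smartcard subnets, matching the example ranges in the query.
SMARTCARD_NETS = [
    ipaddress.ip_network("10.0.5.0/24"),
    ipaddress.ip_network("10.0.6.0/24"),
]


def is_unexpected_pkinit_source(ip: str) -> bool:
    """True when the client IP falls outside every allowlisted smartcard subnet."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in SMARTCARD_NETS)


for candidate in ("10.0.5.44", "10.0.50.1", "10.0.7.20"):
    print(candidate, is_unexpected_pkinit_source(candidate))
```

CIDR membership also handles ranges that are not aligned to octet boundaries, which a dotted-prefix comparison cannot express.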

Tenants that want the alert and event corpus in their SIEM as well as in Defender XDR should configure the MDI to Microsoft Sentinel connector through the Defender XDR-to-Sentinel integration; the connector is auto-enabled when Sentinel is onboarded to the Defender portal [@mslearn-sentinel-defender-connector].

Lane 4 -- what does NOT work

Five named operator myths, each refuted with a one-paragraph structural reason.

Myth 1: "MDI without the DC sensor still catches Kerberos attacks via the Entra ID side." Wrong. The Kerberos protocol layer is on-prem; the analytics require on-DC capture of the AS-REQ / TGS-REQ / AP-REQ exchange. Entra ID's side of the hybrid auth flow does not carry the same protocol detail. A tenant with MDI licensed but the sensor not deployed on the DCs has no Kerberos detection at all -- the licensed state is necessary but not sufficient.

Myth 2: "Disabling the v2.x sensor on AD FS is fine since it is covered by the DC sensor." Wrong. The AD FS authentication flow generates federation-side events (SAML assertions, OAuth tokens, the Application and Security event logs that AD FS itself writes) that the DC sensor does not see. AD FS deserves its own sensor unless the AD FS role is collapsed onto a domain controller, in which case the May 2026 v3.x identity-role extension covers it.

Myth 3: "Defender for Endpoint covers what MDI covers." Wrong. MDE catches endpoint behaviour -- process creation, file access, network connections, registry writes. MDI catches protocol-level Kerberos, NTLM, LDAP, and DRSUAPI patterns. The two products share an agent surface in the v3.x architecture, but the signal classes are different. An MDE-only deployment will not catch a DCSync from a workstation if MDI is not licensed and the sensor is not deployed; the MDE agent on the DC sees the local process activity but not the wire-side replication call's source.

Myth 4: "MDI alerts are real-time." Wrong. See the Note in Lane 2 above: the cloud-side batched-emission cadence is minutes, not seconds, and incident response runbooks need to account for it.

Myth 5: "MDI requires no tuning." Wrong. Every environment has unique false-positive patterns from internal tooling that need exclusions. Microsoft ships the default detector thresholds; tenants tune them through the Defender XDR portal's identity-alert configuration surface. A tenant that has not tuned the recon-alert allowlist for its vulnerability scanners will receive far more noise than signal.

Coverage, triage, KQL, and humility about what does not work. The four lanes are the difference between MDI as a licence item on a renewal sheet and MDI as a working part of the SOC's day.

11. Frequently Asked Questions and Closing

Six questions that come up every time MDI is on a whiteboard, each in the misconception-removal pattern: wrong answer named first, then refuted.

No. See the *common misreading worth fixing* Callout in Section 4: graph-anchored attack-path evaluation in Microsoft's defensive stack originates in ATA 1.9 (March 2018), and the 2022 anchor in operator memory is the start of the SAM-R-discovery deprecation arc that culminated in MC1073068 in May 2025 [@mslearn-ata-1-9][@handsontek-mc1073068].

Not as one alert per ESC class. The MDI Certificates posture page documents **nine ADCS posture assessments** -- ESC1 (Preview), ESC2, ESC3, ESC4 (template-owner and template-ACL variants), ESC6 (Preview), ESC7, ESC8, ESC11, and ESC15 [@mslearn-mdi-certificates-posture]. The runtime detection surface for the ESC8 NTLM-relay class is the *Suspected NTLM relay attack* alert in the XDR catalogue [@mslearn-mdi-alerts-xdr]. PKINIT-class runtime detection (the post-ESC8 chain) is the AS-REQ encryption-type fingerprint that Synacktiv documented and partially evaded; the August 2023 AD CS sensor release is the prerequisite for posture coverage [@mstc-adcs-sensor][@synacktiv-pkinit-evasion][@synacktiv-pkinit-evasion-archive]. Coverage is "nine posture assessments plus one runtime alert," not "one alert per ESC1 through ESC15."

Microsoft has never published the canonical list; community reverse-engineering is the only source. See the *honest provenance of the ETW provider list* Aside in Section 5 for the full provenance (Synacktiv's November 2022 primer; Olaf Hartong's FalconForce MDE Internals 0x02 methodology; the Get-TraceLoggingMetadata + Sealighter toolchain) and the snapshot-not-ground-truth framing [@synacktiv-primer-mdi][@falconforce-mde-0x02].

In the cloud since Azure ATP went GA in March 2018 [@mstc-azureatp-ga][@mstc-azureatp-intro]. The on-DC sensor is a thin parser that captures Kerberos / NTLM / LDAP / DRSUAPI on the wire, parses the protocols into structured events, and streams the parsed signal to the multi-tenant cloud backend over HTTPS. The detection logic, the per-principal behavioural baselines, and the alert-emission pipeline all run in the cloud. The legacy on-prem ATA Center model ended with Azure ATP; ATA itself shipped its last release (1.9.3) in September 2020 and Extended Support ends January 2026 [@mstc-ata-eol][@atadocs-versions].

No. The framing is parallel evolution, not a "forcing" relationship. BloodHound CE expanded ADCS attack-path coverage substantially in 2024 and 2025; during the same window MDI extended its ADCS posture assessment surface and added the AD CS sensor release in August 2023 [@mstc-adcs-sensor][@dirteam-sander-aug2023]. Both product communities watch each other -- the Defender team uses BloodHound to red-team its own environments, the SpecterOps team uses MDI when consulting in enterprise Microsoft shops -- but the causal claim "BloodHound forced MDI" is not supported by the public release record. The two communities' work has been concurrent and mutually informing.

Almost. The MITRE-aligned alert catalogue in Section 6.2 covers the most-prevalent offensive primitives. Section 8 names the five structural ceilings that remain by-construction unclosable; *almost* is the load-bearing word.

Friday, 14:35. The watcher on the domain controller has written three named alerts into the Defender XDR queue. The red-team contractor's Rubeus.exe asreproast fired Suspected AS-REP Roasting attack (T1558.004). The junior auditor's bloodhound-python -c All fired Security principal reconnaissance (LDAP). The Mimikatz DCSync run from the SQL host's service account fired Suspected DCSync attack -- replication of directory services, External ID 2006, T1003.006. Three alerts. Three MITRE technique IDs. Three rows in a Tier 1 analyst's queue.

The watcher's job is done. Whether the analyst opens the right one first, whether the Tier 2 escalation happens inside the one-hour SLA, whether the response action gets approved before the attacker has moved on -- none of that is MDI's problem to solve. It is yours.

Who Decided This Token Is Good? A Field Guide to Conditional Access and Entra ID Protection

noreply@paragmali.com (Parag Mali) — Tue, 26 May 2026 00:00:00 GMT

**Conditional Access is Microsoft's Zero Trust policy engine, not a feature.** Every interactive sign-in to a licensed Microsoft 365 tenant flows through three planes: a signal plane (Entra ID Protection's machine-learning risk scoring), a policy plane (Conditional Access's JSON rule evaluator), and a session plane (Continuous Access Evaluation's event-driven revocation channel). This article assembles the wire format of all three -- the `riskDetection` resource on Microsoft Graph, the `conditionalAccessPolicy` schema, the `cp1` client capability that opts a client into 28-hour tokens, and the `401 + insufficient_claims` claims challenge -- into one end-to-end picture, then names the five things this architecture fundamentally cannot do.

1. Who decided this token is good?

It is 09:02 on a Tuesday in Lisbon. Alice opens Outlook on a managed laptop in a hotel and the reading pane populates with mail in under a second. She did not type a password. She did not approve a push. She did not touch a hardware key.

Who decided that was fine?

The question is harder than it looks. Alice's refresh token lives in a token cache from yesterday's sign-in at the office. Outlook's client silently acquires a fresh access token from Entra. That request may match a Conditional Access policy. The policy may consult an Identity Protection risk score. The result is either an access token or a refusal. Exchange Online receives the token, validates it, and may yet revoke it mid-session because something changed in the last sixty seconds. Bytes return to Alice.

Conditional Access: Microsoft Entra ID's policy engine for evaluating sign-in attempts. A Conditional Access policy is a JSON object that matches a set of users, cloud apps, and conditions (network location, device state, sign-in risk, user risk, client app, platform) against a set of grants (block, require MFA, require compliant device, require Authentication Strength, and so on). Policies are evaluated after first-factor authentication; a block grant in any matching policy overrides all allow grants [@ms-ca-overview].

Microsoft Entra ID Protection: The machine-learning signal plane that scores sign-ins and users for risk. ID Protection emits `riskDetection` events tagged with `riskEventType` (anonymized IP, leaked credentials, password spray, atypical travel, and roughly two dozen others), `riskLevel` (low, medium, high), `riskState`, and `detectionTimingType` (realtime, nearRealtime, or offline). Available only on Microsoft Entra ID P2 [@ms-id-protection-overview].

Continuous Access Evaluation (CAE): The session plane. CAE is an event-driven channel between Microsoft Entra and CAE-aware resource APIs (Exchange Online, SharePoint Online, Teams, Microsoft Graph). When a critical event fires -- account disabled, password reset, high user risk, network location change -- the resource API returns `HTTP 401` with a `WWW-Authenticate: Bearer error="insufficient_claims"` challenge. The client replays the embedded claims to Entra and acquires a fresh token. In exchange for this channel, CAE tokens live up to 28 hours [@ms-cae-concept].
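The "block overrides allow" rule in the Conditional Access definition above is easier to reason about as code. What follows is a deliberately simplified Python model, not Microsoft's evaluator: the `Policy` class, the `matches` and `evaluate` helpers, and the string outcomes are all hypothetical, conditions are reduced to user and app, and every required control is treated as AND (real policies also support OR within a policy).

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    users: set                      # user names in scope, or {"All"}
    apps: set                       # app names in scope, or {"All"}
    block: bool = False             # a block grant
    controls: set = field(default_factory=set)   # e.g. {"mfa", "compliantDevice"}


def matches(policy: Policy, user: str, app: str) -> bool:
    """A policy applies when both its user scope and its app scope match."""
    return (("All" in policy.users or user in policy.users)
            and ("All" in policy.apps or app in policy.apps))


def evaluate(policies, user: str, app: str, satisfied: set) -> str:
    """Block wins over everything; otherwise every matching policy's
    controls must already be satisfied for the token to be granted."""
    hits = [p for p in policies if matches(p, user, app)]
    if any(p.block for p in hits):
        return "block"
    required = set().union(*(p.controls for p in hits))
    missing = required - satisfied
    return "grant" if not missing else "challenge:" + ",".join(sorted(missing))


policies = [
    Policy("mfa-for-everyone", {"All"}, {"All"}, controls={"mfa"}),
    Policy("block-legacy-app", {"All"}, {"LegacyApp"}, block=True),
]

print(evaluate(policies, "alice", "Outlook", satisfied={"mfa"}))
print(evaluate(policies, "alice", "Outlook", satisfied=set()))
print(evaluate(policies, "alice", "LegacyApp", satisfied={"mfa"}))
```

The third call is the case worth remembering: satisfying MFA does not help when any matching policy carries a block grant.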

Every component in this chain is individually documented on Microsoft Learn. The Conditional Access policy schema is on the Graph reference [@ms-graph-capolicy]. The riskDetection resource is on the Graph reference too [@ms-graph-riskdetection]. The cp1 client capability is in the claims-challenge document [@ms-claims-challenge]. The "up to 15 minutes" propagation ceiling for CAE non-IP events is in the CAE concept document [@ms-cae-concept].

But the chain is not assembled anywhere. That is what this article does.

This article is for the architect or the detection engineer who already knows what a JWT is, what a service principal is, and what an MDM does. If you have ever stared at a Sign-in log entry that reads "Conditional Access: Success" and wondered what exactly the policy engine concluded, this is for you.

Three moments of insight are coming. First, why MFA without context fails not because MFA is weak but because the unit is wrong (Section 3). Second, why the architectural breakthrough was a separation and not a new algorithm (Section 5). Third, why the system has limits that no engineering will fix (Section 8).

How did the industry end up with a token-issuance and claims-challenge model? The answer begins in 1975, with a paper that did not mention identity once.

2. From perimeter to identity boundary

In September 1975, Jerome Saltzer and Michael Schroeder published an eight-principle paper on operating-system protection that nobody at MIT thought of as a paper about cloud identity [@saltzer-schroeder-1975]. Half a century later, two of those eight -- complete mediation and least privilege -- are the implicit theorems every Conditional Access policy evaluates against. Where did the industry go in between?

Saltzer and Schroeder: the unstated theorems

Complete mediation says "every access to every object must be checked for authority." Least privilege says "every program and every user of the system should operate using the least set of privileges necessary to complete the job." These are stated as design principles, not theorems. But they function as theorems for anyone building an access-control system: violate either of them and you have, by construction, a vulnerability. Conditional Access does not derive the principles. It re-states them as a JSON schema and a runtime evaluator.

Jericho Forum: the perimeter dissolves

In 2003, David Lacey of the Royal Mail and a loose affiliation of corporate CISOs began arguing, against the prevailing castle-and-moat consensus, that the corporate network perimeter could no longer be relied on as the trust boundary. The Jericho Forum formally launched under the Open Group umbrella in January 2004 [@wikipedia-jericho-forum]. They coined the term "de-perimeterisation" to describe what their member firms were already living: data and identity travelling outside the firewall faster than the firewall could be moved.

Microsoft's own retrospective puts the quote precisely: the Jericho Forum "promoted a new concept of security called de-perimeterisation that focused on how to protect enterprise data flowing in and out of your enterprise network boundary instead of striving to convince users and the business to keep it on the corporate network" [@simos-2020-jericho]. The first sentence of Microsoft Learn's CA overview today is a direct descendant: "modern security extends beyond an organization's network perimeter" [@ms-ca-overview].

Kindervag: the name

John Kindervag, then a principal analyst at Forrester Research, gave the model its marketable name in a September 2010 report titled "No More Chewy Centers: Introducing the Zero Trust Model of Information Security" [@kindervag-2010-zero-trust]. Three tenets: all resources are accessed securely regardless of location; access control is on strict need-to-know and strictly enforced; all traffic is inspected and logged.

The label stuck. Microsoft Learn now calls CA "Microsoft's Zero Trust policy engine" in its first sentence [@ms-ca-overview]. The lineage from Kindervag's 14-page Forrester report to that sentence is direct.

The original Kindervag PDF is gated behind Forrester's paywall. The widely cited copy on ndm.net redirects to an unrelated managed-IT-services company; the only reliably accessible mirror is the Wayback Machine snapshot. Treat the lineage as well documented and the URL as a curiosity of how academic ideas survive the open web.

BeyondCorp: the alternative

In December 2014, Rory Ward and Betsy Beyer published "BeyondCorp: A New Approach to Enterprise Security" in USENIX ;login: [@ward-beyer-2014-beyondcorp]. The paper described Google's internal Zero Trust deployment: every request authenticated and authorized by an access proxy, no implicit network trust, device inventory and user identity as the inputs to access decisions. A follow-up in 2016 documented the production rollout [@osborn-2016-beyondcorp].

This is the architectural fork Section 7 returns to. BeyondCorp puts the policy engine in the data path, as a reverse proxy that sees every HTTP request. CA puts the policy engine at token issuance and re-evaluates via claims challenges. Both work. They are not interchangeable.
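The claims-challenge half of that design can be sketched from the client's side. The Python below is a minimal illustration, assuming the challenge arrives as a `WWW-Authenticate` header whose `claims` parameter is base64-encoded JSON; the function name and the sample payload are hypothetical, and in production an authentication library such as MSAL normally does this parsing and forwards the claims on the next token request.

```python
import base64
import json
import re


def parse_claims_challenge(www_authenticate: str):
    """Return the decoded claims JSON when the header carries an
    insufficient_claims challenge, otherwise None."""
    # Collect key="value" pairs from the header value.
    params = dict(re.findall(r'(\w+)="([^"]*)"', www_authenticate))
    if params.get("error") != "insufficient_claims":
        return None
    raw = params.get("claims")
    if raw is None:
        return None
    padded = raw + "=" * (-len(raw) % 4)      # tolerate stripped padding
    return json.loads(base64.b64decode(padded))


# Hypothetical challenge built for illustration only.
sample_claims = {"access_token": {"nbf": {"essential": True, "value": "1604106651"}}}
encoded = base64.b64encode(json.dumps(sample_claims).encode()).decode()
header = f'Bearer realm="", error="insufficient_claims", claims="{encoded}"'

print(parse_claims_challenge(header))
print(parse_claims_challenge('Bearer error="invalid_token"'))
```

The decoded object is what the client replays to the token endpoint; a BeyondCorp-style proxy has no equivalent step because it sits in the request path and can simply refuse the next request.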

NIST SP 800-207: the vocabulary

In August 2020, NIST published Special Publication 800-207, Zero Trust Architecture [@nist-sp-800-207-2020]. It codified the U.S. federal reference architecture: a Policy Engine that decides, a Policy Administrator that effects the decision, and a Policy Enforcement Point that intercepts the access.

That trio is the vocabulary the Microsoft Learn CA documentation now uses. In the SP 800-207 mapping, Conditional Access is the Policy Engine and Policy Administrator; Exchange Online, SharePoint Online, Teams, and Microsoft Graph are the Policy Enforcement Points; Entra ID Protection is the trust algorithm that feeds the Policy Engine.

If you ever have to map Conditional Access to SP 800-207 for a compliance review, the cleanest correspondences are: PE = the CA evaluator inside Entra; PA = Entra's token issuer (because the decision is effected by issuing or refusing a token); PEP = the resource API (Exchange, SharePoint, Graph) that validates the token, plus, for CAE-aware resources, the same API enforcing claims-challenge revocation mid-session. ID Protection is the "trust algorithm" input to the PE.

The doctrine was settled by 2020. But Microsoft had already been trying to build a perimeter on identity for six years, starting in 2014 with a much smaller idea.

3. Per-user MFA and the limits of binary controls

In 2014, Microsoft's only cloud-era access control was a per-user toggle that said MFA: yes or MFA: no. The toggle worked. It was a real improvement over passwords alone. It also produced the most exploited security failure of the next decade: MFA fatigue [@weinert-2023-managed-policies].

How does a control improve security and create a new attack class at the same time?

The per-user MFA state machine

Per-user MFA lives on the user object as a tri-state: Disabled, Enabled, or Enforced. Microsoft Learn now says the quiet part out loud: "The best way to protect users with Microsoft Entra MFA is to create a Conditional Access policy" and "Don't enable or enforce per-user Microsoft Entra multifactor authentication if you use Conditional Access policies" [@ms-howto-mfa-userstates]. That guidance carries a generation of operational pain inside it. Mixing the two surfaces, in practice, produces unpredictable prompts: a CA policy says "no MFA required for this location," the per-user state says "always MFA," and the user gets prompted twice.

Note: Microsoft's explicit guidance is to pick one surface. If you have Entra ID P1 or higher, use Conditional Access. The per-user state should remain Disabled for those accounts. Mixed configurations produce both false-positive prompts and, occasionally, false-negative skips [@ms-howto-mfa-userstates].

Trusted IP rules: one-dimensional context

Office 365 added a second knob in the same era: "trusted IPs." Sign-ins from a configured public IP range would skip the MFA challenge [@ms-ca-network]. The idea was that "on the corporate network" meant "more trustworthy." This was reasonable in 2014. By 2017, it was already eroded by full-tunnel VPNs (every employee egresses through the corporate /16 from home), split-tunnel VPNs (some traffic does, some does not), and the realisation that "corporate network" had stopped being a useful synonym for "trusted." Trusted IP is one-dimensional context, and one dimension was not enough.

Security Defaults: the Free-SKU descendant

Since 22 October 2019, every new Entra ID tenant has Security Defaults turned on by default at creation [@ms-security-defaults]. Security Defaults is a tenant-wide on/off switch that requires MFA for all admin roles, MFA for users when they show risk, blocks legacy authentication, and forces MFA registration. Microsoft's number on the impact is striking: "more than 99.9% of those common identity-related attacks are stopped by using multifactor authentication and blocking legacy authentication" [@ms-security-defaults].

For Entra ID Free tenants in 2026, Security Defaults is still the only available baseline. There is no per-app policy, no per-risk gating, no Conditional Access. This is the licensing reality Section 10 returns to.

Active Directory Federation Services -- AD FS -- is the on-prem federation product that ran the access-control story before any of this. It is still operational in many tenants. It is no longer Microsoft's strategic identity provider; the Microsoft Learn AD FS overview now opens with the explicit guidance "Instead of upgrading to the latest version of AD FS, Microsoft highly recommends migrating to Microsoft Entra ID" [@ms-ad-fs-overview]. AD FS claim rules functioned as a kind of policy engine, but they evaluated only at federation time and they had no concept of risk.

The four failure modes of the binary toggle

The first-generation controls -- per-user MFA, trusted IPs, Security Defaults -- share four documented limits:

No expression of context. The toggle is either on or off. It cannot say "MFA from a new country but not from the office."
Trusted IP is thin context. A public IP range is one bit of information; modern attacks include matching network egress.
No per-app policy. The toggle applies to all apps the user accesses. You cannot say "MFA for the admin portal, not for Outlook."
No exclusion semantics for break-glass accounts. Emergency-access accounts need to be reachable when everything else has failed. The binary toggle either includes them or excludes them; it does not let you say "exclude these accounts but log every sign-in as a high-priority alert."
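For contrast, here is roughly what those four statements look like once context has a schema to live in. The Python dictionary below is a sketch of a Conditional Access policy body; the field names follow the Microsoft Graph `conditionalAccessPolicy` resource as best understood here, the break-glass object ID is a placeholder, and the whole body should be checked against the current Graph reference before it is posted to any tenant.

```python
import json

# Illustrative policy body: MFA for the admin portals, only from outside
# trusted locations, with an emergency-access account excluded.
policy = {
    "displayName": "Require MFA for admin portals outside trusted locations",
    "state": "enabledForReportingButNotEnforced",    # report-only while testing
    "conditions": {
        "clientAppTypes": ["all"],
        "users": {
            "includeUsers": ["All"],
            # Placeholder: replace with the break-glass account's object ID.
            "excludeUsers": ["<break-glass-account-object-id>"],
        },
        "applications": {
            # Per-app scope, which the binary toggle could not express.
            "includeApplications": ["MicrosoftAdminPortals"],
        },
        "locations": {
            # Context: every location except the tenant's trusted ones.
            "includeLocations": ["All"],
            "excludeLocations": ["AllTrusted"],
        },
    },
    "grantControls": {"operator": "OR", "builtInControls": ["mfa"]},
}

print(json.dumps(policy, indent=2))
```

Excluding the break-glass account here covers only half of the fourth limit; alerting on every sign-in by that account still has to be built separately on the sign-in logs.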

MFA fatigue: when a control becomes a credential

The canonical failure of the binary toggle is push-bombing. The attacker has the password. The system requires MFA. The user gets four "approve sign-in?" notifications during a morning meeting. One gets a thumbs-up by reflex. The system did exactly what it was configured to do.

The attack works because the control has no concept of whether this is a normal sign-in. The same flow runs whether the request originates from the user's office WiFi or an anonymizing proxy in another country. The MFA challenge carries no risk-weighted information; the user has no signal that this prompt is different from yesterday's prompt. Fatigue is the consequence. Microsoft's own Entra blog catalogued the attack pattern and the operational mitigations in the wake of the 2022 incident cluster [@ms-techcom-mfa-fatigue].

Focusing on password rules, rather than things that can really help -- like multi-factor authentication (MFA), or great threat detection -- is just a distraction. -- Alex Weinert, Microsoft Identity, July 2019 [@weinert-2019-password]

Weinert's 2019 piece is now infamous in the identity community for its title alone -- "Your Pa$$word doesn't matter." The argument was that a password's composition rules carry no information that helps the system tell a real user from an attacker; what does carry information is context. The system needed a place to put that context.

If MFA yes/no cannot express context, the next step is obvious: make context the input. The history of CA from 2015 forward is the history of giving context a home.

4. Generation by generation

The next eight years produced six generations of access control, each one closing a specific failure of the previous one. They look like product launches in a marketing chronology. They are something more interesting: a sequence of negative results, each followed by a positive engineering response.

    timeline
        title Conditional Access timeline
        2014 : Gen 1 per-user MFA and trusted IPs
        2015 : CA enters public preview
        2016 : Gen 2 Conditional Access general availability
        2016 : ID Protection enters preview
        2018 : Gen 3 risk-based CA conditions broadly available
        2020 : CAE enters preview
        2022 : Gen 4 Continuous Access Evaluation general availability
        2023 : Gen 5 CA for workload identities
        2023 : Gen 6 Microsoft-managed policies and Authentication Strengths
        2026 : CA for AI agent identities

The 2026 milestone -- Conditional Access for AI agent identities -- is itself still emerging; Microsoft's current framing in the Conditional Access Optimization Agent announcement names it explicitly as a frontier rather than a finished generation [@ms-techcom-ca-optimization-agent]. Section 9.1 returns to the open problems.

Gen 1 (2014 to 2016): per-user MFA

Documented in Section 3. The control has no concept of context. The failure motivates Gen 2.

Gen 2 (September 2016 GA): Conditional Access with static rules

The September 27, 2016 CloudBlogs post announcing CA general availability framed it as "Protect your data at the front door" -- the "front door" framing that Microsoft documentation still uses [@ms-techcom-ca-frontdoor-2016]. The policy schema (users + cloud apps + conditions to grants) was introduced in the 2015 preview [@ms-techcom-ca-preview-2015] and survived essentially unchanged into 2016 GA.

Gen 2 closed Gen 1's failure mode: context now had a home. A policy could match on network location, on the app being accessed, on the user's group membership, on the device platform. It could express "block country X" or "require MFA when not on the corporate network."
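As a minimal sketch of what "context has a home" looks like, the object below writes the "require MFA when not on the corporate network" rule in roughly the shape of a Graph conditionalAccessPolicy and checks it against two sign-ins. The named-location ID, the `namedLocationId` field on the sign-in, and the `policyApplies` matcher are hypothetical simplifications; the real evaluator considers far more than location.

```javascript
// Sketch of a Gen 2 static policy: "require MFA when not on the corporate network".
// Field names mirror the Graph conditionalAccessPolicy shape; IDs are placeholders.
const requireMfaOffNetwork = {
  displayName: 'Require MFA outside corporate network',
  state: 'enabled',
  conditions: {
    users: { includeUsers: ['All'] },
    applications: { includeApplications: ['All'] },
    locations: {
      includeLocations: ['All'],
      excludeLocations: ['corp-hq-named-location-id'], // hypothetical named-location ID
    },
  },
  grantControls: { operator: 'OR', builtInControls: ['mfa'] },
};

// Hypothetical matcher: only the location condition is modelled here.
function policyApplies(policy, signIn) {
  const { includeLocations, excludeLocations } = policy.conditions.locations;
  const included =
    includeLocations.includes('All') || includeLocations.includes(signIn.namedLocationId);
  const excluded = excludeLocations.includes(signIn.namedLocationId);
  return policy.state === 'enabled' && included && !excluded;
}

const fromOffice = { user: 'alice@contoso.com', namedLocationId: 'corp-hq-named-location-id' };
const fromCafe = { user: 'alice@contoso.com', namedLocationId: null };

console.log(policyApplies(requireMfaOffNetwork, fromOffice)); // false: no MFA demanded
console.log(policyApplies(requireMfaOffNetwork, fromCafe));   // true: MFA demanded
```

The per-user toggle of Gen 1 could not express the `excludeLocations` line at all; in this shape it is one array entry.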

The remaining documented limit: no risk feed. The engine could express what to check for but not whether this specific sign-in looks suspicious. A policy could block credential-stuffing attempts only if you happened to know in advance which IPs to deny. Motivated Gen 3.

Gen 3 (2017 to 2018): risk-based fusion

Identity Protection had been generating risk signals since its March 2016 preview. Through 2017 and 2018, two new condition keys appeared in the CA policy schema: signInRiskLevels and userRiskLevels. Both take values from the set low, medium, high. The risk feed plugged into the policy plane through exactly two keys. The legacy ID-Protection-side risk policies (which were a parallel policy surface inside ID Protection itself) are now retiring on 1 October 2026; the canonical surface is CA [@ms-id-protection-policies].

The remaining limit: pre-issuance only. The CA evaluator runs at sign-in time. Once a token is issued, the policy plane has no way to undo the decision until the token expires. Microsoft's own retrospective is honest about what they tried first: "Microsoft experimented with the 'blunt object' approach of reduced token lifetimes but found they degrade user experiences and reliability without eliminating risks" [@ms-cae-concept]. A one-hour token cuts the worst-case revocation latency to an hour, but it also means a user with intermittent connectivity gets prompted every hour, and a mobile app with retry storms can hammer the IdP. The trade-off was unacceptable. Motivated Gen 4.

Gen 4 (January 2022 GA): Continuous Access Evaluation

CAE inverted the trade-off. Instead of shortening the token, lengthen it -- up to 28 hours [@ms-cae-concept]. Then add a side channel: when a critical event fires (account disabled, password reset, high user risk, IP location change), the resource API issues an HTTP 401 with a WWW-Authenticate claims challenge, and the client replays to Entra for a fresh token. Latency on the side channel is bounded: "up to 15 minutes" for non-IP events, "instant" for IP locations [@ms-cae-concept]. CAE was tied to an emerging open standard from day one, the OpenID Continuous Access Evaluation Profile [@ms-cae-concept]. The general-availability announcement landed on 10 January 2022 [@ms-techcom-cae-ga-2022].
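The trade-off can be put in numbers. The sketch below compares the worst-case time a revoked user keeps access under the two designs, using only the figures quoted in the text (a one-hour token, a 28-hour CAE token, and the 15-minute ceiling for non-IP events). The function name and its parameters are illustrative, and the model assumes a CAE-aware client and resource; without both, the 28-hour token would not be issued in the first place.

```javascript
// Worst-case time a revoked user keeps access, in minutes, under two designs.
// Figures come from the text: 1-hour short tokens, 28-hour CAE tokens,
// and a 15-minute ceiling on non-IP critical-event propagation.
const MINUTES_PER_HOUR = 60;

function worstCaseExposureMinutes({ tokenLifetimeHours, eventCeilingMinutes }) {
  const expiryMinutes = tokenLifetimeHours * MINUTES_PER_HOUR;
  // Without an event channel, access ends only when the token expires.
  if (eventCeilingMinutes === null) return expiryMinutes;
  // With an event channel, whichever comes first ends the session.
  return Math.min(expiryMinutes, eventCeilingMinutes);
}

const shortToken = worstCaseExposureMinutes({ tokenLifetimeHours: 1, eventCeilingMinutes: null });
const caeToken = worstCaseExposureMinutes({ tokenLifetimeHours: 28, eventCeilingMinutes: 15 });

console.log({ shortToken, caeToken }); // { shortToken: 60, caeToken: 15 }
```

The longer token has the shorter exposure because the bound is set by the event channel, not by the expiry clock.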

Remaining limit: applies to humans only. Service principals do not consume CAE-aware client libraries; they cannot perform a claims challenge. Motivated Gen 5.

Gen 5 (2023 GA): Conditional Access for workload identities

Same engine, constrained grant set. The Microsoft Learn page is blunt on the boundaries: "Workload Identities Premium licenses are required" and the constraint set is unusual -- "Policy can be applied to single tenant service principals that are registered in your tenant. Microsoft and third-party SaaS applications, including multitenant apps, are not covered by these policies. Managed identities aren't covered by policy" and "Under Grant, Block access is the only available option" [@ms-workload-identity-ca]. The public preview of CA filters for workload identities opened on 26 October 2022 [@vansurksum-2022-workload-ca]; the Microsoft Entra Workload Identities standalone product followed in late November 2022, and the Conditional Access feature for workload identities itself reached general availability later in 2023.

The single-tenant restriction is a structural choice. Multi-tenant SaaS apps appear in many tenants' service principal directories at once; policy scoping on them would require a cross-tenant resolution protocol the engine does not have. Managed identities are excluded because they belong to Azure subscriptions, not to user identity, and Microsoft has chosen not to extend the surface there. Group assignments do not work either: "Conditional Access policies assigned to a group that contains a service principal are not enforced for that service principal" [@ms-workload-identity-ca].

Remaining limit: under-configured in most tenants because the grant taxonomy is so narrow that admins do not see immediate value. Motivated Gen 6.

Gen 6 (November 2023 onwards): Microsoft-managed policies and Authentication Strengths

In November 2023, Alex Weinert announced Microsoft-managed Conditional Access policies: a set of baselines that Microsoft would auto-deploy into tenants in Report-only mode and then auto-enable after a waiting period [@weinert-2023-managed-policies]. The launch announcement specified a 90-day window [@helpnet-2023-microsoft-entra-policies]. The current Microsoft Learn documentation specifies "Microsoft enables these policies no less than 45 days after they're introduced in your tenant if they're left in the Report-only state" with a 28-day pre-enablement notification [@ms-managed-policies].

The window shrank deliberately. The 90-day window in the 2023 launch announcement was a calibration window; the 45-day window in current documentation is the post-calibration setting. Both numbers are correct in their respective time frames. The article uses the current number throughout.

Parallel to the managed policies, Microsoft shipped Authentication Strengths -- a named bundle of acceptable authentication methods that can be required as a grant. The three built-in strengths are MFA strength, Passwordless MFA strength, and Phishing-resistant MFA strength (FIDO2 security key, Windows Hello for Business, multifactor certificate-based authentication) [@ms-auth-strengths]. The phishing-resistant strength is the modern way to express "no adversary-in-the-middle phishing kit should be able to defeat this grant."

The pattern: extension, not replacement

From Gen 3 onward, each generation extends the prior schema rather than replacing it. The conditionalAccessPolicy JSON shape that shipped in 2016 still drives the engine in 2026 -- with new condition keys added, new grant types added, new session controls added. By the standards of cloud control surfaces, that is a long run without a rewrite.

The reason is the architectural decision the next section is about.

5. The two-plane separation

The breakthrough is not a model, not a token format, not a wire protocol. It is a separation: the signal plane that produces risk detections from the policy plane that consumes them.

Stated like that, it sounds banal. Read it the other direction -- a policy engine whose risk model can change without changing the policy semantics, and whose policy can change without retraining the model -- and it is the design that makes the system maintainable at trillions of daily signals across hundreds of thousands of tenants.

The two planes, precisely

The signal plane is Microsoft Entra ID Protection. It runs detection logic on every interactive sign-in (and, for offline detections, on historical sign-ins) and emits a riskDetection resource into a per-tenant log on Microsoft Graph at /identityProtection/riskDetections. Each detection carries five fields you care about: riskEventType (one of about two dozen named detection types like anonymizedIPAddress, leakedCredentials, unlikelyTravel), riskLevel (low, medium, high, plus the bookkeeping values hidden and none), riskState (atRisk, confirmedCompromised, dismissed, remediated), detectionTimingType (realtime, nearRealtime, offline), and additionalInfo (a JSON blob with user-agent, IP, alert URL, reason codes) [@ms-graph-riskdetection][@ms-id-protection-risks].
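For orientation, here is a small sketch of how a detection engineer might address that log and what one record looks like. It only builds the query URL against the v1.0 Graph endpoint; no request is sent and token acquisition is out of scope. The sample record is hypothetical: the field names are the five listed above, but every value (including the shape of `additionalInfo`) is made up for illustration.

```javascript
// Build a Graph query for risk detections at a given level. Only the URL is
// constructed; no request is sent, and authentication is out of scope here.
const GRAPH_BASE = 'https://graph.microsoft.com/v1.0';

function riskDetectionsUrl(riskLevel) {
  const filter = `riskLevel eq '${riskLevel}'`;
  return `${GRAPH_BASE}/identityProtection/riskDetections?$filter=${encodeURIComponent(filter)}`;
}

// Hypothetical record showing the five fields named above; values are made up.
const sampleDetection = {
  riskEventType: 'anonymizedIPAddress',
  riskLevel: 'high',
  riskState: 'atRisk',
  detectionTimingType: 'realtime',
  additionalInfo: '[{"Key":"userAgent","Value":"Mozilla/5.0"}]', // JSON carried as a string
};

console.log(riskDetectionsUrl('high'));
console.log(JSON.parse(sampleDetection.additionalInfo)[0].Key);
```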

The policy plane is Conditional Access. It is a JSON object at /identity/conditionalAccess/policies/{id} on the Graph API [@ms-graph-capolicy]. Each policy has displayName, state (enabled, disabled, enabledForReportingButNotEnforced), conditions, grantControls, and sessionControls. The conditions block contains the per-policy targeting: which users, which apps, which platforms, which network locations -- and two condition keys named signInRiskLevels and userRiskLevels.

**Sign-in risk** is a per-sign-in probability that the credential being used is being used by someone other than the legitimate owner *at this moment*. **User risk** is a per-user probability that the account itself has been compromised over its recent history. A user with leaked credentials in a breach corpus carries persistent user risk until the password is reset; a user signing in from an anonymizing proxy carries sign-in risk for that session. CA policies can match on either, both, or neither. Risk-based conditions require Entra ID P2 [@ms-id-protection-policies].

Those two condition keys -- signInRiskLevels and userRiskLevels -- are the entire API surface between the signal plane and the policy plane. Everything else about ID Protection is hidden behind them. The policy plane does not know whether high came from a transformer or a logistic regression or a hardcoded rule. The signal plane does not know which policies will read its output. The contract is two strings.

    flowchart LR
        subgraph SP[Signal plane Entra ID Protection]
            DET[Detection pipeline]
            RD[(riskDetection log)]
            RL[Risk level low medium high]
        end
        subgraph PP[Policy plane Conditional Access]
            EV[Policy evaluator]
            POL[(conditionalAccessPolicy JSON)]
            TOK[Token issuer]
        end
        subgraph SES[Session plane CAE]
            CH[Critical event channel]
            RP[Resource API]
        end
        DET --> RD
        DET --> RL
        RL -. signInRiskLevels userRiskLevels .-> EV
        POL --> EV
        EV --> TOK
        TOK -- access token --> RP
        DET -. user risk events .-> CH
        CH -. 401 insufficient claims .-> RP

Why the separation matters

Three concrete consequences fall out of the design:

The risk model is re-trainable without policy rewrites. Microsoft's ID Protection team can change the underlying detection algorithm tomorrow. Add a new riskEventType. Replace the classifier for unlikelyTravel. Re-tune the threshold that maps a score to low/medium/high. None of these require tenants to rewrite their CA policies, because policies match on the level, not the signal.

Tenants without the licence simply do not use the risk conditions. An Entra ID P1 tenant can deploy CA policies that match on users, apps, locations, devices, client apps, and platforms. P2 unlocks the risk conditions. The schema accommodates both: P1 policies just leave the risk arrays empty. There is no parallel policy surface for the non-risk-aware tenants; they use the same engine.

CAE is a third plane layered onto the same skeleton. Continuous Access Evaluation did not require redesign of the policy plane. The CAE channel is a new event delivery mechanism; the events it propagates are things the signal plane already knew about (high user risk, password reset, account disabled) plus new ones the policy plane introduced (network-location-policy changed). The architecture absorbed CAE because the design was already a separation of concerns.
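The first consequence is easy to see in miniature. In the sketch below the signal plane is swapped between two scorers while the policy object stays byte-for-byte the same. Both scorers and their thresholds are hypothetical; Microsoft has not published how a score maps to a level, which is rather the point.

```javascript
// Signal plane: two interchangeable scorers. Thresholds are hypothetical;
// Microsoft has not published how scores map to levels.
const scorerV1 = (score) => (score >= 0.9 ? 'high' : score >= 0.6 ? 'medium' : 'low');
const scorerV2 = (score) => (score >= 0.8 ? 'high' : score >= 0.5 ? 'medium' : 'low');

// Policy plane: matches on the level only, never on the score or the model.
const riskPolicy = {
  displayName: 'Require MFA for risky sign-ins',
  conditions: { signInRiskLevels: ['medium', 'high'] },
  grantControls: { builtInControls: ['mfa'] },
};

function requiredGrants(policy, riskLevel) {
  return policy.conditions.signInRiskLevels.includes(riskLevel)
    ? policy.grantControls.builtInControls
    : [];
}

// Same policy object, two different signal-plane versions.
console.log(requiredGrants(riskPolicy, scorerV1(0.85))); // [ 'mfa' ]  (medium under v1)
console.log(requiredGrants(riskPolicy, scorerV2(0.85))); // [ 'mfa' ]  (high under v2)
console.log(requiredGrants(riskPolicy, scorerV1(0.55))); // []         (low under v1)
console.log(requiredGrants(riskPolicy, scorerV2(0.55))); // [ 'mfa' ]  (medium under v2)
```

Re-tuning the scorer changes which sign-ins get challenged, but nobody had to edit `riskPolicy` to get there.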

Key idea: The signal plane and the policy plane are separable; the contract between them is two condition keys (signInRiskLevels and userRiskLevels). That is what makes the system maintainable across a decade of evolution.

The "pit of success" framing

Alex Weinert calls this the "pit of success." His November 2023 piece on Microsoft-managed policies put the metric on it: a decade ago Microsoft turned on a "radical" tenant-wide policy requiring MFA for every consumer Microsoft account, and "today, 100 percent of consumer Microsoft accounts older than 60 days have multifactor authentication" [@weinert-2023-managed-policies].

The 100 percent number is achievable because the policy plane and the signal plane can each evolve independently. Microsoft can ship a managed policy that says "require MFA for high-risk sign-ins" without committing to a fixed definition of "high risk." The definition lives on the signal plane and changes weekly. The policy lives on the policy plane and is stable for years.

With the separation as the spine, the next section walks the end-to-end pipeline in one continuous trace, from signal to grant to token to session, on a real sign-in -- the trace no public Microsoft document assembles in one place.

6. The end-to-end pipeline

Take Alice's Tuesday morning from Section 1 and walk it forward. This section has six subsections. By the end of them, the question "who decided?" has six independently sourced answers and one combined picture.

6.1 What the signal plane sees

Identity Protection's detection taxonomy splits into five rough groups, based on what kind of information triggered the detection. The canonical taxonomy is the Microsoft Learn page on risk types [@ms-id-protection-risks]; the wire-format enum on the Graph schema is at [@ms-graph-riskdetection].

Network signals. anonymizedIPAddress, maliciousIPAddress, nationStateIP, riskyIPAddress. The signal is the source IP and reputation databases that ID Protection ingests.
Behavioural signals. unlikelyTravel, mcasImpossibleTravel, newCountry, unfamiliarFeatures, anomalousUserActivity. The signal is a deviation from the tenant's or the user's historical baseline.
Credential signals. leakedCredentials, passwordSpray. The signal is a match against a corpus of breached credentials or a velocity-based pattern across tenants.
Token and session signals. anomalousToken, tokenIssuerAnomaly, attemptedPrtAccess, attackerinTheMiddle, authenticatorPhishing. The signal is on the token itself or on the way the authenticator flow ran.
Inbox behaviour. suspiciousInboxForwarding, mcasSuspiciousInboxManipulationRules. The signal is on what happened after the sign-in -- a post-compromise indicator that retroactively flags the sign-in that enabled it.
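For triage dashboards it can help to carry that grouping as data. The sketch below copies the five groups and their members from the list above into a lookup table; the `groupOf` and `tallyByGroup` helpers are illustrative, not part of any Microsoft SDK, and the `unclassified` fallback is an assumption to cope with detection types added later.

```javascript
// Group names and members are copied from the list above; the lookup itself
// is an illustrative helper, not a Microsoft API.
const DETECTION_GROUPS = {
  network: ['anonymizedIPAddress', 'maliciousIPAddress', 'nationStateIP', 'riskyIPAddress'],
  behavioural: ['unlikelyTravel', 'mcasImpossibleTravel', 'newCountry', 'unfamiliarFeatures', 'anomalousUserActivity'],
  credential: ['leakedCredentials', 'passwordSpray'],
  tokenAndSession: ['anomalousToken', 'tokenIssuerAnomaly', 'attemptedPrtAccess', 'attackerinTheMiddle', 'authenticatorPhishing'],
  inbox: ['suspiciousInboxForwarding', 'mcasSuspiciousInboxManipulationRules'],
};

function groupOf(riskEventType) {
  for (const [group, types] of Object.entries(DETECTION_GROUPS)) {
    if (types.includes(riskEventType)) return group;
  }
  return 'unclassified'; // new detection types appear over time
}

// Tally a batch of detections by group.
function tallyByGroup(detections) {
  const counts = {};
  for (const d of detections) {
    const g = groupOf(d.riskEventType);
    counts[g] = (counts[g] || 0) + 1;
  }
  return counts;
}

console.log(tallyByGroup([
  { riskEventType: 'leakedCredentials' },
  { riskEventType: 'passwordSpray' },
  { riskEventType: 'anomalousToken' },
]));
```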

Each detection is also tagged with a timing: real-time, near-real-time, or offline. Microsoft Learn is precise about the latencies: "Detections triggered in real-time take 5-10 minutes to surface details in the reports. Offline detections take up to 48 hours" [@ms-risk-detection-types].

The detection is mapped to a risk level, not a probability. Microsoft Learn calls the level "calculated by our machine learning algorithms" and explicitly notes the meaning: low/medium/high "represent how confident Microsoft is that one or more of the user's credentials are known by an unauthorized entity" [@ms-risk-detection-types]. "Confidence" here is meant in the everyday sense, not the strict statistical sense of a confidence interval. Microsoft has not published a calibration study that would let you map a "high" risk level to a frequentist probability of compromise.

The figure you sometimes see in Microsoft marketing materials -- "more than 100 trillion signals processed per day" [@ms-managed-policies], or, in older sources, "78 trillion" [@ms-id-protection-overview] -- is the aggregate signal volume across all tenants and product surfaces, not per-sign-in features per user. The article keeps the two carefully separate.

Microsoft has not publicly disclosed the production model architecture, the feature vector size, or per-detection precision and recall. The 2021 Microsoft Security Blog interview with Maria Puertas Calvo describes the existence of the ML team and the operational scale ("hundreds of terabytes every day") but stops well short of architecture details [@ms-puertas-calvo-interview]. The model class is publicly unspecified; the taxonomy and the operating output are both public.

6.2 How risk surfaces

Two parallel logs matter for risk. The Sign-in log is the universe: every interactive and non-interactive sign-in produces an entry. The riskDetections log is the sparse overlay: a riskDetection is emitted only when a detection fires for the sign-in. Most sign-ins produce a Sign-in log entry with no corresponding riskDetection. Only flagged sign-ins do [@ms-graph-riskdetection].

This is a common source of confusion. It is tempting to assume "ID Protection scored every sign-in," and in a sense it did -- the detectors ran -- but the durable artefact exists only when at least one detector fired. To compute a per-sign-in distribution of risk you need to join the Sign-in log with the riskDetections log and treat the unjoined rows as "no risk flagged at the moment of issuance."
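The join described above is an ordinary left join. A minimal sketch over in-memory rows follows; the join key (`signIn.id` equal to `riskDetection.requestId`) is an assumption you should verify against your own exports, and the sample rows are invented.

```javascript
// Left-join sign-ins to risk detections. The join key (signIn.id ===
// riskDetection.requestId) is an assumption to verify against your own logs.
function joinRisk(signIns, riskDetections) {
  const byRequest = new Map();
  for (const d of riskDetections) {
    const list = byRequest.get(d.requestId) || [];
    list.push(d);
    byRequest.set(d.requestId, list);
  }
  return signIns.map((s) => {
    const hits = byRequest.get(s.id) || [];
    return {
      signInId: s.id,
      // Unjoined rows mean "no risk flagged at the moment of issuance".
      riskEventTypes: hits.map((d) => d.riskEventType),
      flagged: hits.length > 0,
    };
  });
}

// Invented sample rows: three sign-ins, one of which fired a detection.
const sampleSignIns = [{ id: 'req-1' }, { id: 'req-2' }, { id: 'req-3' }];
const sampleDetections = [{ requestId: 'req-2', riskEventType: 'unfamiliarFeatures' }];

const joined = joinRisk(sampleSignIns, sampleDetections);
console.log(joined.filter((r) => r.flagged).length, 'of', joined.length, 'sign-ins flagged');
```

Keeping the unflagged rows in the output is what makes the denominator honest when you compute a flag rate.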

There is one more wrinkle. The detection taxonomy on the Microsoft Learn concept page and the riskEventType enum on the Graph schema are not perfectly aligned. The concept page lists mcasImpossibleTravel and authenticatorPhishing as named detection types; the Graph enum lists impossibleTravel (without the mcas prefix). The two surfaces sometimes use different value names for the same logical detection -- a UI display string versus a Graph enum value. Detection engineers writing KQL against the Sign-in logs should account for both.

6.3 How CA consumes risk

Conditional Access evaluation runs in a fixed order: assignments are checked first (does this sign-in match this policy at all?), then conditions (do all the condition predicates hold?), then grants (which controls are demanded?), then session controls (which token lifetime, sign-in frequency, persistent browser).

The key semantic, repeated across the Microsoft Learn documentation: a block grant in any policy matching the sign-in overrides any allow grant in any other policy. The policy plane is not just additive; it has an explicit precedence rule.

    flowchart TD
        A[Sign-in request] --> B[First-factor auth]
        B --> C[Enumerate matching policies]
        C --> D{Any policy matches?}
        D -- No --> E[Default allow with token]
        D -- Yes --> F[Evaluate conditions per policy]
        F --> G{Block grant in any match?}
        G -- Yes --> H[Deny access return error]
        G -- No --> I[Aggregate required grants]
        I --> J{All grants satisfied?}
        J -- No --> K[Issue challenge MFA or device]
        J -- Yes --> L[Apply session controls]
        L --> M[Issue access token]

The pseudocode below is a compressed restatement of that flow. It is not Microsoft source code; it is the algorithmic shape an admin should keep in their head when reading a policy or debugging a sign-in.

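    // Illustrative stand-ins so the sketch runs end to end. None of these
    // helpers or values is a Microsoft API; they model one enabled policy.
    const allPolicies = [{
      displayName: 'Require MFA and compliant device for Exchange Online',
      state: 'enabled',
      conditions: {
        users: ['alice@contoso.com'],
        apps: ['Office365 Exchange Online'],
        signInRiskLevels: [], // empty: the policy does not condition on risk
      },
      grantControls: { builtInControls: ['mfa', 'compliantDevice'] },
      sessionControls: { signInFrequencyHours: 8 },
    }];
    const matchesAssignments = (c, s) => c.users.includes(s.user) && c.apps.includes(s.app);
    const matchesConditions = (c, s) =>
      c.signInRiskLevels.length === 0 || c.signInRiskLevels.includes(s.signInRisk);
    // Simplification: later policies overwrite earlier ones key by key.
    const mergeSessionControls = (controls) => Object.assign({}, ...controls);

    function evaluate(signin) {
      // Only enforced policies count; report-only policies are logged, not applied.
      const matching = allPolicies.filter(p =>
        p.state === 'enabled' &&
        matchesAssignments(p.conditions, signin) &&
        matchesConditions(p.conditions, signin)
      );

      // Block precedence: any block grant wins
      if (matching.some(p => p.grantControls.builtInControls.includes('block'))) {
        return { decision: 'DENY', reason: 'block grant matched' };
      }

      // Aggregate required grants across matching policies
      const requiredGrants = new Set();
      for (const p of matching) {
        for (const g of p.grantControls.builtInControls) requiredGrants.add(g);
        if (p.grantControls.authenticationStrength) {
          requiredGrants.add('authStrength:' + p.grantControls.authenticationStrength.id);
        }
      }

      const missing = [...requiredGrants].filter(g => !signin.satisfies(g));
      if (missing.length > 0) {
        return { decision: 'CHALLENGE', missing };
      }

      // Apply session controls (token lifetime, sign-in frequency, persistent browser)
      const session = mergeSessionControls(matching.map(p => p.sessionControls));
      return { decision: 'ALLOW', session };
    }

    const result = evaluate({
      user: 'alice@contoso.com',
      app: 'Office365 Exchange Online',
      location: { ip: '203.0.113.42', country: 'PT' },
      device: { compliant: true, joinType: 'Entra' },
      signInRisk: 'low',
      userRisk: 'none',
      // What this sign-in can prove: MFA, phishing-resistant strength, compliant device
      satisfies(grant) {
        const mfa = ['mfa', 'authStrength:phishingResistantMfa'];
        return mfa.includes(grant) || grant === 'compliantDevice';
      },
    });
    console.log(JSON.stringify(result, null, 2));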

Risk-based conditions require Entra ID P2 [@ms-id-protection-overview]. Without that licence, the signInRiskLevels and userRiskLevels arrays in a policy are ignored. The rest of the engine works the same.

6.4 The grants

Each policy declares a set of grants. Within a policy the grants combine according to the policy's operator: AND requires every listed control, OR requires any one of them. Across policies, every matching policy must be satisfied, and a block grant in any matching policy takes precedence over allow grants in any other policy. Here are the grants currently in the schema:

Grant	What it requires	Notes
`block`	Deny access.	Always wins against allow grants.
`mfa`	Any MFA method registered for the user.	The legacy generic-MFA grant; replaced in modern deployments by Authentication Strength.
`requireAuthenticationStrength`	A named bundle of acceptable methods.	The modern grant. Built-in strengths include phishing-resistant [@ms-auth-strengths].
`compliantDevice`	The device record has `isCompliant: true`.	Set by Intune or a third-party compliance partner.
`domainJoinedDevice`	Hybrid Azure AD joined device.	Requires Entra Connect on-prem trust.
`approvedApplication`	Use an approved client app.	A small allow-list of Microsoft mobile apps.
`compliantApplication`	An app under an Intune App Protection Policy.	Mobile app management.
`passwordChange`	User must change their password.	Used for password-leaked recovery.
`requireTermsOfUse`	User must accept a terms-of-use document.	Used for compliance and guest scenarios.

An **Authentication Strength** is a named, ordered bundle of acceptable authentication methods that a CA grant can demand. The three built-in strengths are *MFA strength* (any registered second factor), *Passwordless MFA strength* (no password used), and *Phishing-resistant MFA strength* (FIDO2 security key, Windows Hello for Business or a platform credential, or multifactor certificate-based authentication) [@ms-auth-strengths]. The phishing-resistant strength is the canonical modern grant for high-value access.

The Authentication Strength grant is where the phishing-resistance story lives in 2026. A policy that demands the phishing-resistant strength refuses to accept TOTP or SMS or push as the second factor. Only credentials with cryptographic binding to the device or hardware token will satisfy the grant. That class of credential, by construction, cannot be replayed by an adversary-in-the-middle phishing kit -- because the underlying WebAuthn ceremony is bound to the origin of the relying party.
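One way to picture the three built-in strengths is as nested allow-lists. The sketch below paraphrases the descriptions above; the method identifiers are hypothetical labels chosen for readability, not Graph enum values, and the exact membership of each built-in strength should be taken from the Microsoft Learn page rather than from this example.

```javascript
// Method lists paraphrase the three built-in strengths described above.
// The method identifiers are hypothetical labels, not Graph enum values.
const PHISHING_RESISTANT = ['fido2', 'windowsHelloForBusiness', 'x509CertificateMultiFactor'];
const PASSWORDLESS = [...PHISHING_RESISTANT, 'authenticatorPasswordless'];
const ANY_MFA = [...PASSWORDLESS, 'passwordPlusPush', 'passwordPlusTotp', 'passwordPlusSms'];

const STRENGTHS = {
  mfa: ANY_MFA,
  passwordlessMfa: PASSWORDLESS,
  phishingResistantMfa: PHISHING_RESISTANT,
};

function satisfiesStrength(strengthName, methodUsed) {
  const allowed = STRENGTHS[strengthName];
  if (!allowed) throw new Error('unknown strength: ' + strengthName);
  return allowed.includes(methodUsed);
}

console.log(satisfiesStrength('mfa', 'passwordPlusPush'));                  // true
console.log(satisfiesStrength('phishingResistantMfa', 'passwordPlusPush')); // false
console.log(satisfiesStrength('phishingResistantMfa', 'fido2'));            // true
```

The push approval that satisfies the generic strength is exactly the method the phishing-resistant strength refuses, which is how this grant closes the push-bombing gap.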

6.5 The Windows-side handoff

PRT issuance is an interactive sign-in. It goes through CA like any other.

A **Primary Refresh Token (PRT)** is a long-lived refresh token issued to a Windows session at user sign-in to Entra-joined or hybrid-Entra-joined devices. The PRT is bound to the device's TPM where one is available, and it grants the user single sign-on to all CA-targeted apps from that Windows session. Issuance is subject to CA evaluation; if a CA policy demands compliant device, the device must already be marked `isCompliant` before the PRT is issued.

The compliance state lands on the device object as isCompliant. Intune (or a third-party MDM through Intune's compliance-partner API) writes that field after evaluating the device against a compliance policy: disk encrypted, OS patched, antivirus running, jailbreak detection clean, and so on. CA reads it on subsequent policy evaluations. If a policy requires compliantDevice and the device object says isCompliant: false, the grant is not satisfied.
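The hand-off has two halves: the MDM side computes a single boolean, and CA reads only that boolean. A minimal sketch follows. The check names and the posture object are hypothetical; the real checks are whatever the Intune or partner compliance policy defines.

```javascript
// Hypothetical compliance policy and device posture; the real checks and
// field names are defined by Intune or the compliance partner.
const compliancePolicy = ['diskEncrypted', 'osPatched', 'antivirusRunning', 'notJailbroken'];

// MDM side: evaluate posture and produce the flag written to the device object.
function evaluateCompliance(posture) {
  const failed = compliancePolicy.filter((check) => posture[check] !== true);
  return { isCompliant: failed.length === 0, failed };
}

// CA side: the evaluator reads only the resulting flag, not the posture details.
function compliantDeviceGrantSatisfied(device) {
  return device.isCompliant === true;
}

const laptop = evaluateCompliance({
  diskEncrypted: true,
  osPatched: false,
  antivirusRunning: true,
  notJailbroken: true,
});

console.log(laptop); // { isCompliant: false, failed: [ 'osPatched' ] }
console.log(compliantDeviceGrantSatisfied(laptop)); // false
```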

The operational seam to on-prem Active Directory runs the other direction. Kerberos and NTLM against on-prem domain controllers never consult Entra. The Microsoft Learn CA overview is explicit: CA is a cloud control plane; on-prem authentication is outside its scope [@ms-ca-overview]. This is the limit Section 8 will name precisely.

6.6 CAE in session

The third plane. Wire format lives in two Microsoft Learn pages: the claims-challenge page [@ms-claims-challenge] and the app-resilience CAE page [@ms-app-resilience-cae].

A client opts in to CAE by advertising the cp1 capability via the xms_cc claim in token requests. In MSAL, that opt-in looks like WithClientCapabilities(new[] { "cp1" }) [@ms-app-resilience-cae]. The Microsoft Learn claims-challenge page says it cleanly: "The only currently known value is cp1" [@ms-claims-challenge].

When the policy plane sees a critical event after the token was issued, the resource API responds to the next call with HTTP 401 Unauthorized and a WWW-Authenticate header of the shape:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer authorization_uri="<entra-authorize-endpoint>", error="insufficient_claims", claims="<base64-encoded JSON>"

The claims value is a base64-encoded JSON object. The client decodes it and passes the resulting JSON as the claims parameter when acquiring a fresh token [@ms-claims-challenge][@ms-app-resilience-cae]. The IdP evaluates the embedded claims, runs CA again with the new context, and issues a new token (or refuses).
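Extracting the challenge from the header is a few lines of string handling. The sketch below is a hand-rolled parser for illustration, not the MSAL implementation; it uses Node's `Buffer` for base64 (a browser would use `atob`). The sample header is built in the code itself, and both the authorize URL and the claims payload inside it are illustrative values.

```javascript
// Parse a CAE claims challenge out of a WWW-Authenticate header value.
// Returns the decoded claims JSON string, or null if this 401 is not a
// claims challenge. Uses Node's Buffer; a browser would use atob().
function parseClaimsChallenge(headerValue) {
  if (!/error="insufficient_claims"/.test(headerValue)) return null;
  const match = /claims="([^"]+)"/.exec(headerValue);
  if (!match) return null;
  return Buffer.from(match[1], 'base64').toString('utf8');
}

// Build a sample header. The URL and the claims payload are illustrative.
const sampleClaims = JSON.stringify({
  access_token: { nbf: { essential: true, value: '1604106651' } },
});
const sampleHeader =
  'Bearer authorization_uri="https://login.example/authorize", ' +
  'error="insufficient_claims", ' +
  `claims="${Buffer.from(sampleClaims).toString('base64')}"`;

console.log(parseClaimsChallenge(sampleHeader) === sampleClaims); // true
console.log(parseClaimsChallenge('Bearer error="invalid_token"')); // null
```

Checking the `error` value before reading `claims` matters: an ordinary expired-token 401 should go down the normal refresh path, not the claims-challenge path.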

A **claims challenge** is the HTTP wire format CAE uses to revoke a session mid-flight. A CAE-aware resource API returns `HTTP 401` with `WWW-Authenticate: Bearer error="insufficient_claims", claims="<base64-encoded JSON>"`. The client decodes the base64 blob and replays the claims to Entra; Entra re-runs CA with the new context; the client receives a fresh token or a definitive refusal. The wire format is documented at [@ms-claims-challenge] and demonstrated at [@ms-app-resilience-cae].

Note: The CAE-aware capability is signalled by the client, not by the token. The client advertises cp1 via xms_cc; the token's CAE-awareness shows up as its lifetime (up to 28 hours) and the resource API's willingness to issue a claims challenge. Folk knowledge that says "look for a cae claim in the JWT" is incorrect.

The Microsoft Learn CAE document enumerates five critical events: account disabled or deleted, password change or reset, MFA enabled by an administrator, administrator token revocation, and high user risk detected by ID Protection [@ms-cae-concept]. A parallel pathway, Conditional Access policy evaluation, propagates network-location and policy changes to CAE-aware resource providers on the same channel. For IP-location changes the latency is "instant"; for everything else the ceiling is up to 15 minutes [@ms-cae-concept].

    sequenceDiagram
        participant C as Client app
        participant R as Resource API CAE aware
        participant E as Entra token issuer
        participant P as ID Protection
        Note over C: Client holds long-lived CAE token
        C->>R: GET messages with bearer token
        R->>R: Token still cryptographically valid
        P->>E: High user risk event for Alice
        E->>R: Push critical event Alice high risk
        C->>R: GET messages with bearer token again
        R->>C: 401 WWW-Authenticate insufficient_claims claims base64
        C->>E: Token request with claims blob and cp1 capability
        E->>E: Re-run CA with new context
        E-->>C: New token or definitive refusal
        C->>R: Retry with new token

    // Simplified MSAL.js-shaped pseudocode for CAE opt-in and challenge handling.
    // Assumes @azure/msal-browser and an account already signed in and set active.
    import { PublicClientApplication } from '@azure/msal-browser';

    // Left blank in the source; supply your own authority, API endpoint, and scope.
    const ENTRA_AUTHORITY = '';
    const EXCHANGE_ENDPOINT = '';
    const MAIL_READ_SCOPE = '';

    const msal = new PublicClientApplication({
      auth: {
        clientId: '', // left blank in the source
        authority: ENTRA_AUTHORITY,
        clientCapabilities: ['cp1'], // advertise CAE awareness (sent as xms_cc)
      },
    });

    async function callExchange() {
      await msal.initialize(); // recent msal-browser versions require this before token calls
      let token = await msal.acquireTokenSilent({ scopes: [MAIL_READ_SCOPE] });

      let res = await fetch(EXCHANGE_ENDPOINT, {
        headers: { Authorization: 'Bearer ' + token.accessToken },
      });

      if (res.status === 401) {
        const header = res.headers.get('WWW-Authenticate') || '';
        const m = /claims="([^"]+)"/.exec(header);
        if (m && header.includes('insufficient_claims')) {
          // Decode the challenge and replay the claims to acquire a fresh token.
          // If Entra needs user interaction this call rejects; a real app
          // would catch that and fall back to an interactive request.
          token = await msal.acquireTokenSilent({
            scopes: [MAIL_READ_SCOPE],
            claims: atob(m[1]),
          });
          res = await fetch(EXCHANGE_ENDPOINT, {
            headers: { Authorization: 'Bearer ' + token.accessToken },
          });
        }
      }

      console.log('HTTP', res.status);
    }

    callExchange();

Key idea: CAE inverts the conventional trade-off: lengthen the token, shorten the revocation. The token can live 28 hours because revocation is an event, not a clock.

The chain is now visible. The signal plane scored Alice's Tuesday sign-in. The policy plane evaluated the policies. The token issuer issued an access token (CAE-aware because Outlook advertises cp1). Exchange Online accepted the token and returned mail. If, twelve minutes from now, Alice's account is flagged high risk because a different sign-in attempt fires leakedCredentials, the critical event will fire, Exchange will issue a claims challenge, and Outlook will either acquire a fresh token (passing the new CA evaluation) or surface the refusal to the user.

Six independent components co-decided on one access event. Microsoft is one vendor. The same problem has been solved differently by Google, Okta, AWS, Cloudflare, and Zscaler. The Microsoft answer is not the only correct answer.

7. How others do it

Microsoft chose to enforce at token issuance and claims challenge. Google chose to enforce at every HTTP request via a reverse proxy. AWS chose a decidable policy DSL. These are not minor variations; they are different answers to "where does the policy engine live in the data path?"

Both Microsoft's and Google's models scale. Neither is strictly better. The choice is a function of what the enterprise already runs.
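The difference in where the engine sits shows up as a difference in how many policy decisions a session produces. The sketch below counts them for three requests under each model. Both evaluators are hypothetical stand-ins written for this comparison; neither resembles vendor code, and the token model deliberately omits expiry and claims challenges.

```javascript
// Both models share one hypothetical policy engine that counts its decisions.
function makeEngine() {
  let decisions = 0;
  return {
    evaluate(ctx) { decisions += 1; return ctx.deviceHealthy === true; },
    count: () => decisions,
  };
}

// Token-issuance model: one policy decision up front; later requests only
// check that a token is present (expiry and claims challenges are omitted).
function tokenModel(ctx, requestCount) {
  const engine = makeEngine();
  const token = engine.evaluate(ctx) ? { sub: ctx.user } : null;
  const statuses = Array.from({ length: requestCount }, () => (token ? 200 : 401));
  return { statuses, decisions: engine.count() };
}

// Per-request proxy model: the proxy re-evaluates policy on every request.
function proxyModel(ctx, requestCount) {
  const engine = makeEngine();
  const statuses = Array.from({ length: requestCount }, () => (engine.evaluate(ctx) ? 200 : 403));
  return { statuses, decisions: engine.count() };
}

const session = { user: 'alice@contoso.com', deviceHealthy: true };
console.log(tokenModel(session, 3).decisions); // 1
console.log(proxyModel(session, 3).decisions); // 3
```

CAE is, in these terms, a way to add selected extra decision points to the token model without paying for one on every request.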

Google BeyondCorp, IAP, Chrome Enterprise Premium

Google's Identity-Aware Proxy puts the policy engine in the data path. The documentation calls it bluntly: "IAP lets you establish a central authorization layer for applications accessed by HTTPS, so you can use an application-level access control model instead of relying on network-level firewalls" [@google-iap]. Every HTTP request to an IAP-protected app passes through the proxy. The proxy authenticates the user (via Google Account, Workforce Identity Federation, or Identity Platform), evaluates a Common Expression Language policy against the request context, and -- on allow -- forwards the request to the backend with signed identity headers.

The BeyondCorp Enterprise product (recently rebranded as Chrome Enterprise Premium) layers context-aware access on top: device posture, geographic location, time of day [@google-bce-overview]. The architecture matches the 2014 USENIX paper [@ward-beyer-2014-beyondcorp] and the 2016 production follow-up [@osborn-2016-beyondcorp].

The strength is per-request authorization: every HTTP call is its own decision point. The weakness, from the M365 perspective, is that IAP does not gate Microsoft 365 first-party API traffic. The Outlook client does not route through Google's IAP; it routes through Entra and Exchange Online. For Microsoft 365 workloads, IAP is complementary at best.

Okta Identity Engine and ThreatInsight

Okta's policy engine is closer to Microsoft's structurally: the identity provider is the policy engine, app sign-on policies live on the IdP, and the resource side relies on the IdP's token rather than a per-request proxy. The Okta Identity Engine documents the rule shape: "App sign-in policies define how a user must authenticate to gain access to an app. They verify ... group membership, the IP zone they're signing in from, risk level, and others" [@okta-sign-on-policies]. Every new app gets a default policy with a single catch-all rule that allows access with two factors.

Okta ThreatInsight is the IP-reputation feed. The documentation describes it operationally: "Okta ThreatInsight aggregates data about sign-in activity across the Okta customer base to analyze and detect potentially malicious IP addresses ... password spraying, credential stuffing, brute-force cryptographic attacks" [@okta-threatinsight]. The signal coverage is narrower than ID Protection: ThreatInsight is IP-centric, where ID Protection runs a multi-detection ML pipeline on tokens, sessions, behaviour, and credentials.

AWS IAM Identity Center and Verified Access

AWS splits the problem. IAM Identity Center handles workforce SSO and trusted identity propagation to AWS services [@aws-iam-identity-center]. AWS Verified Access handles per-request authorization for HTTPS-fronted apps -- the ZTNA piece. The Verified Access docs put it plainly: "Verified Access evaluates each application access request in real time" and "verifies the trustworthiness of users and devices against a set of security requirements" [@aws-verified-access].

The interesting bit is the policy language: Cedar. Cedar is a deliberately decidable language for authorization policy. "Decidable" here is a precise term: questions about what a given policy set can authorize (for example, does an edited policy permit any request the original denied?) are answerable by an automated analyser for any Cedar policy [@cedar-security].

Cedar's intentional non-Turing-completeness is the language-design hedge against the Harrison-Ruzzo-Ullman undecidability result the next section will name. The trade-off is expressiveness: Cedar cannot express arbitrary computational predicates, which is the price of being analysable [@cedar-security].

Cloudflare Access and Zscaler Private Access

Cloudflare Access is an edge proxy. Policies are deny-by-default, with four building blocks: Actions (Allow, Block, Bypass, Service Auth), Rule types (Include, Require, Exclude), Selectors, and Values [@cloudflare-access-policies]. The deny-by-default semantics are explicit: "Since Access is deny by default, users who do not match a Block policy will still be denied access unless they explicitly match an Allow policy" [@cloudflare-access-policies]. Cloudflare also ships a policy tester that lets administrators dry-run a policy against the existing user population [@cloudflare-access-policy-mgmt].
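A minimal sketch of how deny-by-default composes with the three rule types. This is a toy model, not Cloudflare's implementation: it assumes Include behaves as OR, Require as AND, and Exclude as NOT, assumes the first matching policy wins, and omits Bypass and Service Auth entirely. The `Policy` class, the `decide` function, and the selector strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    action: str                                  # "allow" or "block" in this sketch
    include: set = field(default_factory=set)    # assumed OR: at least one must match
    require: set = field(default_factory=set)    # assumed AND: every entry must match
    exclude: set = field(default_factory=set)    # assumed NOT: no entry may match

    def matches(self, attrs: set) -> bool:
        if not (self.include & attrs):
            return False
        if not self.require <= attrs:
            return False
        return not (self.exclude & attrs)


def decide(policies, attrs):
    """First matching policy wins (an assumption); no match at all means deny."""
    for policy in policies:
        if policy.matches(attrs):
            return "allow" if policy.action == "allow" else "deny"
    return "deny"  # deny by default: nothing matched an Allow policy


# Hypothetical policy set: block one country, allow staff with MFA on managed devices.
POLICIES = [
    Policy("block", include={"country:XX"}),
    Policy("allow", include={"group:staff"}, require={"mfa"},
           exclude={"device:unmanaged"}),
]
```

The case worth noticing is the contractor who matches no Block policy: `decide(POLICIES, {"group:contractor", "mfa"})` still returns `"deny"`, which is the behaviour the quoted sentence describes.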

Zscaler Private Access is a broker-based ZTNA: the user connects to a Zscaler edge node, the broker establishes a connection to the private app, and "users never access the corporate network, and apps are never exposed to the public internet" [@zscaler-zpa]. Zscaler's own marketing surveys put the VPN-replacement framing in numbers: "91% of organizations are concerned that VPNs compromise their security" and "56% of organizations suffered one or more VPN-related attacks in 2023-2024" [@zscaler-zpa].

Architecturally, Cloudflare Access and ZPA both sit closer to BeyondCorp than to Microsoft CA: the policy engine is in the data path; the protected resource is fronted by the proxy rather than gated at token issuance.

OpenID Shared Signals Framework and CAEP

Not a competitor: the cross-vendor wire format for what Microsoft built into CAE. On 22 September 2025, the OpenID Foundation approved three Final Specifications: the Shared Signals Framework 1.0, the Continuous Access Evaluation Profile 1.0, and the Risk Incident Sharing and Coordination Profile 1.0 [@helpnet-2025-openid][@openid-caep-final]. CAEP defines five event types -- Session Revoked, Token Claims Change, Credential Change, Assurance Level Change, Device Compliance Change -- as the cross-vendor revocation vocabulary.

Microsoft's CAE implementation is, in Microsoft's own words, "an industry standard based on Open ID Continuous Access Evaluation Profile" [@ms-cae-concept]. The Final Specifications from September 2025 are the canonical post-2025 reference; older drafts at OpenID's site are superseded.

Head-to-head comparison

The differences worth memorising:

System	Enforcement point	Native risk feed	Post-issuance revocation	Gates M365 first-party?	Best suited for
Microsoft Entra CA + ID Protection + CAE	Token issuer + CAE-aware resource APIs	ID Protection ML pipeline	CAE up to 15 min, instant for IP	Yes	M365 tenants
Google IAP / Chrome Enterprise Premium	HTTPS reverse proxy	Context-aware access signals	Per-request (always re-decides)	No	Google Cloud workloads
Okta Identity Engine + ThreatInsight	IdP token issuance	ThreatInsight IP feed	Limited, IdP-dependent	No	Vendor-neutral front door
AWS IAM Identity Center + Verified Access	Verified Access proxy + IAM	Trust providers (third-party)	Per-request for Verified Access	No	AWS-hosted apps
Cloudflare Access	Edge proxy	Risk score + identity factors	Per-request	No	Public web apps
Zscaler Private Access	Broker / edge node	Posture + identity	Per-request	No	Private app access

Per-cell sourcing for the table: the Microsoft row's "Yes" cell on M365 first-party gating is the directly-stated claim from the Microsoft Learn CA overview [@ms-ca-overview]. The other rows' "No" cells are negative inferences drawn from each peer's own product documentation, none of which advertises Microsoft 365 first-party API gating: Google IAP gates HTTPS-fronted apps behind the proxy [@google-iap]; Cloudflare Access deny-by-default applies to the apps fronted by Cloudflare [@cloudflare-access-policies]; Verified Access "evaluates each application access request" for HTTPS apps behind AWS [@aws-verified-access]; Zscaler ZPA brokers private app access [@zscaler-zpa]; Okta sign-on policies gate apps wired into Okta's IdP [@okta-sign-on-policies]. The cell semantics are "does the system gate Outlook/Teams/SharePoint/Graph first-party traffic" and the answer is structurally No outside Microsoft.

flowchart LR
  subgraph TOK[Token issuance model Microsoft Okta]
    U1[User] --> AT[Acquire token]
    AT --> CA1[CA evaluator]
    CA1 --> IS[Issue token]
    IS --> R1[Resource API validates token]
    R1 -. CAE 401 .-> AT
  end
  subgraph PRX[Data path proxy model Google BeyondCorp AWS Verified Access Cloudflare Zscaler]
    U2[User] --> PXY[Proxy intercepts every request]
    PXY --> POL[Policy evaluator at the proxy]
    POL --> BCK[Backend application]
  end
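The difference between the two models can be reduced to a count of how often the policy engine runs. The sketch below is a toy, under stated assumptions: `revoked_at` is a hypothetical request index at which the session stops being acceptable, the token model learns of it through a challenge from the resource, and the re-evaluation is assumed to refuse. Neither function models real latency.

```python
def token_model(requests, revoked_at=None):
    """Policy runs at token issuance, then again only after a challenge."""
    evaluations, served = 1, 0           # one evaluation to issue the first token
    for i in range(requests):
        if revoked_at is not None and i == revoked_at:
            evaluations += 1             # challenge forces a fresh evaluation
            break                        # assumed: the re-evaluation refuses
        served += 1                      # resource only validates the token
    return evaluations, served


def proxy_model(requests, revoked_at=None):
    """Policy runs on every request because the engine sits in the data path."""
    evaluations, served = 0, 0
    for i in range(requests):
        evaluations += 1                 # every request is its own decision point
        if revoked_at is not None and i >= revoked_at:
            break                        # the proxy denies on this very request
        served += 1
    return evaluations, served
```

For 100 requests with no revocation, the token model evaluates once and the proxy model evaluates 100 times; both serve all 100. The trade is evaluation cost on every call against the need for a separate revocation channel.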

The honest observation worth sitting with: none of the proxy systems gates M365 first-party API traffic. Outlook, Teams, SharePoint, and Microsoft Graph route through Entra. For those workloads, Entra remains the only effective policy plane. The proxy systems gate the apps that sit behind the proxy -- internal apps, partner-facing apps, custom workloads. That makes BeyondCorp, Okta, Cloudflare Access, and ZPA complementary to Entra CA in an M365 environment, not substitutes for it.

Six systems, six architectural choices. None of them wrong. But what do they all leave on the table?

8. What Conditional Access fundamentally cannot do

Section 7 cannot be the ending. There are at least five things Conditional Access -- and every peer in Section 7 -- cannot do. Some are engineering limits; some are theorems. Both classes are worth naming.

(a) On-prem authentication

CA is a cloud control plane. Kerberos and NTLM against on-prem domain controllers do not consult Entra. There is no policy hook for the legacy Windows protocols. If a domain user signs in to a domain-joined workstation, authenticates to a file server, and accesses a share, no piece of that flow touches Conditional Access. The Microsoft Learn overview is explicit about the scope [@ms-ca-overview].

This is the operational seam between cloud identity and on-prem identity.

Note: Conditional Access does not gate Kerberos or NTLM against on-prem domain controllers. If your threat model includes lateral movement after credential theft on the on-prem side, CA is not your defence. Layer in Defender for Identity, on-prem MFA gateways, or a privileged-access workstation architecture instead.

(b) Post-issuance token theft

Once a refresh token is exfiltrated -- whether via an adversary-in-the-middle phishing kit like Evilginx [@ms-aitm-phishing-blog], an infostealer that scrapes the token cache, or a malicious browser extension -- the pre-issuance CA evaluation is bypassed. The attacker has a bearer token. They can present it to the resource API directly. CAE-aware resource providers can revoke mid-session on the published critical-event list, but the latency ceiling is "up to 15 minutes" for non-IP events [@ms-cae-concept]. In fifteen minutes a competent attacker has done plenty.

The mitigation is device-bound credentials: Primary Refresh Tokens bound to TPM hardware, FIDO2 with hardware attestation, certificate-based authentication with hardware-protected keys [@ms-prt-concept]. A token bound to a TPM-protected key is not exfiltratable in the same way: a stolen copy is useless without the key, and the wrapped key material never leaves the device.

(c) Consent-grant phishing

CA evaluates authentication, not authorization grants that a user makes to a malicious OAuth app. A user who clicks "Allow" on a permissions-consent prompt for an attacker-controlled app has performed an OAuth authorization, not a sign-in. The malicious app now has the user's delegated permissions for whatever scopes were granted. CA was not invoked because CA gates the user's sign-ins; it does not inspect the user's OAuth grants. Microsoft Defender for Cloud Apps documents the attack class as "risky OAuth apps" and ships investigation and remediation tooling on a separate plane from CA [@ms-illicit-consent-grant].

Admin consent settings, app governance policies, and explicit allow-listing of acceptable publishers live on that different plane. The policy admin who deploys CA needs to deploy app governance separately.

(d) Risk evaluation is probabilistic

Identity Protection produces a score, not a proof. A "high" risk level is a confidence; it is not the assertion "this sign-in is definitely an attack." No vendor in the Section 7 survey publishes precision or recall numbers for its risk engine. The operating point -- the threshold that maps a continuous score to discrete buckets -- is a trade-off that the vendor calibrates and the customer does not see.

This is a structural lower bound on any ML-driven risk plane, not a Microsoft-specific failure. Any classifier has false positives and false negatives. A risk-aware CA policy that says "block at high risk" will, with non-zero probability, block a legitimate sign-in. A policy that says "require MFA at medium risk" will, with non-zero probability, let through a sophisticated attacker whose detections fall under the threshold.
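A worked base-rate example makes the trade-off concrete. Every number here is hypothetical, chosen only to show the arithmetic; as noted above, no vendor publishes the real rates. The function applies Bayes' rule to get the probability that a flagged sign-in is actually hostile.

```python
def precision(prevalence, tpr, fpr):
    """P(attack | flagged), by Bayes' rule."""
    true_pos = prevalence * tpr             # hostile sign-ins that are flagged
    false_pos = (1.0 - prevalence) * fpr    # benign sign-ins that are flagged
    return true_pos / (true_pos + false_pos)


def benign_flagged(total_signins, prevalence, fpr):
    """Expected number of legitimate sign-ins flagged out of total_signins."""
    return total_signins * (1.0 - prevalence) * fpr


# Hypothetical operating point: 1 in 10,000 sign-ins is hostile, the detector
# catches 90% of those and misfires on 0.1% of benign sign-ins.
p = precision(prevalence=1e-4, tpr=0.90, fpr=1e-3)
blocked = benign_flagged(1_000_000, prevalence=1e-4, fpr=1e-3)
```

Under these assumed rates, `p` comes out at roughly 0.083: about one flagged sign-in in twelve is a real attack, and a "block at high risk" policy would stop close to a thousand legitimate sign-ins per million. Different assumed rates move the numbers, not the shape of the trade-off.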

(e) Workload-identity CA is constrained by design

Block-only grants. No managed identities. No group assignments. The full human grant taxonomy does not transfer because a service principal cannot perform an MFA challenge, cannot register a FIDO2 key, cannot accept a terms-of-use document. The Microsoft Learn page on workload-identity CA enumerates the constraints precisely [@ms-workload-identity-ca]. Section 9 will name this as an open problem; for now, treat it as a documented limit.

The theorems behind the limits

Some of these limits are engineering choices that could be different in a future product. Some are deeper.

Saltzer and Schroeder 1975 [@saltzer-schroeder-1975] give the upper bound on aspirations: complete mediation across every authentication and authorization decision within scope of mediation. The principle does not constrain what is in scope. It constrains what you must do for whatever you have decided is in scope. On-prem AD is out of scope for CA by Microsoft's product decision; complete mediation cannot fix that, because the principle is about consistency within the boundary, not about expanding the boundary.

Harrison-Ruzzo-Ullman 1976 -- usually shortened to HRU [@harrison-ruzzo-ullman-1976] -- gives the lower bound on static analysis. The safety question in the general access-matrix model is undecidable. In informal terms: there is no general algorithm that proves a Conditional Access policy edit cannot, under some future edit chain, leak a sensitive right. This is why every vendor in the survey relies on evaluation-time mediation (the engine decides at the moment of the request) rather than static-proof analysis (the engine certifies in advance that no edit can ever leak). Cedar's intentional restriction to a decidable fragment, in AWS Verified Access, is the counter-strategy: trade expressiveness for analysability.

The bearer-token revocation trade-off is informal but real: without a side channel, the worst-case revocation latency is the token's remaining lifetime. CAE is that side channel. With it, the latency is bounded by the propagation time of the channel (up to 15 minutes for non-IP events, instant for IP). Shorten the channel further and you discover that the IdP-to-resource-API event delivery has its own infrastructure costs.
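The revocation bound can be written as a few lines of arithmetic. This sketch uses only the figures quoted in this article (a 28-hour CAE-aware token, a 15-minute ceiling for non-IP events); the function name and its parameters are hypothetical.

```python
from datetime import timedelta


def worst_case_exposure(lifetime, token_age, side_channel=None):
    """Upper bound on how long a revoked session can keep working.

    lifetime     -- total token lifetime
    token_age    -- how long ago the token was issued
    side_channel -- propagation ceiling of a revocation channel, or None
    """
    remaining = max(lifetime - token_age, timedelta(0))
    if side_channel is None:
        return remaining                   # nothing to do but wait for expiry
    return min(remaining, side_channel)    # the channel caps the exposure


LIFETIME = timedelta(hours=28)       # CAE-aware token lifetime cited in the text
NON_IP_CEILING = timedelta(minutes=15)

no_channel = worst_case_exposure(LIFETIME, timedelta(hours=1))
with_cae = worst_case_exposure(LIFETIME, timedelta(hours=1), NON_IP_CEILING)
```

With a one-hour-old token, `no_channel` is 27 hours and `with_cae` is 15 minutes. The long lifetime is only tolerable because the channel exists.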

The practical implication of HRU for a CA admin is that there is no tool, anywhere, that can examine your tenant's CA policies and certify that no sequence of policy edits could ever leak access to a sensitive resource. Vendors offer policy *testers* that simulate a single edit against the current population; that is decidable. The question "is the system safe under all possible future edits?" is not. This is why audit trails, change-control gates, and least-privilege role assignments on the CA admin role matter as much as the CA policies themselves.

Naming the limits clears the way to name the active unsolved problems -- the ones the field is still working on, where the current state of the art admits it is partial.

9. Where the policy plane is still incomplete

Microsoft's own 2026 documentation for Conditional Access on AI agents calls the current implementation "a lightweight enforcement mechanism designed to block unauthorized or risky agents, not a full policy suite." That is not marketing modesty. It is an admission that the most active frontier of policy enforcement -- agent identities -- is deliberately under-specified.

Five open problems sit on that frontier in 2026.

"Organizations are expanding Zero Trust across more users, applications, and now a growing population of AI agent identities ... the Conditional Access Optimization Agent moves beyond static guidance to continuous, context-aware identity posture optimization." [@ms-techcom-ca-optimization-agent]

9.1 Agent identity policy semantics

What grants should exist for AI agents beyond block and allow? Useful candidate grants include: "read-but-not-move" for mail or files; "business-hours-only"; "any autonomous action requires a fresh sign-off from the on-behalf-of human." None of these exist as first-class CA grant types in 2026.

What does exist: CA targeting of agent identities -- the ability to match a policy on the agent identity rather than the human -- and the Conditional Access Optimization Agent, which gives administrators continuous recommendations on policy posture [@ms-techcom-ca-optimization-agent]. The targeting is there. The grant taxonomy is still mostly the human one, applied imperfectly.

9.2 Cross-vendor CAEP interop

The wire format was finalised in September 2025 [@helpnet-2025-openid][@openid-caep-final]. Production receiver coverage outside Microsoft Entra-internal resource providers is partial. Two large vendors agreeing on an event schema is necessary but not sufficient for cross-vendor revocation to work in practice; the receiving side needs to act on the events. The next eighteen months are the period in which CAEP either becomes the cross-vendor wire format for revocation, or it does not.

9.3 Workload-identity grant set

What richer expressions could exist for non-human identities? The current Microsoft Learn page lists workload-identity detections: investigationsThreatIntelligence, suspiciousSignins, adminConfirmedServicePrincipalCompromised, leakedCredentials, maliciousApplication, suspiciousApplication, anomalousServicePrincipalActivity, suspiciousAPITraffic [@ms-workload-identity-risk]. The detections exist; the grant taxonomy stops at block.

Candidate richer grants: "workload attestation" (the service principal proves it is running on attested infrastructure), "verifiable claim from a trusted attester" (a third party signs a statement about the workload), "step-up authorization for sensitive scopes" (a higher-privilege scope requires a separate per-request authorization step). None of these is generally available in 2026.

Workload identity. A non-human identity in Entra ID: a service principal, an application registration's owned service principal, or a managed identity in Azure. Workload identities authenticate via client secrets, client certificates, federated credentials, or (for managed identities) instance-metadata-service tokens. Conditional Access for workload identities currently applies only to single-tenant service principals registered in the tenant; it does not cover multi-tenant SaaS apps or managed identities [@ms-workload-identity-ca].

9.4 The break-glass paradox

Emergency-access accounts must be excluded from CA. If a CA misconfiguration locks out every admin, the break-glass account is the recovery path. But exclusion creates a high-value bypass: an attacker who compromises a break-glass account inherits its exclusion.

There is no clean answer. Microsoft's guidance is exclusion plus FIDO2 binding plus alerting: the break-glass accounts have hardware-bound FIDO2 keys (so they cannot be phished), they are excluded from all CA policies (so misconfiguration cannot lock them out), and every sign-in is alerted on (so misuse is detected within minutes) [@ms-emergency-access].

Run two break-glass accounts, not one. Store the FIDO2 keys in separate physical safes under separate custodians. Never use them for anything but a recovery exercise once per quarter; if they sign in unexpectedly, treat the alert as a P1 incident. The operational pattern accepts that you have a bypass and treats the bypass as the highest-value alert in the tenant [@ms-emergency-access].

9.5 The risk-engine transparency problem

No vendor in the Section 7 survey publishes model architecture, feature vector size, or per-detection precision and recall. Microsoft does not. Okta does not. Google does not. Defenders, auditors, and regulators must accept a black-box score.

This matters in three places. First, for incident response: when an "atypical travel" detection fires for an executive, the responder cannot see which features contributed and how strongly. Second, for compliance: an auditor asked to evidence the effectiveness of the control plane gets the operating output (3-tier risk levels) but not a quantitative evaluation. Third, for the risk-engine vendors themselves, who must respond to legitimate regulatory questions about model bias and operational reliability without revealing the architecture that attackers would use to evade detection.

The article does not predict a resolution. It names the gap.

The architecture is incomplete by admission. It is also actionable today. A competent tenant administrator can deploy a sensible baseline in an afternoon.

10. Using Conditional Access today

The architectural story ends; the operational story begins. Here is what a competent tenant looks like in 2026.

The licensing reality

Conditional Access is not a feature every Microsoft 365 tenant gets. It is a feature gated by SKU. The licensing tiers are:

Entra ID Free. Security Defaults only [@ms-security-defaults]. No Conditional Access policies. No risk-based conditions. No CA-driven CAE (the critical-event-evaluation subsystem -- for events like account disable, password reset, and high user risk -- still propagates to CAE-aware M365 services at the service layer regardless of SKU; see Section 6.6) [@ms-cae-concept].
Entra ID P1. Conditional Access is unlocked [@ms-ca-overview]. You can author policies with any of the non-risk conditions: users, apps, locations, devices, client app, platform. You can demand any of the non-risk grants.
Entra ID P2. Adds risk-based conditions. signInRiskLevels and userRiskLevels become usable [@ms-id-protection-overview]. ID Protection's full report pane (risky users, risky sign-ins, risk detections) is accessible. The legacy ID-Protection-side risk policies retire 1 October 2026 [@ms-id-protection-policies].
Workload Identities Premium. A separate SKU. Unlocks CA scoped to service principals [@ms-workload-identity-ca].

This corrects a premise discarded earlier: "Conditional Access is the policy plane every M365 tenant runs on" is not true. Many tenants run on Security Defaults. The "policy plane every tenant runs on" is the cloud sign-in pipeline; CA is the configurable richer layer that P1+ tenants opt into.

Start with the managed baselines

Microsoft-managed Conditional Access policies are the recommended starting point [@ms-managed-policies]. They auto-deploy in Report-only mode, run for at least 45 days while administrators review the impact in the Sign-in logs, and are auto-enabled with a 28-day pre-enablement notification unless administrators opt out [@ms-managed-policies]. The currently shipping baselines, per Microsoft Learn, include:

MFA for admins accessing Microsoft admin portals (the most-privileged roles).
MFA for users who already have per-user MFA enabled (a migration aid).
MFA and reauthentication for risky sign-ins (the P2 baseline).
Block legacy authentication.
Block access for high-risk users (P2-tier protection on the user-risk surface).
Block all high-risk agents accessing all resources (Preview, AI-agent surface).

The original announcement called for a 90-day report-only window [@weinert-2023-managed-policies][@helpnet-2023-microsoft-entra-policies]. The current default is 45 days [@ms-managed-policies]; the window shrank as Microsoft gained confidence that customers were not surprised by the auto-enablement.

Five custom policies on top of the baselines

Beyond the managed policies, well-run tenants typically add five custom policies on top of the baselines [@ms-ca-policy-common]: block legacy authentication unconditionally [@ms-managed-policies]; require the phishing-resistant Authentication Strength for any user in a privileged role [@ms-auth-strengths]; require compliantDevice for admin centres, finance apps, and customer-data exports [@ms-intune-compliance-partners]; restrict privileged sign-ins to a named-location allow-list with block-or-step-up outside it [@ms-ca-network]; and, where Entra ID P2 is licensed, demand a sign-in-risk-based step-up (MFA at medium risk, a passwordless or phishing-resistant method at high risk) [@ms-id-protection-policies].

Note: 1. Block legacy authentication. 2. Phishing-resistant Authentication Strength for admin roles. 3. Require compliant device for sensitive applications. 4. Named-location restrictions for privileged roles. 5. Sign-in-risk-based step-up where Entra ID P2 is available.

Automation entry points (Microsoft Graph)

The Graph endpoints administrators care about:

GET /identity/conditionalAccess/policies -- list policies. POST to create, PATCH to update [@ms-graph-capolicy].
GET /identityProtection/riskDetections -- the per-detection log. Filterable by riskLevel, riskState, userPrincipalName, activityDateTime [@ms-graph-riskdetection].
GET /identityProtection/riskyUsers -- the per-user risk view.
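A minimal sketch of building a filtered `riskDetections` request, using only the Python standard library. It constructs the URL and stops there: token acquisition and the HTTP call are left out. The function name, the default date, and the choice of the `v1.0` endpoint are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode

GRAPH = "https://graph.microsoft.com/v1.0"   # assumed API version


def risk_detections_url(risk_level="high", since="2026-01-01T00:00:00Z", top=50):
    """Build a GET URL for risk detections at a given level since a timestamp."""
    odata_filter = f"riskLevel eq '{risk_level}' and activityDateTime ge {since}"
    # Keep "$" and "'" literal so the OData query stays readable in logs.
    query = urlencode({"$filter": odata_filter, "$top": top},
                      safe="$'", quote_via=quote)
    return f"{GRAPH}/identityProtection/riskDetections?{query}"


url = risk_detections_url()
```

The resulting URL would be sent with an `Authorization: Bearer` header carrying a Graph access token; which permissions that token needs is covered in the Graph reference rather than here.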

A policy authored in code looks like this (truncated for readability):

{
  "displayName": "Require phishing-resistant for admins",
  "state": "enabledForReportingButNotEnforced",
  "conditions": {
    "users": { "includeRoles": ["62e90394-69f5-4237-9190-012177145e10"] },
    "applications": { "includeApplications": ["All"] }
  },
  "grantControls": {
    "operator": "OR",
    "authenticationStrength": { "id": "00000000-0000-0000-0000-000000000004" }
  }
}

The recommended deployment dance is enabledForReportingButNotEnforced first; let the Sign-in log show you the impact for a calibration window; promote to enabled only after the report-only data matches expectations [@ms-ca-report-only].
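The promotion step can be guarded in code so that only a policy currently in report-only mode is ever flipped to enforced. This is a sketch: the three state strings are the ones the `state` property accepts, while the function name and the refusal rules are assumptions of this example, not anything Graph enforces.

```python
import json

VALID_STATES = {"enabled", "disabled", "enabledForReportingButNotEnforced"}


def promotion_patch(policy: dict) -> str:
    """Return the PATCH body that promotes a report-only policy to enabled."""
    state = policy.get("state")
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state!r}")
    if state != "enabledForReportingButNotEnforced":
        # Hypothetical guard: refuse to touch policies that skipped report-only.
        raise ValueError("only report-only policies are promoted in this sketch")
    return json.dumps({"state": "enabled"})


body = promotion_patch({"displayName": "Require phishing-resistant for admins",
                        "state": "enabledForReportingButNotEnforced"})
```

The returned body would be sent as a PATCH to the policy's own URL under `/identity/conditionalAccess/policies`. Keeping the guard in the deployment script makes "report-only first" a property of the pipeline rather than a habit.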

Audit-time visibility

Three surfaces matter:

Sign-in logs in the Entra portal show the per-sign-in evaluation, including which CA policies matched and which grants were satisfied.
Risk-detection log in Identity Protection (P2 only) shows the per-detection narrative: which riskEventType fired, with what additionalInfo, against which user.
The What-If tool simulates a policy evaluation for a hypothetical sign-in, before you enable a policy.
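To build intuition for what the What-If tool reports, here is a deliberately simplified evaluator. It is not the Entra algorithm. It assumes only three user conditions (`includeUsers`, `includeRoles`, `excludeUsers`), one application condition, that every applicable enforced policy must be satisfied, that block wins, and it ignores the `operator` field, locations, device filters, and risk. The sign-in dictionary shape and the sample policies are hypothetical.

```python
ADMIN_ROLE = "62e90394-69f5-4237-9190-012177145e10"  # role ID reused from the JSON example


def applies(policy, signin):
    users = policy["conditions"]["users"]
    apps = policy["conditions"]["applications"]["includeApplications"]
    if signin["user_id"] in users.get("excludeUsers", []):
        return False                                   # exclusions win
    user_hit = ("All" in users.get("includeUsers", [])
                or bool(set(users.get("includeRoles", [])) & set(signin["roles"])))
    app_hit = "All" in apps or signin["app_id"] in apps
    return user_hit and app_hit


def what_if(policies, signin):
    required = set()
    for p in policies:
        if p["state"] == "enabled" and applies(p, signin):   # report-only is skipped
            required |= set(p["grantControls"]["builtInControls"])
    if "block" in required:
        return "block"
    missing = required - set(signin["satisfied"])
    return "grant" if not missing else "challenge:" + ",".join(sorted(missing))


ALL_APPS = {"includeApplications": ["All"]}
POLICIES = [
    {"state": "enabled",
     "conditions": {"users": {"includeUsers": ["All"], "excludeUsers": ["break-glass-1"]},
                    "applications": ALL_APPS},
     "grantControls": {"builtInControls": ["mfa"]}},
    {"state": "enabled",
     "conditions": {"users": {"includeRoles": [ADMIN_ROLE]}, "applications": ALL_APPS},
     "grantControls": {"builtInControls": ["compliantDevice"]}},
    {"state": "enabledForReportingButNotEnforced",
     "conditions": {"users": {"includeUsers": ["All"]}, "applications": ALL_APPS},
     "grantControls": {"builtInControls": ["block"]}},
]
```

An administrator who has satisfied MFA but is on an unmanaged device gets `challenge:compliantDevice`, because two policies apply and both must be met. The report-only block policy changes nothing until its state is promoted.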

Detection engineering

For E5 tenants, the Sign-in logs and risk detections flow into Microsoft Sentinel (via the Microsoft Entra ID connector) or Defender XDR [@ms-sentinel-aad-connector]. A KQL skeleton for high-risk-with-CA-failure looks like:

// Failed sign-ins that also carry a high-risk detection
SigninLogs
| where ResultType != "0"   // ResultType is a string; "0" means success
| join kind=inner (AADUserRiskEvents | where RiskLevel == "high") on UserPrincipalName, CorrelationId
| project TimeGenerated, UserPrincipalName, IPAddress, ConditionalAccessStatus, RiskEventType, ResultDescription

The aggregate scale figure is worth remembering: Microsoft processes "more than 100 trillion security signals" daily across all identity products [@ms-managed-policies]. The detection engineer is consuming a small slice that landed in their tenant.

Run the following in Microsoft Sentinel (or the Log Analytics workspace behind it) to surface sign-ins that succeeded *despite* a high-confidence risk detection -- the most operationally interesting subset. The query is original to this article; the schema it targets is the Microsoft Sentinel Entra ID connector tables `SigninLogs` and `AADUserRiskEvents` [@ms-sentinel-aad-connector], and the join-and-filter pattern follows the practice documented in Microsoft's Sentinel hunting guidance [@ms-sentinel-hunting].

let window = 7d;
SigninLogs
| where TimeGenerated > ago(window)
| where ResultType == "0"                       // successful sign-ins only
| where ConditionalAccessStatus == "success"    // CA evaluated and was satisfied
| join kind=inner (
    AADUserRiskEvents
    | where TimeGenerated > ago(window)
    | where RiskLevel == "high"
) on UserPrincipalName, CorrelationId
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, RiskEventType, ConditionalAccessPolicies
| order by TimeGenerated desc

The expected count for a well-tuned tenant is small. Spikes warrant a priority-2 (P2) investigation.

Break-glass

Two emergency-access accounts. FIDO2-bound. Excluded from every CA policy. Stored as separate hardware tokens in separate safes. Every sign-in is wired to a P1 alert. Per Section 9.4 and Microsoft Learn's emergency-access guidance, this is the acknowledged operational compromise to the break-glass paradox [@ms-emergency-access].

Emergency-access (break-glass) account. A non-personal Entra ID administrator account excluded from Conditional Access and from MFA enforcement, used only when the primary identity infrastructure has failed. Best practice: at least two such accounts, with hardware FIDO2 keys stored separately, monitored by an unconditional alert on any sign-in.
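"Excluded from every CA policy" is a property that drifts as new policies are added, so it is worth checking mechanically. The sketch below takes the policy objects as returned by the list endpoint and reports any policy that does not exclude every break-glass account. It is an assumption of this example that exclusions are expressed through `excludeUsers`; tenants that exclude through a group would need to check `excludeGroups` instead. The function name and the sample IDs are hypothetical.

```python
def policies_missing_exclusion(policies, break_glass_ids):
    """Display names of policies that do not exclude every break-glass account."""
    wanted = set(break_glass_ids)
    gaps = []
    for policy in policies:
        users = policy.get("conditions", {}).get("users", {})
        excluded = set(users.get("excludeUsers", []))
        if not wanted <= excluded:          # at least one account is still in scope
            gaps.append(policy.get("displayName", "<unnamed>"))
    return gaps


# Hypothetical policy list with placeholder object IDs.
SAMPLE = [
    {"displayName": "MFA for all users",
     "conditions": {"users": {"excludeUsers": ["bg-1", "bg-2"]}}},
    {"displayName": "Compliant device for admins",
     "conditions": {"users": {"excludeUsers": ["bg-1"]}}},
    {"displayName": "Block legacy authentication",
     "conditions": {"users": {"includeUsers": ["All"]}}},
]
gaps = policies_missing_exclusion(SAMPLE, ["bg-1", "bg-2"])
```

Run on a schedule, a non-empty result is a change-control finding: some policy could lock out a recovery account.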

The article has answered "who decided?" five times over: by signal, by policy, by token, by session, by operational pattern. One section remains: the misconceptions that keep recurring.

11. Misconceptions that recur

Every time these questions come up in practice, the same wrong answers come back. The corrections are worth memorising.

Only if you have Entra ID P1 or higher and have configured CA policies. Free SKU tenants run Security Defaults, which is a coarse tenant-wide on/off switch, not CA [@ms-security-defaults]. CA is unlocked at P1 [@ms-ca-overview]; risk-based conditions are unlocked at P2 [@ms-id-protection-overview]. The "every tenant runs on CA" framing you sometimes see in marketing material is incorrect.

No. CA is a cloud control plane. Kerberos and NTLM against on-prem domain controllers do not consult Entra at all [@ms-ca-overview]. If your threat model includes on-prem lateral movement, layer in Defender for Identity and the standard on-prem hardening playbook.

No. CAE is event-driven push from the policy plane to CAE-aware resource APIs. The Microsoft Learn CAE document gives the latency ceiling precisely: "the goal for critical event evaluation is for response to be near real time, but latency of up to 15 minutes might be observed because of event propagation time; however, IP locations policy enforcement is instant" [@ms-cae-concept]. There is no 30-second poll. The token can live up to 28 hours because the revocation is event-driven.

No. Clients advertise CAE-readiness via the `cp1` client capability in token requests, specifically by adding `cp1` to the `xms_cc` claim mechanism (or by calling `WithClientCapabilities(new[] { "cp1" })` in MSAL) [@ms-claims-challenge][@ms-app-resilience-cae]. The Microsoft Learn claims-challenge page is explicit: "The only currently known value is `cp1`" [@ms-claims-challenge]. The CAE-aware token is recognisable by its long lifetime (up to 28 hours) and by the resource API's willingness to issue an `insufficient_claims` challenge, not by a Boolean claim.

No. Third-party MDM compliance partners can write the device compliance state into Entra via Intune's compliance-partner API [@ms-intune-compliance-partners]. The CA grant reads `isCompliant` on the device object; it does not care which MDM wrote that value. Microsoft's preferred deployment is Intune, but the integration point is open by design.

In 2023. The public preview of CA filters for workload identities opened on 26 October 2022 [@vansurksum-2022-workload-ca]; the Microsoft Entra Workload Identities standalone product reached GA in late November 2022, and the Conditional Access feature itself reached general availability later in 2023 [@ms-workload-identity-ca]. Any article asserting a 2025 GA date for workload-identity CA is incorrect.

No. Every sign-in produces a Sign-in log entry; ID Protection emits a `riskDetection` only when at least one detector fires for that sign-in [@ms-graph-riskdetection]. Most sign-ins produce no `riskDetection`. Detection engineers querying for risk should join the Sign-in log with the riskDetections log and treat unjoined rows as "no risk flagged at the moment."

No Microsoft primary source publicly describes the production model architecture or names a per-sign-in feature-vector size. What is published is the detection taxonomy (about two dozen named `riskEventType` values [@ms-id-protection-risks][@ms-graph-riskdetection]), the timing split (real-time / near-real-time / offline [@ms-risk-detection-types]), and the three-tier risk output. The "transformer with 80+ signals" framing is folk knowledge with no Microsoft primary source behind it. The article reframes it as "ML-based with detailed architecture publicly undisclosed."

Not on its own. A standard MFA grant does not defeat a kit like Evilginx, which proxies both the password and the MFA challenge in real time. The defence is to require the *phishing-resistant Authentication Strength* in CA: FIDO2 with hardware attestation, Windows Hello for Business, or multifactor certificate-based authentication [@ms-auth-strengths]. The cryptographic origin-binding in WebAuthn-class credentials defeats AitM by construction. But the defence only works *when the grant is applied*. A CA policy that demands phishing-resistant for admin roles but not for users will block AitM against admins and not against users.
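The join between the Sign-in log and the riskDetections log recommended in the answers above can be sketched in a few lines of Python. This is a minimal illustration over in-memory records, not a production query: it assumes each risk detection carries a `requestId` that matches the `id` of its sign-in record, and the sample users and IDs are hypothetical.

```python
from collections import defaultdict

def join_signins_with_risk(signins, risk_detections):
    """Left-join sign-in records to risk detections on the sign-in request ID.

    Assumption: a detection's "requestId" equals the sign-in record's "id".
    """
    by_request = defaultdict(list)
    for det in risk_detections:
        by_request[det["requestId"]].append(det["riskEventType"])

    joined = []
    for s in signins:
        events = by_request.get(s["id"], [])
        joined.append({
            "signInId": s["id"],
            "user": s["userPrincipalName"],
            "riskEventTypes": events,
            # An unjoined row means no detector has fired so far, not "safe".
            "status": "risk flagged" if events else "no risk flagged at the moment",
        })
    return joined

# Hypothetical sample data for illustration only.
sample_signins = [
    {"id": "req-1", "userPrincipalName": "alice@contoso.com"},
    {"id": "req-2", "userPrincipalName": "alice@contoso.com"},
]
sample_detections = [
    {"requestId": "req-2", "riskEventType": "anonymizedIPAddress"},
]

rows = join_signins_with_risk(sample_signins, sample_detections)
for row in rows:
    print(row["signInId"], row["status"], row["riskEventTypes"])
```

The left join matters: an inner join would silently drop the majority of sign-ins that never produced a detection, and offline detections can attach to a sign-in well after it was logged.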

12. Two planes, one boundary

Replay Alice's Tuesday.

Identity Protection's signal plane scored her 09:02 sign-in. The score was below the medium-risk threshold. Conditional Access's policy plane evaluated four matching policies. Two demanded MFA; her cached refresh token already satisfied that grant from yesterday. One demanded a compliant device; Intune had marked her laptop compliant overnight. None demanded the block grant. The token issuer issued a CAE-aware bearer token with a 28-hour lifetime. Exchange Online accepted the token. Outlook's data path opened. Bytes returned to Alice.
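The way the policy plane combined Alice's matching policies can be approximated with a toy evaluator. This is a deliberately simplified sketch, not Entra's implementation: it assumes one grant control per policy, that a block grant overrides everything else, and that every other required grant must be satisfied. The `Policy` class, the `evaluate` function, and the policy names are hypothetical, and only the three policies whose grants the narrative names are modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    name: str      # hypothetical display name
    grant: str     # "mfa", "compliantDevice", or "block"
    applies: bool  # did the policy's conditions match this sign-in?

def evaluate(policies, satisfied):
    """Return (outcome, missing_grants) for one sign-in.

    Simplification: block wins outright; otherwise all required
    grants across matching policies must already be satisfied.
    """
    matching = [p for p in policies if p.applies]
    if any(p.grant == "block" for p in matching):
        return "block", set()
    required = {p.grant for p in matching}
    missing = required - set(satisfied)
    return ("grant" if not missing else "challenge"), missing

# The matching policies whose grants the text describes.
alice_policies = [
    Policy("Require MFA for all users", "mfa", True),
    Policy("Require MFA for Exchange Online", "mfa", True),
    Policy("Require compliant device", "compliantDevice", True),
]

# The cached refresh token satisfies MFA; Intune marked the laptop compliant.
print(evaluate(alice_policies, {"mfa", "compliantDevice"}))
# A sign-in from an unmanaged device would be missing one grant.
print(evaluate(alice_policies, {"mfa"}))
```

Real policies can combine several controls with AND/OR operators and carry session controls as well, so treat the sketch as a mental model for predicting outcomes rather than a faithful evaluator.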

If, twelve minutes later, an attacker tries to sign in with Alice's credentials from an anonymizing proxy, ID Protection will fire a detection. The detection will lift her user risk to high. CAE will deliver the high-user-risk event to Exchange. Exchange will issue a claims challenge on the next call from Alice's Outlook. Outlook will replay the challenge to Entra. Entra will re-run CA, see the elevated risk, demand step-up MFA, and either issue a fresh token (after Alice satisfies the step-up) or refuse.
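The claims challenge step in that sequence is an ordinary HTTP 401 with a `WWW-Authenticate` header that the client decodes and replays to the token endpoint. The sketch below shows one way a client could extract the challenge. It assumes the header carries `error="insufficient_claims"` and a base64-encoded JSON `claims` parameter; the function name, the example authorization URI, and the claims payload are illustrative, and a real client would normally let MSAL handle this.

```python
import base64
import json
import re

def parse_claims_challenge(www_authenticate):
    """Return the decoded claims JSON from a 401 challenge, or None.

    Assumption: parameters appear as key="value" pairs and the
    claims value is base64-encoded JSON.
    """
    params = dict(re.findall(r'(\w+)="([^"]*)"', www_authenticate))
    if params.get("error") != "insufficient_claims":
        return None
    raw = params.get("claims")
    if not raw:
        return None
    padded = raw + "=" * (-len(raw) % 4)  # tolerate stripped padding
    return json.loads(base64.b64decode(padded))

# Illustrative payload: the resource asks for a token issued after a given time.
example_claims = {"access_token": {"nbf": {"essential": True, "value": "1604106651"}}}
encoded = base64.b64encode(json.dumps(example_claims).encode()).decode()
example_header = (
    'Bearer authorization_uri="https://login.example.invalid/authorize", '
    f'error="insufficient_claims", claims="{encoded}"'
)

challenge = parse_claims_challenge(example_header)
print(challenge)
```

The client then passes the decoded claims back to Entra in its next token request, which is the point at which CA re-runs and the step-up decision is made.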

The modern identity boundary is not a wall. It is a conversation between planes.

Key idea: The boundary is a conversation between planes, not a wall.

The open frontier is real. Agent identities want a richer grant taxonomy than the human one provides. Cross-vendor CAEP wants production receivers outside Microsoft. Workload-identity policy wants grants that go beyond block. The break-glass paradox wants an answer that does not depend on operational discipline. None of these problems will resolve in 2026. They are the next frontier.

What the reader should now be able to do: trace a sign-in through the signal, policy, token, and session planes; read a conditionalAccessPolicy JSON and predict the evaluation outcome; identify which class of attack each grant defends against; and name, by reference to specific Microsoft Learn pages, what CA does not defend against. The promise from Section 1 is delivered.

Today, 100 percent of consumer Microsoft accounts older than 60 days have multifactor authentication. -- Alex Weinert, Microsoft Identity, November 2023 [@weinert-2023-managed-policies]

Who decided this token is good? The boundary itself decided, by composing the work of every plane named above.