Parag Mali - tag: identity-security

The 28-Hour Bargain: How Continuous Access Evaluation Made Long-Lived Tokens Safe

noreply@paragmali.com (Parag Mali) — Sat, 30 May 2026 00:00:00 GMT

**Microsoft Entra Continuous Access Evaluation (CAE) lets access tokens safely live up to 28 hours.** It works by maintaining a push-subscription channel between Entra and Microsoft 365 resource providers, so that when a user is disabled, has their password reset, or has MFA enabled, the resource provider rejects the next request with a `401` and a claims challenge -- typically within 15 minutes for critical events, instantly for IP-location changes [@ms-cae-concept]. The same pattern was standardized by the OpenID Foundation on September 2, 2025 as SSF 1.0, CAEP 1.0, and RISC 1.0 Final Specifications [@openid-three-final-specs], opening the door to vendor-neutral cross-SaaS revocation. CAE does **not** solve token theft (use DPoP for that) and does **not** cover Microsoft Defender for Endpoint or Intune as resource providers (they are signal sources into Conditional Access, not CAE consumers).

1. Your Fired Employee Is Still Reading Email

09:00 Tuesday. The administrator disables the account at 09:01. At 09:23, the ex-employee's open Outlook for the Web tab refreshes -- and pulls down new mail. This is not a bug. This is RFC 6749 working exactly as designed. Until Microsoft Entra shipped a fix that took ten years and three standards bodies -- the IETF, the OpenID Foundation, and NIST -- to develop, the access token that user held at 09:00 stayed cryptographically valid until 10:00 at the latest, and there was nothing Conditional Access could do about it [@rfc-6749].

The window has a name now. It did not, for most of cloud identity's history. Microsoft's own documentation calls it "the lag between when conditions change for a user, and when policy changes are enforced" [@ms-cae-concept]. Between sign-in (Conditional Access territory) and the next token refresh (refresh-token territory) sits a stretch of time in which Conditional Access decisions have no enforcement surface. That stretch ranged from 60 minutes to 24 hours, depending on tenant configuration. For every OAuth 2.0 deployment from 2012 onward, this was the security debt the industry carried.

Note: "Microsoft Entra ID" is the rebranded name for what most engineers learned as "Azure Active Directory" or "Azure AD." Microsoft announced the rename in July 2023 [@ms-entra-rename-2023]; the underlying service, tenants, app registrations, and APIs are unchanged. Throughout this article, "Entra" and the older "Azure AD" refer to the same identity platform.

This article explains the engineering pattern that lets a Microsoft 365 tenant do two things that look contradictory at the same time: extend access-token lifetime from 1 hour to up to 28 hours, and revoke a disabled user's session in under 15 minutes [@ms-cae-concept]. The reconciling idea is a near-real-time push channel between the identity provider (Entra) and a small set of cooperating resource providers. When you can revoke a token in minutes rather than waiting for it to expire, expiry stops doing the security work, and the token can live as long as the user actually needs it.

Continuous Access Evaluation (CAE): Microsoft Entra's push-subscription channel between the identity provider and cooperating resource providers (Exchange Online, SharePoint Online, Teams, and Microsoft Graph). CAE lets a resource provider revoke an already-issued access token in near-real-time -- up to 15 minutes for critical events, instantly for IP-location changes -- without waiting for the token to expire [@ms-cae-concept].

The trade has a price. The 15-minute critical-event service-level objective is the price the channel pays for fanning out events across hyperscale Microsoft 365 infrastructure. Sub-second revocation is possible -- other vendors demonstrate it at smaller scales -- but at Exchange-Online volume, 15 minutes is the engineering economics. We will earn that number by Section 8.

For now: the OAuth 2.0 designers knew about this gap when they wrote RFC 6749 in 2012. They chose it on purpose. To see why, and to see why the obvious patches all failed, we have to walk back to the moment the trade was made.

2. The Static-Expiry Compromise

In October 2012, Dick Hardt of Microsoft published RFC 6749 -- The OAuth 2.0 Authorization Framework -- as the editor of record for an IETF working group that had spent five years arguing about it [@rfc-6749]. Section 1.4 carries one of the most consequential qualifiers in cloud-identity history. Access tokens, it says, are credentials "usually with a short lifetime" used by the client to access a protected resource. The word "usually" is doing heavy lifting. Nothing in the protocol enforces it. Nothing in the protocol provides revocation. Nothing in the protocol stops a server from issuing 24-hour bearer tokens that, once minted, stay cryptographically valid until they expire on their own.

This was a deliberate trade. To see why it was rational, remember what came before.

Web Access Management: the model OAuth replaced

Web Access Management (WAM): the pre-2012 enterprise-identity pattern in which every protected HTTP request synchronously queried a central policy decision point. Strength: instant revocation, because every request consulted authoritative state. Weakness: a chatty bottleneck that did not scale to cloud volumes and could not federate trust across organizations.

Web Access Management dominated enterprise identity from the late 1990s into the early 2010s. Every protected HTTP request to a WAM-fronted application made a synchronous round-trip to a Policy Decision Point. The PDP held authoritative session and policy state. Revoke a user? The next request failed, immediately, because the PDP said no. No token-lifetime window. No gap between policy change and enforcement.

WAM was correct. WAM was also unworkable for the web that was coming. It did not scale: every request was a network hop. It did not federate: cross-organization SaaS meant the PDP could not live inside any one company's network. And it required every protected resource to participate in a single trust domain. By the time enterprises were running cross-organization SaaS at scale, the WAM model had run out of road.

The OAuth 2.0 authors made the opposite trade. Replace the chatty PDP round-trip with a self-contained signed bearer token that the resource server validates locally. (RFC 6749 leaves the token format open; in practice the industry converged on the JWT, standardized in 2015 as RFC 7519.) Validation becomes O(1) cryptographic verification with no round-trip. Throughput scales horizontally. Federation works, because the JWT carries its own attestation of the issuer. Revocation becomes... approximated. By expiry. The token is valid until it isn't, and you trust that the lifetime is short enough.
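Here is a minimal sketch of what "validates locally" means. It assumes an HMAC-SHA256 (HS256) JWT and a shared secret, both chosen only to keep the example self-contained. Production identity providers such as Entra sign with asymmetric keys published through a JWKS endpoint. The helper names are hypothetical. The whole check is a signature verification plus an `exp` comparison, and nothing in it consults the issuer.

```javascript
const crypto = require("node:crypto");

// base64url without padding, the encoding JWT segments use.
function b64url(input) {
  return Buffer.from(input).toString("base64url");
}

// Hypothetical helper: mint an HS256 JWT so the example runs on its own.
function signJwt(payload, secret) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = b64url(crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest());
  return `${header}.${body}.${sig}`;
}

// Local validation at the resource server: signature check plus expiry check.
// Nothing here contacts the issuer, so a user disabled at the IdP still passes
// until `exp`. Returns the claims, or null if the token is rejected.
function validateLocally(token, secret, nowSeconds) {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, sig] = parts;
  const hdr = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (hdr.alg !== "HS256") return null; // never let the token choose its own algorithm
  const expected = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  // The only "revocation" check this model offers: has the token expired yet?
  if (typeof claims.exp !== "number" || nowSeconds >= claims.exp) return null;
  return claims;
}
```

Disabling the account at the IdP changes none of the inputs to `validateLocally`. The token keeps passing until `exp`, and that is the Generation 1 revocation gap expressed in code.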

For a 2012 web of forum logins and consumer mashups, "short enough" was a defensible answer. For a 2020 enterprise running compliance-bound SaaS across thousands of employees, it was not.

The Zero Trust pressure

Two intellectual pressures forced the question. The first came from Google. In December 2014, Rory Ward and Betsy Beyer published BeyondCorp: A New Approach to Enterprise Security in USENIX ;login: [@ward-beyer-2014-beyondcorp]. (Beyer would later co-edit the O'Reilly book Site Reliability Engineering, published in 2016; BeyondCorp came out of the same Google culture of evidence-driven infrastructure engineering.) The argument was philosophical: a session is not a one-shot decision at sign-in. It is a time-varying authorization. Trust signals -- device posture, network location, behavioral risk -- change continuously, and the access decision should change with them. BeyondCorp was not a CAE implementation; it predates the term. But it planted the seed that login-time enforcement was not enough.

The second pressure was bureaucratic. In August 2020, NIST published Special Publication 800-207, Zero Trust Architecture, by Scott Rose, Oliver Borchert, Stu Mitchell, and Sean Connelly [@nist-sp-800-207]. SP 800-207 codified the BeyondCorp philosophy as U.S. federal guidance. One sentence made the engineering investment commercially rational: "Authentication and authorization (both subject and device) are discrete functions performed before a session to an enterprise resource is established." Federal guidance that treated authorization as dynamic and continually re-evaluated pushed every cloud vendor with U.S. government contracts to find an implementation. The gap RFC 6749 had left was now a procurement problem.

A name for the problem

The third moment named the gap. On February 21, 2019, Atul Tulshibagwale, then an engineer at Google, published Re-thinking federated identity with the Continuous Access Evaluation Protocol on the Google Cloud blog [@tulshibagwale-2019-google-blog]. The post introduced a term -- CAEP -- and a framing: publish-and-subscribe between identity providers and resource providers, as a third option between WAM's per-request chattiness and OAuth's fire-and-forget expiry. We return to Tulshibagwale's actual proposal in Section 5. For now what matters: 2019 was the year the industry got a vocabulary for a problem it had been carrying for seven years.

The OpenID Foundation working group that grew out of Tulshibagwale's proposal was originally chartered as the Shared Signals & Events (SSE) working group. It was renamed Shared Signals in subsequent years, but older industry write-ups from 2020-2022 still use the SSE abbreviation [@idsalliance-2022-11-cae].

```mermaid
gantt
    title CAE and Shared Signals timeline (2012-2025)
    dateFormat YYYY-MM
    axisFormat %Y
    section IETF standards
    RFC 6749 OAuth 2.0           :done, a1, 2012-10, 30d
    RFC 7009 Token Revocation    :done, a2, 2013-08, 30d
    RFC 7662 Token Introspection :done, a3, 2015-10, 30d
    RFC 8417 SET                 :done, a4, 2018-07, 30d
    RFC 8935 SET Push            :done, a5, 2020-11, 30d
    RFC 8936 SET Poll            :done, a6, 2020-11, 30d
    section Zero Trust thinking
    BeyondCorp paper             :done, b1, 2014-12, 30d
    NIST SP 800-207 Final        :done, b2, 2020-08, 30d
    section CAEP origin and OIDF
    Tulshibagwale CAEP post      :done, c1, 2019-02, 30d
    OIDF Shared Signals WG       :done, c2, 2019-09, 30d
    SSF 1.0 CAEP 1.0 RISC 1.0    :done, c3, 2025-09, 30d
    section Microsoft Entra CAE
    Limited preview Weinert      :done, d1, 2020-04, 30d
    Expanded preview Simons      :done, d2, 2020-10, 30d
    General Availability         :done, d3, 2022-01, 30d
```

The OAuth 2.0 designers traded revocation latency for throughput on purpose [@rfc-6749]. Once that gap proved unacceptable, three obvious patches were tried. None of them worked. To see why none of them worked is to understand the negative space CAE was designed to fill.

3. Three Patches, Three Failures

Between 2013 and the late 2010s, the OAuth community published three patches for RFC 6749's revocation gap. Each was rationally adopted; each was rationally abandoned at hyperscale. This section is the genealogy of those failures, because what each one got wrong defines the shape of the design that finally worked.

Patch 1: RFC 7009 -- the `/revoke` endpoint (August 2013)

In August 2013, Torsten Lodderstedt of Deutsche Telekom, Stefanie Dronia, and Marius Scurtescu of Google published RFC 7009, OAuth 2.0 Token Revocation [@rfc-7009]. The contribution was a standardized HTTP endpoint, /revoke, that a client could POST a token to in order to invalidate it. The mental model is the logout button: when a user signs out, the client tells the authorization server "I'm done with this token, please retire it."

The failure mode is in the threat model. RFC 7009 is client-initiated. The token holder asks for revocation. But the scenario that motivates CAE is precisely the one where the token holder is uncooperative. A fired employee will not POST their access token to /revoke on the way out the door. An attacker who has stolen a token will certainly not. The administrator on the other side cannot use the endpoint either, because they do not possess the bearer token.
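The client-initiated constraint is easiest to see in code. The following minimal sketch builds an RFC 7009 revocation request without sending it. The endpoint URL, client credentials, and function name are hypothetical. The `token` and `token_type_hint` form parameters are the ones RFC 7009 defines. The required input is the one the administrator never has: the token itself.

```javascript
// Build (but do not send) an RFC 7009 revocation request.
// Endpoint and client credentials are hypothetical placeholders.
function buildRevocationRequest({ endpoint, clientId, clientSecret, token, tokenTypeHint }) {
  if (!token) {
    // The whole limitation in one line: no token in hand, no revocation request.
    throw new Error("RFC 7009 revocation requires possession of the token");
  }
  const params = new URLSearchParams({ token });
  if (tokenTypeHint) params.set("token_type_hint", tokenTypeHint); // "access_token" or "refresh_token"
  // HTTP Basic client authentication. RFC 6749 Section 2.3.1 also asks for
  // form-urlencoding the id and secret first; omitted here for brevity.
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  return {
    url: endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Basic ${basic}`,
    },
    body: params.toString(),
  };
}
```

A client that holds the token could send the result with `fetch(req.url, req)`. An administrator offboarding a user cannot even construct the request.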

Worse, RFC 7009's Implementation Note (Section 3) is candid about self-contained tokens: when immediate revocation is desired, the only recourse it offers is "some (currently non-standardized) backend interaction between the authorization server and the resource server" [@rfc-7009]. Read that carefully. The spec admits there is no spec. The JWT in flight at the resource server is cryptographically valid until it expires. The authorization server can mark it revoked in a local database, but the resource server never asks. It validates the signature locally. The revocation event never crosses the wire.

RFC 7009 works for opaque tokens with a token-introspection back-channel. It does not, by itself, solve revocation for self-contained JWT bearers -- which by the mid-2010s were the dominant pattern in the cloud.

Patch 2: RFC 7662 -- the `/introspect` endpoint (October 2015)

Two years later, in October 2015, Justin Richer published RFC 7662, OAuth 2.0 Token Introspection [@rfc-7662]. The mechanism: on every request, the resource server calls a /introspect endpoint on the authorization server with the bearer token. The AS replies with the token's current state. If the token has been revoked, /introspect returns active: false, and the resource server denies the request.
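The next sketch shows the resource-server side of introspection, with the same caveats as above. The endpoint, the RS credential, and both function names are hypothetical. `token`, `token_type_hint`, `active`, and `exp` are RFC 7662 names. The HTTP call itself is left out so the decision logic stands on its own.

```javascript
// Sketch of the resource-server side of RFC 7662 token introspection.
// Endpoint and RS credential are hypothetical; sending the request is left to the caller.
function buildIntrospectionRequest(endpoint, rsCredential, token) {
  return {
    url: endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Accept: "application/json",
      // The protected resource must authenticate to the endpoint; a bearer
      // credential for the RS itself is one of the options RFC 7662 illustrates.
      Authorization: `Bearer ${rsCredential}`,
    },
    body: new URLSearchParams({ token, token_type_hint: "access_token" }).toString(),
  };
}

// Decide whether to serve the API call from the introspection response.
// `active` is the only required member; `exp` (seconds since epoch) is optional.
function allowFromIntrospection(response, nowSeconds) {
  if (!response || response.active !== true) return false; // revoked, expired, or unknown
  if (typeof response.exp === "number" && nowSeconds >= response.exp) return false;
  return true;
}
```

Wired up, a resource server sends this request and feeds the parsed JSON to `allowFromIntrospection` on every incoming API call. That per-request dependency is the cost weighed below.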

This is correct. It also reintroduces the WAM bottleneck that OAuth was designed to escape.

For an authorization server fronting billions of API requests per day -- Entra ID in front of Microsoft Graph as one example, Google's identity platform in front of Google's APIs as another -- making /introspect the per-request critical path turns the authorization server into a synchronous dependency on every API call against every resource server in the estate. Latency adds up. Availability becomes shared. If the AS has a bad five minutes, every resource server has a bad five minutes simultaneously. The architecture OAuth bought with self-contained tokens -- resource server scales independently of AS -- gets traded back for exactly the WAM property that motivated OAuth's existence.

RFC 7662 introspection is alive and well. It remains the right choice for opaque-token systems and on-premises IdPs where the resource server count is small, the per-request latency budget is generous, and the AS is well within capacity. The criticism here is structural and only applies at hyperscale public-cloud volumes. RFC 7662 was not killed by RFC 7009 or by CAE; it is a parallel path that continues to serve a substantial fraction of the deployed OAuth surface.

Patch 3: Make the token life so short revocation does not matter

The third patch was the obvious one. If you cannot revoke a token mid-life, make its life short. Issue access tokens with a minutes-long lifetime, the way early Microsoft experiments did. The revocation window collapses. Problem solved.

Microsoft tried it. The retrospective is unusually candid. On April 21, 2020, Alex Weinert, then Director of Identity Security at Microsoft, published Moving towards real time policy and security enforcement on the Azure Active Directory Identity Blog [@weinert-2020-04-real-time]. (The original lives at post ID 1276933 on Microsoft's tech community; the full body is preserved in Microsoft's Japanese translation on the jpazureid GitHub mirror [@jpazureid-blog-1-japanese].) The post names the failure mode in one sentence:

"We have experimented with the 'blunt object' approach of reduced token lifetimes but found they can degrade user experiences and reliability without eliminating risks." -- Alex Weinert, Microsoft, April 21, 2020 [@weinert-2020-04-real-time]

Two things break. First, user experience and reliability. Every short-lifetime boundary forces every active client to round-trip the IdP for a fresh token. For Outlook, Teams, Word Online, OneDrive, and every other client an enterprise user has open at once, that is a wave of token requests per user per cycle. Multiplied by Microsoft 365 active users, the load profile creates real outages. Network blips that would otherwise be invisible surface as failed refreshes, with user-visible re-authentication prompts. Second, it does not eliminate the risk. A minutes-long window is still a window. A fired employee can read or exfiltrate a great deal of email in that window. You have paid the full user-experience cost and still left a non-trivial breach surface.
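The load side of that argument is simple arithmetic. This sketch uses hypothetical round numbers: one million active users with five token-holding clients each. These are not Microsoft figures. It assumes refreshes spread evenly over each token lifetime.

```javascript
// Steady-state token-refresh rate at the IdP, assuming refreshes are spread
// evenly over each token lifetime. All inputs are illustrative assumptions.
function refreshesPerSecond(activeUsers, clientsPerUser, lifetimeMinutes) {
  return (activeUsers * clientsPerUser) / (lifetimeMinutes * 60);
}

const users = 1_000_000; // hypothetical
const clients = 5; // hypothetical: Outlook, Teams, Word Online, OneDrive, browser

for (const minutes of [5, 60, 28 * 60]) {
  const rate = refreshesPerSecond(users, clients, minutes);
  console.log(`${minutes} min lifetime -> ${rate.toFixed(1)} refreshes/s`);
}
// 5 min lifetime -> 16666.7 refreshes/s
// 60 min lifetime -> 1388.9 refreshes/s
// 1680 min lifetime -> 49.6 refreshes/s
```

Cutting the lifetime from 60 minutes to 5 multiplies steady-state refresh traffic twelvefold, and a five-minute revocation window still remains.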

This was the third failure. The negative space across the three patches defines the shape any real solution has to take: it must be server-initiated (not RFC 7009), it must be push-based rather than per-request poll (not RFC 7662), and it must separate revocation from expiry so the IdP does not pay for every revocation with a refresh-load spike (not the short-lifetime patch). The three failures exhaust the surface of the obvious fix.

Note: Each of the three patches fails for a different reason; together they rule out everything except server-initiated push subscription that decouples revocation from expiry.

If the patches all fail, the next move has to be architectural. The first published statement of that architecture was Atul Tulshibagwale's February 2019 Google blog post -- and the move he proposed is the one Microsoft would ship three years later.

4. Four Generations of Session Enforcement

Walk forward through the genealogy of session enforcement and the breakthrough in Section 5 stops looking like a stroke of genius and starts looking like the only move the design space had left. Four generations, each killed by a documented limit of the previous one.

Generation 0: WAM (pre-2012)

Per-request synchronous round-trip to a Policy Decision Point. Instant revocation; chatty bottleneck; no federation. Killed by cloud-scale request rates and the rise of cross-organization SaaS, where the protected resource and the policy authority no longer lived in the same trust domain. WAM remains valuable in single-tenant enterprise contexts, but for the public-cloud API mesh it cannot scale.

Generation 1: Static-expiry JWT (2012-2020)

Self-contained signed bearer tokens validated locally at the resource server. Revocation approximated by expiry per RFC 6749 [@rfc-6749]. Throughput scales; federation works; revocation is acceptable when the lifetime is short and the threat model is benign. Killed by (a) the fired-employee window, (b) the three failed Section 3 patches, and (c) the philosophical pressure from Zero Trust to treat sessions as continuously re-evaluated.

Generation 2: Microsoft CAE (limited preview April 2020, GA January 10, 2022)

The first production solution. Limited preview launched in April 2020 with Alex Weinert's Moving towards real time policy and security enforcement announcement [@weinert-2020-04-real-time]. Expanded public preview October 2020 [@simons-2020-10-expanded-preview; @vansurksum-2020-10-10]. General Availability January 10, 2022, announced by Alex Simons, Corporate VP for Program Management in the Microsoft Identity Division [@simons-2022-01-ga-rss].

The architecture is a private push-subscription channel between Entra and a small set of Microsoft 365 resource providers, with a wire-level handshake (the claims challenge) for telling the client to re-acquire a token reflecting new state. Access-token lifetime extends from the default 1 hour to up to 28 hours specifically for CAE-aware sessions [@ms-cae-concept]. We will unpack the mechanism in Section 5.

The Gen-2 limitation that motivated Gen 3: the wire format is Microsoft-internal. A SaaS vendor that wants the same revocation properties for its own resource provider cannot use Microsoft's CAE channel. The protocol does not federate.

Generation 3: OpenID SSF 1.0 + CAEP 1.0 + RISC 1.0 (Final Specifications, September 2, 2025)

The OpenID Foundation generalized the Microsoft pattern into a vendor-neutral specification. On September 2, 2025, three Final Specifications were approved: the Shared Signals Framework 1.0 (SSF), the Continuous Access Evaluation Profile 1.0 (CAEP), and the Risk and Incident Sharing and Coordination 1.0 (RISC) [@openid-three-final-specs; @openid-sharedsignals-wg].

The wire envelope is IETF RFC 8417's Security Event Token (SET), published in July 2018 by Phil Hunt (Oracle), Michael Jones (Microsoft), William Denniss (Google), and Morteza Ansari (Cisco) [@rfc-8417]. A SET is a signed JWT carrying a single security event. The transport layer is RFC 8935 push (POST over TLS from transmitter to receiver) and RFC 8936 poll (recipient-initiated retrieval), both published November 2020 by Annabelle Backman and collaborators [@rfc-8935; @rfc-8936]. SSF defines the subscription model -- streams, subjects, transmitter and receiver metadata endpoints. CAEP and RISC define the vocabulary of events that can ride that envelope.

Security Event Token (SET): IETF RFC 8417's standardized signed-JWT envelope for transmitting security-relevant events between systems. Each SET carries exactly one event with a well-defined event-type URI; the envelope is signature-protected and timestamp-bearing. SET is the wire format underlying CAEP, SSF, and RISC, as well as Microsoft's internal CAE protocol [@rfc-8417].
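To make the envelope concrete, here is a sketch of the header and claims a transmitter might put in a SET for a CAEP session-revoked event. Signing is omitted. The issuer, audience, subject, and reason text are hypothetical. The event-type URI follows the CAEP naming scheme. The `typ: "secevent+jwt"` header is the RFC 8417 recommendation. Placing the subject in a top-level `sub_id` (RFC 9493 format) and the event members `event_timestamp`, `initiating_entity`, and `reason_admin` reflect this sketch's reading of SSF 1.0 and CAEP 1.0, so check the final specifications before relying on exact member names.

```javascript
const crypto = require("node:crypto");

// CAEP event-type URI for a revoked session.
const SESSION_REVOKED = "https://schemas.openid.net/secevent/caep/event-type/session-revoked";

// Build the unsigned header and claims of a SET. All values are illustrative.
function buildSessionRevokedSet({ issuer, audience, email, nowSeconds, reason }) {
  const header = { alg: "RS256", typ: "secevent+jwt" }; // RFC 8417 recommended typ
  const claims = {
    iss: issuer,
    aud: audience,
    iat: nowSeconds,
    jti: crypto.randomUUID(), // unique identifier for this SET
    sub_id: { format: "email", email }, // subject identifier (RFC 9493 "email" format)
    events: {
      // One event, keyed by its event-type URI; the value carries event details.
      [SESSION_REVOKED]: {
        event_timestamp: nowSeconds,
        initiating_entity: "admin",
        reason_admin: { en: reason },
      },
    },
  };
  return { header, claims };
}
```

The envelope itself says nothing about sessions or accounts. Only the URI key inside `events` does, and that separation is what lets CAEP and RISC share one wire format.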

RFC 8417 was a cross-vendor IETF effort that pre-dated the OpenID Shared Signals working group by a year. Phil Hunt was at Oracle; Michael Jones at Microsoft; William Denniss at Google; Morteza Ansari at Cisco. The envelope-only design -- leaving event vocabularies to higher-layer profiles -- is what allowed both Microsoft's internal protocol and the OpenID profiles to converge on the same wire format without coordination [@rfc-8417].

```mermaid
flowchart TD
    L4["Layer 4: Event vocabularies<br/>CAEP 1.0 (session) and RISC 1.0 (account)"]
    L3["Layer 3: Subscription and stream model<br/>OpenID SSF 1.0"]
    L2["Layer 2: HTTP transport<br/>RFC 8935 push, RFC 8936 poll"]
    L1["Layer 1: Signed event envelope<br/>RFC 8417 Security Event Token (SET)"]
    L4 --> L3
    L3 --> L2
    L2 --> L1
```
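The transport layer is also thin on the receiving side. Below is a framework-free sketch of an RFC 8935 push receiver's decision logic. The request body is the compact-serialized SET with media type `application/secevent+jwt`. Success is `202 Accepted`, and failures are `400` with an `err`/`description` JSON body. `verifySet` is a hypothetical callback standing in for signature, issuer, and audience checks. Mapping its thrown `code` straight into `err` is a simplification.

```javascript
// Decision logic for an RFC 8935 push receiver, independent of any web framework.
// `verifySet` is hypothetical: it returns decoded claims or throws an Error whose
// `code` is an RFC 8935 error code such as "invalid_key" or "invalid_issuer".
function handleSetDelivery(contentType, body, verifySet) {
  // Push delivery carries the compact-serialized SET as the request body.
  if (!contentType || !contentType.startsWith("application/secevent+jwt")) {
    return { status: 400, json: { err: "invalid_request", description: "expected application/secevent+jwt" } };
  }
  let claims;
  try {
    claims = verifySet(body);
  } catch (e) {
    return { status: 400, json: { err: e.code || "invalid_request", description: String(e.message) } };
  }
  // Acknowledge quickly; acting on the event (e.g. updating cached state) happens after.
  return { status: 202, json: null, claims };
}
```

The transmitter only needs to know that the SET landed. What the receiver does with it, such as the cached-state update in Section 5, stays entirely local.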

The generation chain has a documented engineering reason for each transition. The comparison matrix below pulls the essentials together.

| Approach | Year | Revocation latency | Strengths | Weaknesses |
|---|---|---|---|---|
| WAM (Gen 0) | pre-2012 | Instant | Authoritative state, instant enforcement | No federation, per-request bottleneck |
| Static-expiry JWT (Gen 1) | 2012-2020 | Up to token lifetime (1h-24h) | O(1) RP validation, federation works | No revocation; fired-employee window |
| Short-lifetime patch | mid-2010s | Minutes | Conceptually simple | Load amplification, window remains, UX degradation |
| RFC 7662 introspection | 2015 onward | Instant | Standardized, works for opaque tokens | AS becomes per-request critical path |
| Microsoft CAE (Gen 2) | 2020-2022 | Up to 15 min critical; instant IP | Push, decoupled from request rate, long tokens safe | Microsoft-internal protocol; tiny RP set |
| OpenID SSF/CAEP (Gen 3) | 2025 onward | Vendor-dependent | Vendor-neutral standard, cross-SaaS | Receiver adoption still early |

```mermaid
flowchart LR
    G0["Gen 0: WAM<br/>per-request PDP"]
    G1["Gen 1: Static-expiry JWT<br/>RFC 6749 (2012)"]
    G2["Gen 2: Microsoft CAE<br/>GA January 2022"]
    G3["Gen 3: OpenID SSF and CAEP<br/>Final September 2025"]
    G0 -- "cloud scale and federation" --> G1
    G1 -- "fired-employee window, patches fail" --> G2
    G2 -- "Microsoft-only, no cross-SaaS" --> G3
```

Knowing the lineage is not knowing the trick. What is the actual mechanism CAE deploys -- the thing that turns this standards-history arc into a feature that ships and makes 28-hour tokens defensible? It has three parts, and once you see them together, you understand why long tokens are safe.

5. Subscription, Claims Challenge, Extended Lifetime

Three innovations, none new in isolation, all unprecedented in combination. This is the section where you see the trick.

Atul Tulshibagwale's 2019 framing names the move: "Our vision for continuous access evaluation is based on a publish-and-subscribe ('pub-sub') approach... It's complementary to federated or cert-based authentication... It's not as chatty as WAM... It doesn't impact latency for user access" [@tulshibagwale-2019-google-blog]. Pub-sub is the third option between WAM's per-request chattiness and RFC 6749's fire-and-forget. Subscription is the channel; claims challenge is the wire-level handshake; extended lifetime is the user-experience prize.

Part 1: Subscription

Microsoft's CAE concept page describes the architecture in one sentence that rewards close reading:

Timely response to policy violations or security issues really requires a 'conversation' between the token issuer Microsoft Entra, and the relying party (enlightened app). -- Microsoft Learn, *Continuous access evaluation in Microsoft Entra* [@ms-cae-concept]

The word conversation is the architecture. The relying party (a CAE-aware Microsoft 365 workload such as Exchange Online) subscribes to a finite, documented set of critical events for the subjects it cares about. Entra pushes events to the RP as state changes. State is cached at the RP. On the hot path -- the per-request data plane -- the RP does an O(1) JWT signature verification plus an O(1) hash-table lookup of cached revocation state. No back-channel round-trip on the hot path. The 28-hour token costs no more to validate than the 1-hour token it replaced [@ms-cae-concept].

This is the move that defeats RFC 7662. The state lives at the RP, not at the AS. The control-plane cost scales with the rate of events, not the rate of requests. Push, not poll.
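Here is a minimal sketch of that hot path. It assumes the RP keeps a map from (tenant, subject) to the time of the latest critical event, that tokens carry `tid`, `sub`, and `iat` claims, and that signature verification has already succeeded. The map, key shape, and function names are hypothetical, because Microsoft does not publish Exchange Online's internals. The point is the cost profile: one map write per pushed event and one map read per request.

```javascript
// Cached revocation state: "tenant/subject" -> epoch seconds of the latest
// critical event (disable, password reset, refresh-token revocation, ...).
const revokedAfter = new Map();

// Control plane: runs once per pushed event, not once per request.
function onCriticalEvent(tenantId, subject, eventTimeSeconds) {
  const key = `${tenantId}/${subject}`;
  const prev = revokedAfter.get(key) ?? 0;
  revokedAfter.set(key, Math.max(prev, eventTimeSeconds)); // keep the latest event
}

// Data plane: O(1) lookup after local signature verification.
// Returns null to serve the request, or a WWW-Authenticate value to send with a 401.
function checkCachedState(claims) {
  const cutoff = revokedAfter.get(`${claims.tid}/${claims.sub}`);
  if (cutoff === undefined || claims.iat > cutoff) return null;
  // The token predates the event: demand one issued after it (an nbf claims challenge).
  const challenge = { access_token: { nbf: { essential: true, value: String(cutoff) } } };
  const b64 = Buffer.from(JSON.stringify(challenge)).toString("base64url");
  return `Bearer error="insufficient_claims", claims="${b64}"`;
}
```

Event volume drives the write side and request volume drives only a hash lookup, so neither side calls back to the IdP on the request path.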

Part 2: The claims challenge

When state at the RP changes -- because a push event has arrived saying "this user's password has been reset" -- the RP cannot reach into a request that has already been accepted and is being served. CAE is in-band with the next request, not the current one. The next time the client presents the stale token, the RP rejects it with HTTP 401 and a specific header:

```http
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer error="insufficient_claims",
                  claims="eyJhY2Nlc3NfdG9rZW4iOnsiYWNyc..."
```

The claims parameter is a base64url-encoded JSON object that tells the client what to re-acquire from the IdP. The Microsoft Authentication Library (MSAL) on the client decodes the challenge transparently and requests a new access token from Entra with the indicated claims. Entra either issues a fresh CAE-aware token (if authorization still holds) or rejects, forcing interactive re-authentication. The client retries the original API call with the new token [@ms-cae-app-resilience].

Claims challenge: the HTTP-level mechanism by which a CAE-aware resource provider signals to a client that the presented token must be re-acquired with fresh state. The challenge is conveyed as a `WWW-Authenticate: Bearer error="insufficient_claims"` header with a base64url-encoded `claims` parameter. Current Microsoft Authentication Library (MSAL) releases decode and handle it automatically when the client application declares the `cp1` client capability, which Entra surfaces to the resource as the `xms_cc` claim in the access token [@ms-cae-app-resilience].

This is the move that defeats RFC 7009. Revocation is initiated by the resource provider's view of the IdP's state, not by the token holder. A fired employee's client cannot opt out of the claims challenge; the RP will not serve any further request until a fresh token arrives that reflects the post-revocation state.

```javascript
// A real-shape WWW-Authenticate header from a CAE-aware resource provider.
// The 'claims' parameter is base64url-encoded JSON.
const header = 'Bearer error="insufficient_claims", claims="eyJhY2Nlc3NfdG9rZW4iOnsibmJmIjp7ImVzc2VudGlhbCI6dHJ1ZSwgInZhbHVlIjoiMTcyMDQ4MDA0MyJ9fX0="';

// Extract the claims parameter
const match = header.match(/claims="([^"]+)"/);
if (!match) throw new Error('no claims parameter in WWW-Authenticate');
const b64 = match[1];

// base64url decode. Node's Buffer would work; atob is the browser-safe
// approach and is also a global in Node 16+.
function b64urlDecode(s) {
  s = s.replace(/-/g, '+').replace(/_/g, '/');
  while (s.length % 4) s += '=';
  return atob(s);
}

const claimsJson = b64urlDecode(b64);
console.log(JSON.stringify(JSON.parse(claimsJson), null, 2));
// {
//   "access_token": {
//     "nbf": {
//       "essential": true,
//       "value": "1720480043"
//     }
//   }
// }
// MSAL reads this and requests a new token whose 'nbf' (not-before) is at least
// the supplied timestamp -- i.e., a token issued after the state change.
```

The nbf (not-before) claim challenge is the most common shape: the RP is telling the client "give me a token issued after this moment." The client requests one. Entra checks current state -- did the user get disabled? did the password get reset? did the risk score elevate? -- and either issues or denies. The wire format is simple enough to inspect in a browser tab, which is part of why the architecture has been able to standardize: there is no magic to reverse-engineer.
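MSAL runs this loop for you. For readers curious about what it is doing, here is a minimal sketch of the client side. It calls the API, checks a 401 for a claims challenge, re-acquires a token carrying those claims, and retries exactly once. `callApi` and `acquireToken` are hypothetical stand-ins for an HTTP client and a token cache, not MSAL APIs.

```javascript
// Extract and decode the claims parameter from a WWW-Authenticate value, or null.
function parseClaimsChallenge(wwwAuthenticate) {
  if (!wwwAuthenticate || !wwwAuthenticate.includes('error="insufficient_claims"')) return null;
  const m = wwwAuthenticate.match(/claims="([^"]+)"/);
  if (!m) return null;
  // Node's base64url decoder also accepts padded or standard base64 input.
  return Buffer.from(m[1], "base64url").toString("utf8");
}

// Call the API; on a claims challenge, get a token carrying those claims and retry once.
// acquireToken(null) returns a cached token; acquireToken(claims) asks the IdP for a
// fresh one and rejects if the IdP refuses (for example, the account is disabled).
async function callWithCaeRetry(callApi, acquireToken) {
  let token = await acquireToken(null);
  const res = await callApi(token);
  if (res.status !== 401) return res;
  const claims = parseClaimsChallenge(res.headers["www-authenticate"]);
  if (!claims) return res; // an ordinary 401: nothing CAE-specific to do
  token = await acquireToken(claims); // a disabled user's session ends here
  return callApi(token); // retry exactly once; a second challenge goes back to the caller
}
```

Retrying only once avoids a loop if the resource keeps challenging. A disabled user never reaches the retry, because the token request itself fails.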

Part 3: Extended lifetime, the prize

The first two parts buy you the third. Once revocation is push-based and the claims challenge gives the RP a way to evict stale tokens within seconds of seeing a control-plane event, the expiry timer stops carrying the security weight. Tokens can live longer because the expiry is no longer the only revocation mechanism.

Microsoft documents the upper bound as "up to 28 hours" for CAE-aware sessions [@ms-cae-concept; @ms-cae-app-resilience]. The default for non-CAE-capable clients remains 1 hour. This is the move that defeats the short-lifetime patch: the IdP load profile collapses because tokens refresh roughly once a day rather than every few minutes, and the revocation window is dramatically smaller -- not because expiry shrank, but because the channel now does the revocation work expiry used to do.
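That comparison can be made explicit with a deliberately rough model. It assumes a revoked user keeps access until either the token expires or, for CAE-aware sessions, the critical event reaches the RP, whichever comes first. It ignores clock skew and the gap before the client's next request. The function name is hypothetical.

```javascript
// Worst-case minutes a revoked user can keep using an already-issued token.
// Without CAE, only expiry ends access; with CAE, the critical-event propagation
// budget (Microsoft documents up to 15 minutes) ends it too, whichever is sooner.
function worstCaseExposureMinutes({ lifetimeMinutes, caeAware, propagationMinutes = 15 }) {
  return caeAware ? Math.min(lifetimeMinutes, propagationMinutes) : lifetimeMinutes;
}

console.log(worstCaseExposureMinutes({ lifetimeMinutes: 60, caeAware: false })); // 60
console.log(worstCaseExposureMinutes({ lifetimeMinutes: 28 * 60, caeAware: false })); // 1680
console.log(worstCaseExposureMinutes({ lifetimeMinutes: 28 * 60, caeAware: true })); // 15
```

On its own, a 28-hour token is a bad trade (1,680 minutes of exposure). Paired with the channel, it beats the 1-hour default (15 minutes).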

Key idea: Long-lived access tokens are safe only when paired with a near-real-time revocation channel. CAE is the channel. Subscription provides the push, the claims challenge is the in-band handshake the push enables, and the 28-hour lifetime is what the channel buys -- not what the channel costs.

The full round trip

The three parts interlock. The complete flow, from a state change at Entra to a re-validated request, runs end-to-end through every layer the article has named.

```mermaid
sequenceDiagram
    participant Admin
    participant Entra as Microsoft Entra
    participant Client as Client (MSAL)
    participant RP as Resource Provider (e.g. Exchange Online)
    Admin->>Entra: Disable user account
    Entra->>RP: Push critical-event SET (account disabled)
    Note over RP: Updates cached revocation state for (sub, tenant)
    Client->>RP: GET /me/messages (Authorization Bearer old token)
    Note over RP: Validates JWT signature O(1), checks cached state
    RP-->>Client: 401 plus WWW-Authenticate insufficient_claims
    Note over Client: MSAL parses claims challenge from header
    Client->>Entra: Token request with claims
    Note over Entra: Checks current user state, account is disabled
    Entra-->>Client: 400 invalid_grant or interactive re-auth required
    Note over Client: User cannot recover, session terminates
```

Three moves, one design. Remove any one and the system collapses. Subscription without a claims challenge gives you push events the RP cannot act on at the wire. Claims challenge without subscription gives you a 401 mechanism with no information to decide when to fire it. Extended lifetime without either gives you Generation 1's fired-employee window. The 28-hour token is not the cost of CAE; it is what CAE purchases.

This is the design. What does it actually do in production today, and where does it stop?

6. CAE as Deployed in Microsoft Entra (2026)

Concrete answers to concrete questions. Which events trigger CAE? Who participates? What is the actual SLA? How long do tokens actually live? No marketing language; only what Microsoft Learn currently documents.

Critical event evaluation events

Microsoft Learn lists exactly five events that drive critical event evaluation at the IdP-to-RP boundary [@ms-cae-concept]:

- A user account is deleted or disabled.
- A password for a user is changed or reset.
- Multi-factor authentication is enabled for the user.
- An administrator explicitly revokes all refresh tokens for a user.
- High user risk is detected by Microsoft Entra ID Protection.

These five events propagate from Entra to the participating CAE-aware resource providers via the push channel. Microsoft's published service-level objective is "up to 15 minutes" for critical-event propagation [@ms-cae-concept]. That is not the same as "instant." The phrase to avoid is "CAE delivers instant revocation"; the accurate phrase is "CAE delivers near-real-time revocation, typically within 15 minutes for critical events."

A separate scenario -- Conditional Access policy evaluation -- covers network and IP-location changes. Here the SLA is different: IP-location enforcement is instant per Microsoft's published documentation [@ms-cae-concept]. The difference is mechanical. IP location is a property the RP sees directly on every request (the source IP of the incoming HTTP connection); the RP can compare it against the location constraints attached to the session and reject locally with no propagation delay. Critical events have to travel from Entra to the RP through the event channel, and that travel has a 15-minute budget at Microsoft 365 scale.
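The local comparison described above can be sketched as an IPv4 CIDR match against the named-location ranges attached to the session. This is an illustration under assumptions: the function names are hypothetical, the idea that the ranges are available to the RP as a plain list of CIDR strings is a simplification (Microsoft does not document the RP's internal representation), and IPv6 is omitted for brevity.

```javascript
// Parse a dotted-quad IPv4 address into an unsigned integer, or null if invalid.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // arithmetic avoids 32-bit sign issues
  }
  return value;
}

// True if `ip` falls inside `cidr`, e.g. "203.0.113.0/24".
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Per-request decision made entirely at the RP: no round-trip to the IdP.
function locationAllowed(sourceIp, allowedRanges) {
  return allowedRanges.some((cidr) => inCidr(sourceIp, cidr));
}
```

Because every input to this check is already on the incoming connection, the decision costs no propagation time, which is why this path can be instant while critical events cannot.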

Event	Source	Propagation	Notes
Account deleted or disabled	Entra ID directory	Up to 15 min	Honored by Exchange Online, SharePoint Online, Teams, Graph (CA)
Password changed or reset	Entra ID directory	Up to 15 min	Same RP set
MFA enabled for user	Entra ID directory	Up to 15 min	Same RP set
All refresh tokens revoked (admin)	Entra ID admin action	Up to 15 min	Same RP set
High user risk detected	Entra ID Protection	Up to 15 min	SharePoint Online does not honor user-risk events [@ms-cae-concept]
IP location changed (CA policy)	Resource-provider observation	Instant	Conditional Access policy evaluation path; strict location enforcement [@ms-strict-location-enforcement]

Note: Microsoft Defender for Endpoint and Microsoft Intune (MDM) are signal sources into Conditional Access. They contribute to the risk score and device-compliance state that drive CA policy decisions, but they are not CAE-consuming resource providers. They do not subscribe to Entra critical-event notifications and they do not enforce the claims-challenge handshake on token-bearing requests. The CAE-aware RP set is exactly: Exchange Online, SharePoint Online, Microsoft Teams, and Microsoft Graph (the last only for Conditional Access policy evaluation) [@ms-cae-concept]. If you read older deck slides or vendor blog posts that list MDE or Intune as CAE participants, they are conflating the signal-source role with the resource-provider role.

The SharePoint Online user-risk caveat is a concrete example of why "CAE-aware" is not a binary property at the workload level. SharePoint Online is fully CAE-aware for the first four critical events on the list; it just does not subscribe to user-risk events specifically. The lesson is that you must read the per-workload documentation carefully when designing controls that depend on a specific event's enforcement [@ms-cae-concept].

Workloads that participate

The CAE-aware resource-provider set, per Microsoft Learn [@ms-cae-concept]:

Exchange Online -- full CAE consumer (initial implementation, October 2020).
SharePoint Online -- full CAE consumer, with the user-risk caveat noted above.
Microsoft Teams -- full CAE consumer (initial implementation), per Alex Simons's January 2022 GA announcement [@simons-2022-01-ga-rss].
Microsoft Graph -- consumes Conditional Access policy evaluation events (the IP-location instant path); narrower scope than the M365 productivity workloads.

Client-side support is also explicit. Microsoft's compatibility tables in the CAE concept page enumerate which client and server combinations are Supported, Partially supported, or Not Supported on every major operating system and form factor [@ms-cae-concept]. Office web apps against SharePoint Online and Exchange Online are documented as Not Supported on several combinations; every Teams client surface shows as Partially supported. The point is not that CAE is broken on these surfaces -- it is that Microsoft documents the rough edges in primary source, and tenant administrators who care about specific scenarios must read the table.

Tokens and clients

The default access-token lifetime for CAE-aware sessions is up to 28 hours; the default for non-CAE-capable clients remains 1 hour [@ms-cae-concept; @ms-cae-app-resilience]. Client support requires a current Microsoft Authentication Library (MSAL) release on the target platform: the 4.x line for .NET and JavaScript; the appropriate current line for Python, Java, Android, iOS, or macOS, per each SDK's own release stream. Microsoft Learn's Use Continuous Access Evaluation enabled APIs page enumerates per-SDK guidance [@ms-cae-app-resilience]. The app registration must also declare the xms_cc client capability with value ["cp1"] to advertise CAE-handling support to the IdP [@ms-cae-app-resilience].

xms_cc (client capability): An app-registration claim by which a client advertises support for CAE-aware token issuance. The canonical wire-level value in the issued JWT is lowercase `"cp1"` (Microsoft's developer docs show both `"cp1"` and `"CP1"`; negotiation is case-insensitive, but the token claim is lowercase). It signals that the client's MSAL implementation can decode and act on a `WWW-Authenticate: Bearer error="insufficient_claims"` response by parsing the `claims` parameter and re-acquiring a token. Without it, Entra issues the default 1-hour token and the resource provider falls back to standard expiry [@ms-cae-app-resilience].

Resource provider (RP): A Microsoft 365 workload (Exchange Online, SharePoint Online, Teams, or Microsoft Graph for Conditional Access policy) that consumes Entra's critical-event notifications and enforces them on subsequent token-bearing requests via the claims-challenge handshake. This is narrower than the generic OAuth 2.0 "resource server"; in CAE, "resource provider" specifically means a workload that has implemented the CAE participation contract with Entra [@ms-cae-concept].

Note: Microsoft documents an *upper bound* on token lifetime. The actual lifetime issued for any given session varies and can be shorter. CAE-aware sessions can also be refreshed silently as long as the channel signals that nothing has changed. In practice, most users with CAE-aware clients on M365 productivity workloads almost never see an interactive re-authentication prompt during normal working hours [@ms-cae-concept].
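On the wire, a client's capability declaration travels in the OAuth `claims` request parameter. The sketch below builds that parameter and merges in a claims challenge returned by an RP. It is modelled on what MSAL does internally (in MSAL.js the capability list is supplied through the `clientCapabilities` option of the `auth` configuration); `buildClaimsRequest` is a hypothetical helper, and with MSAL you would not write it yourself.

```javascript
// Hypothetical helper: merge a client-capability list and an optional claims
// challenge (the JSON string decoded from an RP's WWW-Authenticate header)
// into the value sent as the OAuth 2.0 `claims` request parameter.
function buildClaimsRequest(clientCapabilities, challengeJson) {
  const merged = challengeJson ? JSON.parse(challengeJson) : {};
  if (clientCapabilities && clientCapabilities.length > 0) {
    merged.access_token = merged.access_token || {};
    // xms_cc carries the capability list, e.g. ["cp1"].
    merged.access_token.xms_cc = { values: clientCapabilities };
  }
  return JSON.stringify(merged);
}

console.log(buildClaimsRequest(["cp1"]));
// {"access_token":{"xms_cc":{"values":["cp1"]}}}
```

Merging rather than replacing matters: a client answering a challenge must still advertise `cp1`, or Entra would have no reason to keep issuing it CAE-aware tokens.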

A migration note for older tenants

Tenant administrators with Conditional Access policies that pre-date GA may carry legacy "strict location enforcement" preview settings. Microsoft has since migrated the feature into GA, and the current Microsoft Learn page Strictly enforce location policies using continuous access evaluation documents the post-migration configuration model [@ms-strict-location-enforcement]. Administrators should verify their policies after each major Conditional Access feature wave to ensure preview-to-GA migrations have been picked up.

CAE is one approach among several. Where does it sit relative to introspection-per-request, identity-aware proxies, DPoP, and the cross-vendor OpenID standard? The design space is small enough to map cleanly.

7. Competing Approaches and Their Relation to CAE

Five named methods occupy adjacent positions in the design space. Some compete; some compose. The map matters because deployments that mistake a composing approach for a competing one (or the reverse) get wrong answers.

CAE versus OpenID SSF and CAEP 1.0

Same architecture, different implementations. Microsoft CAE solves the Microsoft estate via a Microsoft-internal protocol; OpenID SSF and CAEP solve the cross-vendor SaaS long tail via a public standard atop RFC 8417 [@openid-three-final-specs; @openid-ssf-1_0-final; @openid-caep-1_0]. The two are convergent rather than rivalrous: Microsoft is moving toward also acting as an SSF transmitter and receiver alongside its first-party CAE protocol, and other vendors are building SSF receivers that can consume signals from any transmitter, including Microsoft.

The Authenticate 2025 interop event in October 2025 was the first whose tested text was the Final-Specification version of SSF [@openid-authenticate-2025-interop]. Multi-vendor SSF and CAEP interoperability has been demonstrated at successive Gartner IAM Summit interop events as well. At the March 2024 London summit, SGNL's CAEP Hub interoperated as both transmitter and receiver with Cisco Duo, Okta, SailPoint, and Helisoft on the session-revoked CAEP event [@sgnl-2024-04-interop]. Okta's own blog characterizes the March 2025 London summit as "a significant industry shift toward interconnected, real-time security" with "interoperable implementations from pioneers like Okta, Google, IBM, Omnissa, SailPoint, and Thales" [@okta-shared-signals].

Tim Cappalli, who joined Okta after his time at Microsoft, co-chairs the OpenID Shared Signals Working Group alongside Atul Tulshibagwale (SGNL, formerly Google) [@tulshibagwale-sgnl-2023-08-qanda; @openid-sharedsignals-wg]. The cross-vendor co-chair arrangement is part of why the Final Specifications passed without significant vendor pushback: the people doing the standardization had visibility into both Microsoft's and Google's prior implementations.

CAE versus RFC 7662 introspection

Parallel paths, not competitors. RFC 7662 introspection [@rfc-7662] continues to be the right answer for opaque-token systems and on-premises IdPs where the AS-to-RP per-request round-trip is acceptable. CAE wins at hyperscale public-cloud volumes specifically because it inverts the per-request dependency: state pushes to the RP once and lives in cache; the data plane does not consult the AS on every request. If you are building a B2B integration with a small RP count and a few hundred requests per second, RFC 7662 is fine. If you are building Exchange Online, it is not.
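For contrast, the per-request dependency that CAE removes looks like this under RFC 7662: the RP POSTs the token to the AS's introspection endpoint as a form-encoded body and trusts the `active` member of the JSON response. The endpoint URL and credentials are placeholders, and HTTP Basic is an assumed choice of client authentication; the `token` and `token_type_hint` parameters and the `active` and `exp` response members come from RFC 7662.

```javascript
// Build an RFC 7662 introspection request. In this model every protected
// request costs one such round-trip to the authorization server.
function buildIntrospectionRequest(endpoint, token, clientId, clientSecret) {
  const body = new URLSearchParams({ token, token_type_hint: "access_token" });
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  return {
    url: endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Basic ${basic}`,
    },
    body: body.toString(),
  };
}

// Only `active` is required in the response; anything but `true` is inactive.
function isTokenActive(introspectionResponse, nowSec) {
  if (introspectionResponse.active !== true) return false;
  // `exp` is optional; honor it when present.
  return introspectionResponse.exp === undefined || nowSec < introspectionResponse.exp;
}
```

With a real HTTP client, `fetch(req.url, req)` would send it. At a few hundred requests per second that call is cheap; on Exchange Online's request volume it would sit on every request's critical path, which is the cost the push model avoids.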

CAE versus DPoP and mTLS-bound tokens

Complementary, not competitive. The threat model for CAE is stale authorization: the authorization decision at sign-in is no longer accurate, because the user has been disabled, their password has been reset, their risk score has changed, or their network location has shifted. The threat model for proof-of-possession is stolen tokens: an attacker holding a bearer token that was legitimately issued to a different party.

RFC 9449, OAuth 2.0 Demonstrating Proof of Possession (DPoP), published September 2023 by Daniel Fett and collaborators [@rfc-9449-dpop], binds an access token to a client-held key pair: a DPoP-bound token can only be replayed by an attacker who also stole the private key. RFC 8705, OAuth 2.0 Mutual-TLS Client Authentication and Certificate-Bound Access Tokens, published February 2020 by Brian Campbell and collaborators [@rfc-8705-mtls], does the same thing using mTLS certificates. Both are sender-constrained-token mechanisms; both close the bearer-token-replay attack surface.
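A minimal sketch of the DPoP binding using Node's built-in `crypto` module: generate a P-256 key pair, compute the RFC 7638 JWK thumbprint that a bound token carries as `cnf.jkt`, and sign a DPoP proof JWT for one HTTP request. Key handling is deliberately simplified (the key lives in memory), the `htu` URL and token string are placeholders, and the RP-side logic is reduced to the thumbprint comparison and signature check.

```javascript
const crypto = require("node:crypto");

const b64url = (buf) => Buffer.from(buf).toString("base64url");

// Client-held key pair; the private key never leaves the client.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ec", { namedCurve: "P-256" });
const publicJwk = publicKey.export({ format: "jwk" }); // { kty, crv, x, y }

// RFC 7638 thumbprint: required EC members in lexicographic order, no whitespace.
function jwkThumbprint(jwk) {
  const canonical = JSON.stringify({ crv: jwk.crv, kty: jwk.kty, x: jwk.x, y: jwk.y });
  return b64url(crypto.createHash("sha256").update(canonical).digest());
}

// DPoP proof for one request; `ath` ties the proof to this access token.
function createDpopProof(method, url, accessToken) {
  const header = { typ: "dpop+jwt", alg: "ES256", jwk: publicJwk };
  const payload = {
    jti: crypto.randomUUID(),
    htm: method,
    htu: url,
    iat: Math.floor(Date.now() / 1000),
    ath: b64url(crypto.createHash("sha256").update(accessToken).digest()),
  };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  // JWS ES256 uses the raw r||s signature form, not DER.
  const signature = crypto.sign("sha256", Buffer.from(signingInput), {
    key: privateKey,
    dsaEncoding: "ieee-p1363",
  });
  return `${signingInput}.${b64url(signature)}`;
}

// RP-side core check: the proof's key matches cnf.jkt and its signature verifies.
function verifyDpopBinding(proof, expectedJkt) {
  const [h, p, s] = proof.split(".");
  const header = JSON.parse(Buffer.from(h, "base64url").toString("utf8"));
  if (jwkThumbprint(header.jwk) !== expectedJkt) return false;
  const key = crypto.createPublicKey({ key: header.jwk, format: "jwk" });
  return crypto.verify(
    "sha256",
    Buffer.from(`${h}.${p}`),
    { key, dsaEncoding: "ieee-p1363" },
    Buffer.from(s, "base64url")
  );
}
```

A thief who copies the access token but not `privateKey` cannot produce a proof whose key hashes to `cnf.jkt`. A real RP would also check `htm`, `htu`, `iat` freshness, `jti` replay, and `ath`, which this sketch omits.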

CAE does not address token theft. A stolen CAE-aware token is still usable by the attacker until the IdP or RP becomes aware of the compromise. A DPoP-bound CAE-aware token closes both gaps: the attacker cannot replay it, and even if they could, the channel can revoke it within minutes. The correct deployment pattern is to combine CAE with DPoP or mTLS-binding where the application threat model warrants both.

CAE versus BeyondCorp-style identity-aware proxies

Different architectural layer. Identity-aware proxies (Google IAP, Cloudflare Access, AWS Verified Access) sit in front of the resource server and enforce policy at the proxy. They have full visibility into per-request state and can do instant revocation by terminating the connection at the proxy when policy changes. This is correct for proxy-fronted workloads but does not scale to the long tail of API surfaces that cannot or will not sit behind a proxy. CAE pushes the enforcement into the resource server itself, which is what lets it work for native cloud APIs and federated SaaS where the proxy model would not.

A note on PRT theft

CAE does not address attacks at the Primary Refresh Token (PRT) layer. The PRT is a long-lived refresh credential Windows uses to mint access tokens silently from a logged-in session. A stolen PRT can mint CAE-aware access tokens that are, from Entra's perspective, legitimately issued -- the attacker holds a credential the IdP still recognizes. CAE will only catch this if the user is revoked, the password is reset, or one of the other critical events fires after the PRT theft. The Pass-the-PRT attack class therefore sits largely outside CAE's protection; defenses for that layer are out of scope here and are a separate engineering problem.

Mapping the design space

The table is the cleanest way to see who competes with whom and who composes with whom.

Approach	Solves	Composes with CAE	Competes with CAE
OpenID SSF/CAEP 1.0	Cross-vendor revocation	Yes (CAE is a Microsoft implementation of the same pattern)	No
RFC 7662 introspection	Opaque-token revocation at modest scale	Parallel path	At hyperscale only
DPoP (RFC 9449)	Sender-constrained tokens	Yes (compose for full coverage)	No
mTLS-bound tokens (RFC 8705)	Sender-constrained tokens	Yes (compose for full coverage)	No
Identity-aware proxy	Per-request policy at the proxy edge	Composes for proxy-fronted workloads	Different layer
Short access-token lifetime	Reduces revocation window mechanically	Falls back when CAE not available	Yes, and loses on the trade

The reader who came to this article expecting a binary contest -- "which one wins?" -- has the wrong frame. The actual answer is that CAE is one move in a layered defense, and most production deployments will end up composing it with DPoP or mTLS for token binding, falling back to short lifetimes for non-CAE clients, and continuing to use introspection for opaque-token internal APIs.

That maps CAE's neighbors in the design space. The remaining question is where CAE itself stops.

8. Theoretical Limits: What CAE Cannot Do

Every architecture has a floor. The reader has spent six sections climbing; this is where the limits show up -- not as vendor laziness, but as physics, scale, and trust topology.

Limit 1: cannot revoke a token already in flight

Once a request has been accepted and is being served by the resource provider, CAE cannot reach into the RP's execution thread and abort it. The revocation applies to the next request. A long-running operation -- a bulk Outlook export, a large SharePoint upload -- that began at 10:23:00 may complete normally even if the user is disabled at 10:23:01. The revocation takes effect the next time the client presents the token [@ms-cae-concept]. For most use cases the in-flight window is sub-second and the consequence is negligible; for long-running data egress, it matters.

Limit 2: cannot beat the 15-minute critical-event SLA for most events

Microsoft's published SLA is "up to 15 minutes" for critical-event propagation [@ms-cae-concept]. Only IP-location enforcement is instant. The 15-minute number is not a fundamental limit; it is engineering economics at hyperscale. Fanning out an event to every CAE-aware RP for every potentially affected subject across Microsoft 365's global infrastructure is what produces the budget. Smaller-scale deployments demonstrate much better numbers: TigerIdentity's commercial deployment self-reports sub-second end-to-end revocation in a tuned CAEP receiver configuration [@tigeridentity-caep-explained]. The architecture allows sub-second; Microsoft's particular deployment chooses 15 minutes because the alternative at its fan-out scale is prohibitively expensive.

The strict physical floor sits below even the tuned implementations. An RP cannot enforce a revocation it has not yet learned about. The one-way network latency $L$ between IdP and RP sets the absolute minimum: with a transcontinental $L \approx 70\,\text{ms}$, no push protocol can revoke faster than that, and pull protocols are necessarily worse. In practice, queuing, scheduling, and event fan-out dominate $L$ at scale -- but the floor remains.
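The floor can be written out. Take the one-way latency $L$ from above and, for a pull design, a hypothetical polling interval $P$ (the value used below is an illustration, not a documented Microsoft or SSF parameter). Ignoring processing time, a push receiver learns of an event after $L$; a poller learns of it somewhere between $L$ and $P + L$, depending on whether the event lands just before or just after the previous poll reached the IdP:

$$T_{\text{push}} \ge L, \qquad L \le T_{\text{pull}} \le P + L$$

With $L = 70\,\text{ms}$ and an assumed $P = 60\,\text{s}$, push cannot beat $0.07\,\text{s}$, while pull can take up to $60.07\,\text{s}$ and matches push only in its best case.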

Key idea: The 15-minute SLA is not a fundamental limit; it is engineering economics at hyperscale. Sub-second is feasible at smaller fan-outs, and is the direction of travel as receiver implementations improve and as Microsoft's own event-distribution infrastructure matures. But the strict physical floor is the network latency between IdP and RP; no cooperative protocol can do better than that.

Limit 3: cannot cover non-CAE-aware clients or resource providers

CAE is a cooperative protocol. Both the client (via the xms_cc=cp1 capability declaration) and the resource provider (via implementing the participation contract) must be CAE-aware [@ms-cae-app-resilience]. A non-CAE client receives a default 1-hour token and never sees a claims challenge; it relies on standard expiry. A non-CAE RP silently falls back to standard token expiry as well; the IdP's events have no consumer. The CAE-aware portion of the estate enjoys the new contract; the rest carries the old security debt unchanged.
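The cooperative requirement reduces to a two-condition rule at issuance time. The sketch is a hypothetical model of that rule, not Entra's actual logic: the constants mirror the documented defaults (1 hour, and an upper bound of 28 hours), and the boolean `resourceIsCaeAware` stands in for however the IdP knows the target workload participates.

```javascript
const ONE_HOUR_SEC = 60 * 60;
const CAE_MAX_SEC = 28 * ONE_HOUR_SEC; // documented upper bound, not a fixed value

// Hypothetical issuance rule: extended lifetime only when both sides cooperate.
function maxAccessTokenLifetimeSec(clientCapabilities, resourceIsCaeAware) {
  // Capability matching is case-insensitive ("cp1" and "CP1" both count).
  const clientIsCaeAware = (clientCapabilities || []).some(
    (c) => c.toLowerCase() === "cp1"
  );
  return clientIsCaeAware && resourceIsCaeAware ? CAE_MAX_SEC : ONE_HOUR_SEC;
}
```

Either condition failing collapses the window back to the 1-hour default, which is the old security debt the paragraph describes.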

This is why audit posture matters. A tenant administrator who wants to argue that revocation latency for their workforce is "under 15 minutes" must be able to demonstrate that the client and RP combinations the workforce actually uses are CAE-aware. Microsoft's compatibility tables [@ms-cae-concept] document several Office-web-app and OneDrive-Win32-versus-SharePoint combinations as Not Supported or Partially supported; those gaps are part of the tenant's effective revocation profile, not someone else's problem.

Limit 4: cannot help if the resource provider itself is compromised

Revocation state lives at the RP. A compromised RP can simply ignore revocation events: keep serving requests against tokens Entra has signaled are invalid; misreport its own subscription state; drop events on the floor. CAE is a cooperative protocol between trustworthy parties. It is not a defense against an RP that has been pwned. The OpenID SSF specification addresses this implicitly by defining receiver requirements (verification events, stream-control endpoints, signature verification on SETs), but no receiver requirement can compel a compromised receiver to obey the protocol.

The threat model implication: an attacker who has compromised an RP does not need to bypass CAE. They can simply stop honoring revocation events from the inside, and the protocol's design has no remedy. RP integrity is a prerequisite, not a guarantee.

Limit 5: cannot revoke a stolen PRT before it mints a new access token

As noted in Section 7, the Primary Refresh Token sits outside CAE's scope. A stolen PRT mints new CAE-aware access tokens that Entra treats as legitimately issued, because from Entra's perspective they are legitimately issued -- the attacker is presenting a credential the IdP recognizes. CAE catches PRT theft only when one of the five critical events fires after the theft. If the attacker exfiltrates a PRT, refreshes a token, and immediately uses it, the access token is valid and the revocation channel has nothing to revoke.

The SharePoint Online user-risk-event caveat is a useful concrete example of the per-feature limit pattern. Even within the four CAE-consuming RPs, feature support is not uniform; you cannot reason about CAE as a single boolean property at the workload level. Every event you care about must be checked against the specific RP that will enforce it [@ms-cae-concept].

The bounded design space

Put together, the five limits draw the perimeter of what CAE can do. It cannot stop in-flight requests. It cannot beat network latency at the strict floor or 15 minutes at Microsoft's chosen operating point. It cannot help non-participating clients or RPs. It cannot fix a compromised RP. It cannot revoke PRT-layer credentials before they mint new tokens. The honest summary is that the design space is bounded -- the reader who internalizes the five limits has a calibrated sense of what is fundamentally possible, and can stop expecting CAE to be a single fix for revocation in all situations.

The limits also map the open frontier. If those are the structural constraints, what are the OpenID Foundation and the SaaS long tail working on in 2026?

9. Open Problems (2026)

Final Specifications are necessary but not sufficient. CAEP 1.0, SSF 1.0, and RISC 1.0 were approved on September 2, 2025 [@openid-three-final-specs]. The question for 2026 is what adoption and extension look like. Five live problems.

1. Third-party SaaS receiver-adoption depth

The Final Specifications give every SaaS vendor a clean target to build against. The question is whether they will. Google Workspace shipped its SSF receiver in Closed Beta, supporting only the session-revoked CAEP event at launch [@google-workspace-ssf-api]. That is one event out of CAEP 1.0's eight. The SaaS long tail -- Workday, ServiceNow, GitHub Enterprise, Atlassian, Salesforce -- has not, as of mid-2026 (roughly nine months after the Final Specifications were approved), shipped public receivers.

For the "fired employee with N SaaS apps" scenario to be fully solved, every SaaS app in the user's bundle has to be a CAEP receiver subscribed to events from the enterprise IdP. The architecture is in place; the integration work is per-vendor and per-customer. This is the largest single determinant of CAE's real-world value over the next several years.

Note: The Microsoft 365 estate enjoys near-complete CAE coverage because Microsoft built both the IdP and the resource providers. The cross-vendor story is fundamentally a coordination problem: every receiver has to be built, deployed, and configured to subscribe to events from every transmitter the enterprise uses. SSF 1.0 makes the integration tractable; it does not make the work disappear. Watch receiver coverage in 2026-2028 as the leading indicator of CAE's industry-wide impact.

2. CAE for non-human and agent identities

CAEP subject identifiers assume user-shaped or device-shaped subjects [@openid-caep-1_0]. Workload identities, service principals, and emerging AI-agent identities sit outside the model as currently profiled. An agent acting on behalf of a user, with its own identity and its own session, is not yet covered by a Final-Specification profile. The Microsoft Entra Conditional Access for Agent Identities workstream is a documented Microsoft Learn surface as of 2026 [@ms-conditional-access-agent-id] and is one of the workstreams that will eventually produce a CAEP profile for non-human subjects, but as of mid-2026 the cross-vendor standardization gap is open.

3. Cross-IdP federation of SSF streams

When tenant A federates to tenant B, the event-flow path crosses a trust boundary the current Final Specifications do not explicitly profile. If a user is disabled in tenant A's IdP, how does the revocation event reach the resource providers downstream in tenant B? The pieces -- transmitter, receiver, SET envelope, signed events -- are all in place; what is missing is the canonical profile for cross-IdP federation of SSF streams. This is a 2026-2027 OpenID Foundation workstream rather than a Final-Specification gap.

4. Bidirectional signal sharing

Today's CAE and CAEP deployments are largely IdP-as-transmitter, RP-as-receiver. The full vision is bidirectional: an RP that detects anomalous behavior (unusual access patterns, suspected automation, post-authentication risk signals) should be able to transmit those signals back to the IdP, which can then incorporate them into the next authorization decision. SGNL and similar vendors are building toward this model. The Final Specifications support bidirectional flow at the protocol level; the policy and operational pieces -- who trusts whom, what events flow which way, how an IdP weighs signals from an RP -- are still being worked out.

5. Reason-code convergence between CAEP and RISC

CAEP 1.0 and RISC 1.0 cover overlapping ground around credential mutation. CAEP defines a credential-change event; RISC defines account-credential-change-required [@openid-caep-1_0; @openid-sharedsignals-wg]. Implementers must choose, and vendor extensions proliferate where the spec leaves room. Reason-code convergence between the two profiles is incomplete; some receivers will subscribe to both streams to be safe, others will pick one and hope upstream transmitters agree. Over time the WG will likely consolidate; for 2026, the practical guidance is to support both event vocabularies in receiver code.

The first interoperability event whose tested text was the Final-Specification version of SSF took place at Authenticate 2025 in San Diego, October 13-15, 2025, hosted by the FIDO Alliance and coordinated by the OpenID Foundation Shared Signals Working Group [@openid-authenticate-2025-interop]. The event required that all participants with an SSF Transmitter pass the OpenID Foundation's free, open-source conformance tests. This was the fourth in a series of Gartner-IAM and Authenticate interops since March 2024, and the first conducted after SSF 1.0 was approved Final on September 2, 2025. The list of vendor participants has grown at each event; cross-vendor receiver coverage is the metric to watch.

Given all this -- the architecture, the limits, the open frontier -- what should you actually do this week in your tenant and your code?

10. Turning CAE On in Your Tenant and Your Code

Three audiences, three checklists. Each subsection lists what an engineer in that role needs to confirm or change to make CAE work in their environment.

For the tenant administrator

CAE has been enabled by default for new Microsoft Entra tenants since the January 2022 GA [@simons-2022-01-ga-rss]. Tenants created before then may need to verify enablement in Conditional Access -> Session controls -> Customize continuous access evaluation. The relevant signals to check:

CAE enablement state. Confirm that the tenant-wide CAE policy is set to Enabled rather than Disabled or Strict location.
Per-policy disable flags. Some legacy CA policies carry per-policy CAE overrides. Audit any that explicitly disable CAE; the right default is to honor it.
Strict location enforcement migration. Tenants with pre-GA "strict location enforcement" preview settings should verify that the policy has migrated to the current GA configuration model documented in Microsoft Learn [@ms-strict-location-enforcement].
Audit log baselines. Sign-in logs surface signInEventTypes with CAE-related entries; refresh-token issuance events and revocation events appear in the Entra ID audit log. Build a baseline before changing policies so you can detect drift.

For the MSAL client developer

The client side has three things to confirm and one thing to test:

MSAL version. Use a current MSAL release on your client platform: 4.x for MSAL.NET and MSAL.js; the appropriate current line for MSAL Python, MSAL Java, MSAL Android, and MSAL for iOS/macOS, per each SDK's own release stream. Microsoft Learn's Use Continuous Access Evaluation enabled APIs page enumerates the per-SDK guidance [@ms-cae-app-resilience]. Earlier major-version lines do not handle the claims challenge transparently.
Capability declaration. The app registration must declare xms_cc with value ["cp1"] (lowercase is the canonical token-claim form; uppercase "CP1" also works because negotiation is case-insensitive). This is the wire-level signal to Entra that the client can handle a CAE-aware token and the claims challenge that comes with it.
Claims-challenge handling. MSAL helpers do this transparently in current SDK versions, but custom HTTP pipelines that bypass MSAL must implement the WWW-Authenticate: Bearer error="insufficient_claims" response handler manually: decode the claims parameter (base64-encoded JSON), pass the decoded claims to AcquireTokenInteractive or the equivalent, and retry the original request with the new token.
End-to-end test. Trigger an admin password reset against a test user in a non-production tenant and verify that the next API call from a signed-in MSAL session surfaces the claims challenge and recovers cleanly. This is the single most useful confidence test; it exercises every layer of the protocol in one round trip.
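For a custom HTTP pipeline that bypasses MSAL, the claims-challenge handler from the checklist can be sketched as below. It parses the `WWW-Authenticate` header, checks for `error="insufficient_claims"`, and decodes the `claims` parameter into the JSON string MSAL accepts as a claims request (for example via `WithClaims` in MSAL.NET or the `claims` field of an MSAL.js token request). `acquireTokenWithClaims` and `getToken` are hypothetical callbacks standing in for those MSAL calls, and the regex-based parsing assumes simple quoted parameters without escaped quotes.

```javascript
// Extract auth-params from a Bearer challenge such as:
//   Bearer authorization_uri="...", error="insufficient_claims", claims="eyJ..."
function parseBearerChallenge(headerValue) {
  if (!/^Bearer\s/i.test(headerValue)) return null;
  const params = {};
  const re = /([A-Za-z_]+)="([^"]*)"/g;
  let match;
  while ((match = re.exec(headerValue)) !== null) {
    params[match[1]] = match[2];
  }
  return params;
}

// Decoded claims-request JSON, or null if this 401 is not a CAE challenge.
// Node's "base64" decoder also accepts the URL-safe alphabet.
function claimsFromChallenge(headerValue) {
  const params = parseBearerChallenge(headerValue || "");
  if (!params || params.error !== "insufficient_claims" || !params.claims) return null;
  return Buffer.from(params.claims, "base64").toString("utf8");
}

// Hypothetical retry wrapper: at most one re-acquisition per challenged request.
async function callWithCae(fetchFn, url, getToken, acquireTokenWithClaims) {
  const response = await fetchFn(url, {
    headers: { Authorization: `Bearer ${await getToken()}` },
  });
  if (response.status !== 401) return response;
  const claims = claimsFromChallenge(response.headers.get("www-authenticate"));
  if (!claims) return response; // an ordinary 401, not a claims challenge
  const freshToken = await acquireTokenWithClaims(claims);
  return fetchFn(url, { headers: { Authorization: `Bearer ${freshToken}` } });
}
```

Retrying only once is a deliberate choice: if Entra refuses to issue a token that satisfies the challenge (the disabled-user case in the sequence diagram), looping would only hammer the RP.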

```javascript
// Illustrative only: inspect a simplified MSAL.js-style access-token cache
// entry for the xms_cc capability marker. In real apps MSAL handles capability
// negotiation; this is for educational inspection only.

// Simplified cache entry for illustration (not MSAL's exact internal schema)
const tokenEntity = {
  homeAccountId: 'abc.def-tenant',
  environment: 'login.microsoftonline.com',
  credentialType: 'AccessToken',
  clientId: '11111111-2222-3333-4444-555555555555',
  tenantId: 'tenant-id',
  target: 'User.Read Mail.Read',
  // expiresOn is up to ~28 hours after cachedAt for CAE-aware sessions
  cachedAt: '1748534400',
  expiresOn: '1748635200', // 28h later
  extendedExpiresOn: '1748635200',
  // Capability declaration the app advertised at acquisition time
  requestedClaims: { xms_cc: ['cp1'] },
};

const ttlSeconds =
  parseInt(tokenEntity.expiresOn, 10) - parseInt(tokenEntity.cachedAt, 10);
const ttlHours = ttlSeconds / 3600;
const isCaeAware = Boolean(
  tokenEntity.requestedClaims?.xms_cc?.some((c) => c.toLowerCase() === 'cp1')
);

console.log('TTL hours:', ttlHours.toFixed(1));
console.log('CAE-aware:', isCaeAware);
// TTL hours: 28.0
// CAE-aware: true
// A TTL above ~1 hour together with xms_cc cp1 is a strong indicator that the
// session is CAE-aware and Entra issued an extended-lifetime token.
```

For the custom-API author

This is the hardest path. To make a custom protected API a CAE-aware resource provider today, the first-party Microsoft pathway is not publicly available -- the CAE participation contract for the M365 productivity workloads is internal to Microsoft. The community-canonical implementation pattern is Damien Bowden's damienbod/AspNetCoreMeIDCAE reference repository on GitHub [@damienbod-aspnetcoremeidcae], with an accompanying blog post walkthrough [@damienbod-blog-2022-04]. The repository (initial version April 3, 2022; updated through .NET 10 in late 2025) demonstrates:

The xms_cc=cp1 capability declaration on both the client and the API app registrations.
The Microsoft.Identity.Web claims-challenge handling on the API side.
The Razor Page client flow that catches a 401 with the challenge header and re-acquires the token.

For a fully standards-track pathway, the same custom API can be built as an OpenID SSF receiver consuming CAEP events from any SSF-compliant transmitter, using the RFC 8417 SET envelope over the RFC 8935 push transport [@rfc-8417; @rfc-8935]. Production-grade SSF receiver code is now available in commercial CAEP Hub products (SGNL, TigerIdentity) and a growing set of open-source libraries.
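A minimal sketch of the receiver side of that pathway: an RFC 8935 push-endpoint handler that decodes an RFC 8417 SET, picks out the CAEP session-revoked event, and records it in a revocation map keyed by the subject. Assumptions to flag loudly: signature verification against the transmitter's keys is omitted here (a real receiver must verify the JWS and reject anything it cannot verify), the audience value and map are hypothetical, the subject is read from a top-level `sub_id` claim as in SSF 1.0, and only the `iss_sub` and `email` subject formats are handled.

```javascript
const SESSION_REVOKED =
  "https://schemas.openid.net/secevent/caep/event-type/session-revoked";
const EXPECTED_AUDIENCE = "https://api.example.com"; // hypothetical receiver audience

const revokedSubjects = new Map(); // subject key -> revocation time

function subjectKey(subId) {
  if (subId?.format === "iss_sub" && subId.iss && subId.sub) {
    return `iss_sub|${subId.iss}|${subId.sub}`;
  }
  if (subId?.format === "email" && typeof subId.email === "string") {
    return `email|${subId.email.toLowerCase()}`;
  }
  return null;
}

// Handle the body of an RFC 8935 push: a compact-serialized SET.
// WARNING: decodes WITHOUT verifying the signature; illustration only.
function handleSetPush(compactSet) {
  const parts = compactSet.split(".");
  if (parts.length !== 3) {
    return { status: 400, body: { err: "invalid_request", description: "not a JWS" } };
  }
  let set;
  try {
    set = JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
  } catch {
    return { status: 400, body: { err: "invalid_request", description: "unparseable payload" } };
  }
  const audiences = Array.isArray(set.aud) ? set.aud : [set.aud];
  if (!audiences.includes(EXPECTED_AUDIENCE)) {
    return { status: 400, body: { err: "invalid_audience", description: "unexpected aud" } };
  }
  const event = set.events?.[SESSION_REVOKED];
  const key = subjectKey(set.sub_id);
  if (event && key) {
    // Prefer the CAEP event_timestamp; fall back to the SET's iat.
    revokedSubjects.set(key, event.event_timestamp ?? set.iat);
  }
  // RFC 8935: answer 202 Accepted once the SET has been received and validated.
  return { status: 202 };
}
```

The map plays the same role as the RP-side cache in the Microsoft-internal protocol: the push handler writes to it, and the request path reads it.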

Note: CAE itself does not require add-on licensing for the basic critical-event evaluation across Microsoft 365 -- it is part of the Entra ID baseline for new tenants. The Microsoft Entra ID Protection feed that drives high user risk detected events, however, requires Microsoft Entra ID P2 (or an equivalent SKU that includes Identity Protection). Confirm current licensing terms in the Microsoft licensing documentation before making procurement decisions; the lower SKUs cover four of the five critical events but not the risk-based one [@ms-cae-concept].

Observability

Sign-in logs and audit logs are where CAE behavior shows up. Look for:

Sign-in logs: filter by signInEventTypes containing CAE-related entries. CAE-aware sign-ins have a different telemetry shape than non-CAE sign-ins.
Token-issuance events: refresh-token issuance against CAE-aware app registrations should show the extended lifetime.
Audit log revocation entries: administrator revocation actions and Identity-Protection-driven revocations appear here; cross-correlate with the resource-provider-side telemetry to validate end-to-end propagation.

Use Microsoft Graph PowerShell to enumerate the tenant's CAE configuration and then trigger a synthetic test: 1) read `Get-MgIdentityConditionalAccessPolicy` to verify the relevant CA policies have CAE enabled in their `SessionControls.ContinuousAccessEvaluation` block; 2) create a test user, sign them in via Outlook on the Web; 3) reset their password via `Update-MgUser`; 4) observe in the audit log that the password reset propagates to a CAE event, and verify in Outlook on the Web that the next refresh surfaces a re-authentication prompt within the 15-minute SLA. This is the simplest end-to-end confidence test that does not require modifying any production resource.
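If you prefer an API-level number to watching Outlook on the Web, a small poller can time how long a pre-reset token keeps working. This is a sketch under assumptions: `url` must point at an endpoint served by a CAE-aware workload for the event you are testing (per Section 6, Microsoft Graph participates only for Conditional Access policy evaluation), `token` must have been acquired before the reset, the 30-second interval and 20-minute timeout are arbitrary choices, and `fetchFn`, `now`, and `sleep` are injected so the loop can be tested without a tenant.

```javascript
// Poll with a pre-reset token until the RP answers with a CAE claims challenge.
// Returns elapsed milliseconds since the reset, or null if the timeout passes.
async function measureRevocationLatency({
  fetchFn,
  url,
  token,
  resetAtMs,
  intervalMs = 30_000,
  timeoutMs = 20 * 60_000,
  now = Date.now,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
}) {
  while (now() - resetAtMs < timeoutMs) {
    const res = await fetchFn(url, { headers: { Authorization: `Bearer ${token}` } });
    const challenge = res.headers.get("www-authenticate") || "";
    if (res.status === 401 && challenge.includes('error="insufficient_claims"')) {
      return now() - resetAtMs;
    }
    await sleep(intervalMs);
  }
  return null;
}
```

The measured value is only as precise as the polling interval, so a result of "up to one interval after the event" is the honest reading, the same trade-off the pull-latency bound in Section 8 describes.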

Defaults are good

The most common engineering recommendation here is to leave the defaults alone. CAE on, default tenant settings, current MSAL clients, xms_cc=cp1 on every new app registration. The configuration surface area is small precisely because the design is right: there are not many knobs to turn. The work is in confirming that the client and RP combinations your users actually exercise are CAE-aware, and in monitoring the audit logs to catch drift.
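On the resource side, the `cp1` capability decides whether a claims challenge is worth sending, because only a client that declared it knows how to answer one. The JavaScript sketch below shows the decision and the challenge header. The header layout (`error="insufficient_claims"` plus a base64-encoded `claims` value requesting a fresh `nbf`) is modelled on Microsoft's CAE developer guidance. Treat the exact field set and the `authorization_uri` value as assumptions to verify there. Function names are hypothetical, and token validation is assumed to have happened already.

```javascript
// Did the (already validated) access token come from a client that declared cp1?
function clientIsCaeCapable(claims) {
  const caps = Array.isArray(claims.xms_cc) ? claims.xms_cc : [claims.xms_cc];
  return caps.includes('cp1');
}

// Build a claims challenge asking for a token issued no earlier than `nbfSeconds`.
function claimsChallenge(nbfSeconds) {
  const claims = { access_token: { nbf: { essential: true, value: String(nbfSeconds) } } };
  const encoded = Buffer.from(JSON.stringify(claims)).toString('base64');
  return (
    'Bearer realm="", ' +
    'authorization_uri="https://login.microsoftonline.com/common/oauth2/authorize", ' +
    `error="insufficient_claims", claims="${encoded}"`
  );
}

function rejectRevoked(claims, revokedAtSeconds) {
  if (clientIsCaeCapable(claims)) {
    return { status: 401, headers: { 'WWW-Authenticate': claimsChallenge(revokedAtSeconds) } };
  }
  // Hypothetical fallback: a client without cp1 cannot process a claims
  // challenge, so this sketch returns a plain invalid_token error instead.
  return { status: 401, headers: { 'WWW-Authenticate': 'Bearer error="invalid_token"' } };
}
```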

That is what to do. The last section is what to remember -- the misconceptions every team carries into a CAE conversation, and the answers that close them.

11. FAQ and Coda

**Is CAE revocation instant?**

No. The published SLA is up to 15 minutes for the five critical events; only IP-location enforcement is instant. See Section 6 for the mechanical reason for the asymmetry and Section 8 Limit 2 for why 15 minutes is engineering economics rather than a fundamental limit [@ms-cae-concept].

**Does CAE protect against stolen tokens?**

No. CAE addresses *stale authorization* (the original authorization decision is no longer correct), not *stolen tokens* (an attacker is presenting a token that was legitimately issued to someone else). For token theft, use a sender-constrained-token construction: DPoP per RFC 9449 [@rfc-9449-dpop] or mTLS-bound tokens per RFC 8705 [@rfc-8705-mtls]. Both compose cleanly with CAE. A DPoP-bound, CAE-aware token is the strongest commonly-deployed combination today, closing both the replay attack surface and the stale-authorization gap.

**Are SSF, CAEP, and RISC still drafts?**

No. SSF 1.0, CAEP 1.0, and RISC 1.0 were approved as OpenID Foundation Final Specifications on September 2, 2025 -- see Section 4 for the standards-stack treatment [@openid-three-final-specs].

**Are Microsoft Defender for Endpoint and Intune CAE resource providers?**

No. MDE and Intune are signal sources into Conditional Access, not CAE-consuming resource providers; see the Section 6 Common-misconception callout for the full distinction and the CAE-aware RP set [@ms-cae-concept].

**Doesn't a 28-hour access token make revocation worse?**

*Not when the resource provider is CAE-aware.* The token lifetime stops carrying the revocation weight; the channel does. A CAE-aware RP can revoke a 28-hour token within 15 minutes of a critical event. That is a strictly better revocation profile than a 1-hour token with no channel, which in the worst case is revocable only at the 1-hour expiry boundary [@ms-cae-concept]. *Yes*, however, when the RP is *not* CAE-aware: the token then carries its full lifetime as the revocation window, and longer is worse. The architectural rule: only issue extended-lifetime tokens to clients whose RPs are CAE-aware -- which is exactly what the `xms_cc=cp1` capability negotiation enforces [@ms-cae-app-resilience].

**Does CAE apply to SAML assertions?**

No. CAE is specific to OAuth 2.0 and OpenID Connect access tokens. SAML assertions have their own lifetime and replay-protection model and are not in scope for the CAE participation contract or for the OpenID SSF/CAEP profiles [@ms-cae-concept; @openid-caep-1_0]. If you are still operating SAML-fronted workloads, the analogous design problem (revocation between sign-in and assertion expiry) is solved differently. It is largely a per-product implementation question rather than a standards story.
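The sender-constraining answer to token theft in the FAQ above comes down, at its core, to one comparison. The sketch below uses only Node.js's built-in `crypto` module. It computes an RFC 7638 JWK thumbprint for an EC public key and checks it against the `cnf.jkt` claim that RFC 9449 places in a DPoP-bound access token. It deliberately omits verifying the DPoP proof itself (its signature and its `htm`, `htu`, `iat`, `jti`, and `ath` claims), which a real resource server must also do. The function names are hypothetical.

```javascript
const crypto = require('crypto');

// RFC 7638 thumbprint of an EC public JWK: only the required members,
// in lexicographic order, no whitespace, SHA-256, base64url without padding.
function jwkThumbprint(jwk) {
  const canonical = JSON.stringify({ crv: jwk.crv, kty: jwk.kty, x: jwk.x, y: jwk.y });
  return crypto.createHash('sha256').update(canonical).digest('base64url');
}

// RFC 9449: a DPoP-bound token carries cnf.jkt; the key that signed the DPoP
// proof must hash to the same thumbprint, otherwise the presenter does not
// hold the key the token was bound to.
function proofKeyMatchesToken(tokenClaims, proofPublicJwk) {
  return tokenClaims.cnf?.jkt === jwkThumbprint(proofPublicJwk);
}

// Demo: bind a token to one key pair, then present proof keys from two holders.
const holder = crypto.generateKeyPairSync('ec', { namedCurve: 'P-256' });
const thief = crypto.generateKeyPairSync('ec', { namedCurve: 'P-256' });
const holderJwk = holder.publicKey.export({ format: 'jwk' });
const thiefJwk = thief.publicKey.export({ format: 'jwk' });
const tokenClaims = { sub: 'alice', cnf: { jkt: jwkThumbprint(holderJwk) } };

console.log('holder accepted:', proofKeyMatchesToken(tokenClaims, holderJwk));
console.log('thief accepted:', proofKeyMatchesToken(tokenClaims, thiefJwk));
```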

Coda: the bargain

The OAuth 2.0 designers in 2012 took a deliberate trade: short-lived self-contained tokens were the price they paid to escape the WAM bottleneck. The trade was correct for the web they were designing for. It became wrong the moment enterprises ran compliance-bound SaaS at scale on top of those tokens. Three obvious patches were tried -- the /revoke endpoint, the /introspect endpoint, the short-lifetime experiment -- and each failed for a distinct reason: the wrong party initiates revocation; the AS becomes a per-request critical path; expiry as a blunt instrument creates load and reliability problems while still leaving a window.

What replaced them was an architecture that took two facts seriously. First, revocation has to be push from the IdP to the RP -- not pull from RP to AS, not client-initiated POST to /revoke. Second, expiry and revocation can be separated: once the channel handles revocation, expiry can be measured in days rather than minutes. The 15-minute critical-event SLA and the up-to-28-hour token lifetime are two halves of the same bargain. Microsoft Entra ships them together because they only work together; the OpenID Foundation has standardized the same pattern across vendors because the long tail of SaaS faces the same problem.

The architecture is settled; the adoption is in progress. The CAEP, SSF, and RISC Final Specifications give every SaaS vendor a tractable target. The Microsoft 365 estate is already covered. Cross-vendor receiver coverage is the metric that will decide how much of the 2026 enterprise identity surface actually inherits the bargain -- and that, more than any further protocol work, is the story to watch over the next several years.

AD Is a Graph: How BloodHound Made Defenders Think Like Attackers

noreply@paragmali.com (Parag Mali) — Tue, 26 May 2026 00:00:00 GMT

**AD is a graph.** In April 2015 John Lambert named the missing model in two sentences. In August 2016 Andy Robbins, Rohan Vazarkar, and Will Schroeder shipped the tool that made it operational: BloodHound treats Active Directory as a directed graph of privilege relationships, queries it with Neo4j's Cypher, and turns weeks of red-team whiteboard work into a 200 ms shortest-path lookup. By November 2024 Microsoft itself shipped the same mental model as Microsoft Security Exposure Management. This article is half tool history (1998 academic attack graphs through OpenGraph 2025) and half graph-theory exposition: property graphs, BFS shortest paths, the edge taxonomy, and the open problem of *weighting* what is currently treated as one BFS hop per privilege.

1. How do I get from this user to Domain Admin?

In 2014, a red-team analyst with a help-desk account inside a 40,000-user Active Directory forest asks the question every red-team analyst asks: is there a path from here to Domain Admins? The answer takes two analysts five days of PowerView scripts, hand-drawn whiteboard diagrams, and per-host RDP probing. The same question in 2024 is a ninety-character Cypher query that returns in 200 milliseconds.

What happened in those ten years is the story of one sentence -- defenders think in lists, attackers think in graphs -- becoming, in turn, a tool, a discipline, and a Microsoft product.

The 2014 reality was a stack of CSV files. PowerView, the PowerShell enumeration toolkit Will Schroeder first published in August 2014 [@powersploit-repo], could dump every group membership, every Access Control Entry, and every active session from a low-privilege account [@powertools-powerview]. The outputs were rows. Hundreds of thousands of rows. Composing them into a coherent attack path was a job for a marker and a whiteboard, and the join keys were Distinguished Names that wrapped twice across an analyst's notebook page. Five days to map a single 40,000-user forest was not unusual. It was the price of doing business.

The 2024 reality is a query. The analyst loads SharpHound's JSON dump into a Neo4j graph, opens the BloodHound web interface in a browser, types

```cypher
MATCH p = shortestPath(
  (u:User {name:'HELPDESK@CORP.LOCAL'})-[*1..]->(g:Group {name:'DOMAIN ADMINS@CORP.LOCAL'})
) RETURN p
```

and clicks Run. The graph renders. The shortest path is highlighted. The pivot points are circled. Time elapsed: 200 milliseconds on the database, plus a second for the browser to draw the SVG [@bloodhound-ce-repo].

Whatever happened in those ten years has to be more than a software release. It has to be a change in how the entire community models the problem. The change has a date and a sentence. Both arrived in April 2015. Let us start with the sentence.

2. From ACL lists to graphs -- why the model matters

An access-control list is not the same as a graph -- and the difference is everything.

Consider five accounts in a hypothetical CORP.LOCAL forest. Bob, a help-desk operator, has been granted ForceChangePassword on Carol's account by a long-departed administrator who once needed to delegate password resets. Carol is a member of Server Operators. Server Operators, by default, can log on locally to Domain Controllers and back up the directory database. The Domain Controller hosts the krbtgt account.

Three rows in three different audit reports. One attack path.

A per-object Access Control List audit looks at Bob's row, sees a ForceChangePassword ACE, and flags it as "an over-broad delegation." It looks at Carol's row, sees that she belongs to Server Operators, and flags it as "a privileged group membership." It looks at the Domain Controller and sees that Server Operators has logon rights, which is the default. Nothing in the audit composes the three facts. Reachability is not a property the report computes.

Directed graph: a set of nodes (vertices) and directed edges (arrows) between them. Each edge points from one node to another. A *path* is a sequence of edges that can be traversed in the direction of their arrows. A node B is *reachable* from a node A if some path leads from A to B.

Reachability: for two nodes A and B in a directed graph, B is reachable from A if there exists a sequence of edges leading from A to B, regardless of length. Reachability is fundamentally a graph property and cannot be answered by inspecting any single edge in isolation.

Now draw the same five accounts as a directed graph. Bob is a node. Carol is a node. Server Operators is a node. The Domain Controller is a node. The krbtgt account is a node. There is an edge from Bob to Carol labelled ForceChangePassword. There is an edge from Carol to Server Operators labelled MemberOf. There is an edge from Server Operators to the Domain Controller labelled CanRDP. There is an edge from the Domain Controller to krbtgt labelled DCSync. Trace the arrows. Bob can reach krbtgt.
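The difference between the two forms is easy to show in code. In the JavaScript sketch below, the four relationships from the example are stored as rows, the way separate audit reports would list them. No single row connects Bob to krbtgt, but turning the rows into an adjacency list and running a depth-first search does. The row shape and function name are illustrative, not taken from any audit product.

```javascript
// The example's relationships as flat rows, one finding per row.
const auditRows = [
  { trustee: 'bob', right: 'ForceChangePassword', target: 'carol' },
  { trustee: 'carol', right: 'MemberOf', target: 'server-operators' },
  { trustee: 'server-operators', right: 'CanRDP', target: 'dc01' },
  { trustee: 'dc01', right: 'DCSync', target: 'krbtgt' },
];

// List view: does any single row link bob to krbtgt?
const directRow = auditRows.some((r) => r.trustee === 'bob' && r.target === 'krbtgt');

// Graph view: build an adjacency list, then collect everything reachable from start.
function reachableFrom(rows, start) {
  const adj = new Map();
  for (const { trustee, target } of rows) {
    if (!adj.has(trustee)) adj.set(trustee, []);
    adj.get(trustee).push(target);
  }
  const seen = new Set([start]);
  const stack = [start]; // depth-first: last in, first out
  while (stack.length > 0) {
    const node = stack.pop();
    for (const next of adj.get(node) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  seen.delete(start);
  return seen;
}

console.log('single row links bob to krbtgt:', directRow);
console.log('bob can reach:', [...reachableFrom(auditRows, 'bob')]);
```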

The graph form makes reachability visually obvious. The list form does not. This is not a presentation difference. It is a data-model difference, and it is the difference that decides whether a tool can answer the question "is there a path from A to B?" at all.

Key idea: Reachability is a property of the graph that the list does not, in general, express. This is not a UX difference. It is a data-model difference, and it is the entire reason BloodHound exists.

The sentence that named this gap appeared on April 26, 2015. John Lambert, then a Distinguished Engineer in the Microsoft Threat Intelligence Center, published a short essay to a personal GitHub repository [@lambert-2015-defenders-lists-attackers-graphs]. No peer review. No formal venue. Two declarative opening sentences that every defender attack-path product since has cited approvingly.

Defenders don't have a list of assets -- they have a graph. Assets are connected to each other by security relationships. As long as defenders use a list and attackers use a graph, attackers win. -- John Lambert, April 26, 2015.

Lambert then enumerated five concrete classes of "security dependencies" that constitute edges in any real network: shared local-admin passwords; logon scripts on file servers; print-driver propagation from print servers; certificate authorities that mint smart-card logon certificates; and database administrators who run code as a privileged DB process. The essay closed with a defender prescription: "The first step is to visualize your network by turning your lists into graphs."

If the diagnosis is so obvious in retrospect, why was every mainstream AD audit tool from 2000 through 2015 list-shaped? The answer is that the academic literature had the right model but the wrong substrate, and the operator community had the right substrate but the wrong model. The two did not meet until April 2015.

3. Attack graphs before BloodHound, 1998 to 2015

The phrase attack graph is older than Active Directory itself.

In September 1998, two researchers at Sandia National Laboratories presented a paper titled A Graph-Based System for Network-Vulnerability Analysis at the New Security Paradigms Workshop in Charlottesville, Virginia. The authors, Cynthia Phillips and Laura Painton Swiler, proposed that a network's worst-case attack was a graph traversal problem [@phillips-swiler-1998-nspw]. Nodes encoded network states (the set of attacker privileges across the set of hosts). Edges encoded atomic attack steps (an exploit that, given prerequisite privileges, granted new ones). The shortest path from "attacker outside the network" to "attacker has goal asset" was the worst-case attack.

Phillips and Swiler observed in a now-canonical sentence that "the security of a network is more than the sum of the security of its hosts." It is the conceptual ancestor of every attack-path tool that followed.

Four years later, at IEEE Symposium on Security and Privacy 2002, Oleg Sheyner (then a CMU PhD student) along with Joshua Haines and Richard Lippmann (MIT Lincoln Lab), Somesh Jha (Wisconsin), and Jeannette Wing (CMU) made the construction automatic [@sheyner-et-al-2002-attack-graphs]. They encoded the network and attacker model as a NuSMV model-checker specification, treated the negation of the security goal as a temporal-logic property, and let the model checker generate every counterexample. The union of counterexamples was the attack graph.

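They also proved that finding a minimum set of atomic attacks whose prevention disconnects the attacker from the goal is NP-hard, but admits an O(log n) approximation -- the first asymptotic bound for the defender-hardening problem. (Preventing one atomic attack deletes every edge that attack labels, which is what separates this problem from an ordinary, polynomial-time minimum edge cut.)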
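The approximation side rests on the classic greedy heuristic for set cover, equivalently hitting set: repeatedly disable the atomic attack that appears on the most remaining attack paths. The JavaScript sketch below runs that heuristic on hypothetical data, where each path is listed as the set of atomic attacks it uses. It illustrates the standard logarithmic-factor greedy argument rather than reproducing the paper's exact construction.

```javascript
// Hypothetical attack paths, each written as the atomic attacks it relies on.
const attackPaths = [
  ['ftp_rhosts', 'rsh_login', 'local_bof'],
  ['sshd_bof', 'local_bof'],
  ['sshd_bof', 'ftp_rhosts'],
];

// Greedy hitting set: keep removing the atomic attack that blocks the most
// remaining paths until every path contains at least one removed attack.
function greedyCriticalSet(paths) {
  let remaining = paths.map((p) => new Set(p));
  const chosen = [];
  while (remaining.length > 0) {
    const counts = new Map();
    for (const path of remaining) {
      for (const attack of path) counts.set(attack, (counts.get(attack) || 0) + 1);
    }
    let best = null;
    let bestCount = 0;
    for (const [attack, count] of counts) {
      if (count > bestCount) {
        best = attack;
        bestCount = count;
      }
    }
    if (best === null) throw new Error('a path uses no atomic attacks and cannot be blocked');
    chosen.push(best);
    remaining = remaining.filter((path) => !path.has(best));
  }
  return chosen;
}

console.log(greedyCriticalSet(attackPaths));
```

On this data no single attack appears on all three paths, so any blocking set needs at least two attacks, and the greedy pass finds a two-attack set.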

Sheyner's companion 2004 thesis -- Scenario Graphs and Attack Graphs, CMU-CS-04-122 -- remains the most readable book-length treatment of academic attack-graph generation [@sheyner-2004-thesis].

By 2005 the academic line had a scale solution. Xinming Ou, Sudhakar Govindavajhala, and Andrew Appel at Princeton released MulVAL at USENIX Security 2005 [@mulval-usenix-2005], encoding network state and attacker rules as Datalog facts and running them through XSB Prolog. From the paper's abstract: "Once the information is collected, the analysis can be performed in seconds for networks with thousands of machines." NetSPA at MIT Lincoln Lab (2006) and TVA / CAULDRON at George Mason (2003 to 2005) achieved similar scale through different mechanisms.
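The Datalog idea is that attack-graph construction becomes fixpoint computation: apply the rules to the known facts until nothing new can be derived. The JavaScript sketch below does naive bottom-up evaluation, not XSB's tabled resolution, over two simplified rules. Their predicate names are loosely modelled on MulVAL's, but the arities, hosts, and ports are hypothetical.

```javascript
// Facts are tuples: [predicate, ...arguments]. Hosts and ports are hypothetical.
const initialFacts = [
  ['execCode', 'internet'],        // the attacker starts with code execution outside
  ['hacl', 'internet', 'web', 80], // host access control: internet may reach web:80
  ['hacl', 'web', 'db', 5432],     // web may reach db:5432
  ['vulExists', 'web', 80],        // a remotely exploitable flaw on web:80
  ['vulExists', 'db', 5432],       // and one on db:5432
];

const withPred = (facts, pred) => facts.filter((f) => f[0] === pred);

const rules = [
  // netAccess(H2, Port) :- execCode(H1), hacl(H1, H2, Port).
  (facts) =>
    withPred(facts, 'execCode').flatMap(([, h1]) =>
      withPred(facts, 'hacl')
        .filter(([, src]) => src === h1)
        .map(([, , h2, port]) => ['netAccess', h2, port])),
  // execCode(H) :- netAccess(H, Port), vulExists(H, Port).
  (facts) =>
    withPred(facts, 'netAccess')
      .filter(([, h, port]) =>
        withPred(facts, 'vulExists').some(([, vh, vp]) => vh === h && vp === port))
      .map(([, h]) => ['execCode', h]),
];

// Naive evaluation: re-run every rule until a full round derives no new fact.
function fixpoint(facts, ruleList) {
  const known = new Map(facts.map((f) => [JSON.stringify(f), f]));
  let changed = true;
  while (changed) {
    changed = false;
    const snapshot = [...known.values()];
    for (const rule of ruleList) {
      for (const fact of rule(snapshot)) {
        const key = JSON.stringify(fact);
        if (!known.has(key)) {
          known.set(key, fact);
          changed = true;
        }
      }
    }
  }
  return [...known.values()];
}

const derived = fixpoint(initialFacts, rules);
console.log(derived.filter((f) => f[0] === 'execCode'));
```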

In parallel, on the Windows operator side, Sean Metcalf was running adsecurity.org and documenting AD misconfiguration patterns one writeup at a time [@adsecurity-org]. Microsoft was rolling out its Enhanced Security Administrative Environment ("Red Forest") tiered-administration model -- the predecessor that the Enterprise Access Model later replaced [@ms-enterprise-access-model] -- which was implicitly graph-aware (the entire ESAE prescription is a tier diagram) but never exposed the tier graph as a queryable structure. ESAE was a deployment blueprint, not a tool.

```mermaid
gantt
  title Three parallel tracks converging on the attack-path graph
  dateFormat YYYY
  axisFormat %Y

  section Academic CVE line
  Phillips and Swiler NSPW   :a1, 1998, 1999
  Sheyner et al. IEEE SP     :a2, 2002, 2003
  MulVAL USENIX              :a3, 2005, 2006
  NetSPA TVA                 :a4, 2006, 2008

  section Windows operator line
  adsecurity.org writeups    :b1, 2014, 2016
  PowerView                  :b2, 2014, 2016
  Lambert essay              :milestone, b3, 2015-04-26, 1d
  BloodHound DEF CON 24      :b4, 2016, 2018
  AzureHound BloodHound 4.0  :b5, 2020, 2023
  BloodHound CE 5.0          :b6, 2023, 2024
  ADCS attack paths          :b7, 2024, 2025
  Butterfly v6.3             :b8, 2024, 2026
  OpenGraph v8.0             :b9, 2025, 2026

  section Microsoft defender stack
  ESAE Red Forest blueprint  :c1, 2015, 2018
  Azure ATP LMPs preview     :c2, 2018, 2020
  Defender CSPM attack paths :c3, 2022, 2024
  MSEM GA                    :milestone, c4, 2024-11-19, 1d
```

The April 2015 essay sat between the two tracks. Lambert was inside Microsoft. He had read the academic literature. He worked next to the threat-intelligence teams who watched real intrusions unfold. The essay was the moment the two communities' vocabularies met. The diagnosis was right. The cure was sixteen months away. It would not come from Microsoft, and it would not come from academia. It would come from a red-team consultancy and a free graph database, and it would debut on a Saturday in August at a hacker convention in Las Vegas.

4. Defenders evaluating Active Directory as a list of ACLs

If you were a Microsoft-aligned AD security engineer in 2013, your job was to read ACLs. One object at a time. Down a list.

The mainstream defender toolchain of the era was, almost without exception, list-shaped. Microsoft shipped dsacls.exe and PowerShell's Get-Acl; both produced row-oriented output that an analyst read sequentially. The commercial AD-audit market -- NetWrix Auditor [@netwrix-auditor], ManageEngine ADAudit Plus [@manageengine-adaudit-plus], Quest ActiveRoles [@quest-activeroles], and others -- produced HTML reports with one section per directory object, one row per non-default Access Control Entry, and severity classifications based on a fixed checklist of "dangerous rights" (GenericAll, WriteDacl, WriteOwner, and a handful of others).

Microsoft's own Best Practices for Securing Active Directory document codified this approach. Its central recommendation was per-object delegation review: walk the directory tree, evaluate each object's ACL against a hardening checklist, and remediate non-default ACEs that exceed the documented privilege model [@ms-best-practices-ad]. The document is excellent at what it sets out to do. What it does not do -- because the format does not permit it -- is compose multi-hop reachability.

This was the failure mode Lambert described. A help-desk operator with ForceChangePassword on a junior service account appears as one row in the audit. The junior service account's MemberOf Server Operators membership appears in a different report section. The Server Operators group's logon rights on the Domain Controller appear in the Domain Controller's own report. Three findings in three places, with no machinery to compose them. The data model cannot represent the question.

A reader trained on Phillips and Swiler, Sheyner et al., MulVAL, NetSPA, and TVA might object that *the field already had* graph-based attack-path analysis a decade before BloodHound. True -- in academia. The academic line solved the scale problem (MulVAL: *"seconds for networks with thousands of machines"*) but spoke the wrong vocabulary. Its atomic attack step was a CVE exploit -- a buffer overflow, a format-string bug, a daemon remote code execution. The dominant AD attack primitive is not a CVE. A user with `WriteDacl` on a group is not exploiting any vulnerability; they are using the system as designed. None of MulVAL, NetSPA, or TVA developed an AD-style privilege-graph input format, and the operator community never adopted them. The academic line was substrate-mismatched and delivered as research-PDF tarballs rather than as `git clone`-able tools.

Two communities, both wrong in different ways. The academic one had the right algorithm with the wrong substrate. The operator one had the right substrate with no algorithm at all. The fix was to fuse them. That happened on August 6, 2016.

5. The breakthrough -- BloodHound and Six Degrees of Domain Admin

Saturday, August 6, 2016. 1:00 PM. DEF CON 24, Track 2, Paris and Bally's hotel-casinos in Las Vegas [@defcon-24-archive]. Three speakers from Veris Group's adaptive threat division step on stage: Andy Robbins, Rohan Vazarkar, Will Schroeder. The talk is titled Six Degrees of Domain Admin: Using Graph Theory to Accelerate Red Team Operations [@defcon24-bloodhound-slides]. The Veris Group team would spin out as SpecterOps the following year, but the August 2016 attribution -- Robbins (@_wald0), Vazarkar (@CptJesus), and Schroeder (@harmj0y) -- is the canonical one [@bloodhound-legacy-repo].

The talk demonstrated three design decisions that in retrospect look obvious and at the time were not.

First, model Active Directory as a directed property graph. Nodes are typed security principals (User, Computer, Group, Domain, OU, GPO). Edges are typed privilege relationships (MemberOf, AdminTo, HasSession, GenericAll, GenericWrite, WriteDacl, ForceChangePassword, and others). This was Lambert's framing made concrete: every ACE that grants a dangerous right becomes an edge from the trustee to the target.

Second, reuse Neo4j as the engine. Do not build a custom graph database; piggyback on Cypher's pattern-matching query language. The cost of building a path-finding engine from scratch was non-trivial; the cost of standing up Neo4j was a single Docker container.

Third, ship a collector that emits typed JSON edges, not raw NTSecurityDescriptors. The collector's value is the interpretation of an ACE as a graph edge -- the mapping from a binary security descriptor to the typed edge that says "trustee X has GenericWrite on object Y" is the hard part, and a defender re-creating that mapping per query would lose. SharpHound (initially SharpHound.ps1, then a C# binary) does the interpretation once at collection time and writes the edges to disk [@bloodhound-ce-repo].
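A heavily simplified JavaScript sketch of that interpretation step shows why it is the hard part. It assumes each ACE has already been parsed out of the binary descriptor into a hypothetical `{ trustee, target, type, mask, objectType }` object. The access-mask bits and the Reset Password extended-right GUID are the standard Windows and Active Directory values. The real SharpHound logic also handles inheritance, object-type scoping, GenericWrite, and many more extended rights, all of which this sketch ignores.

```javascript
// Access-mask bits (standard Windows / Active Directory values).
const WRITE_DAC = 0x00040000;
const WRITE_OWNER = 0x00080000;
const DS_CONTROL_ACCESS = 0x00000100; // ADS_RIGHT_DS_CONTROL_ACCESS (extended rights)
const DS_FULL_CONTROL = 0x000f01ff;   // standard rights required plus all DS-specific bits
// Extended-right GUID for User-Force-Change-Password ("Reset Password").
const FORCE_CHANGE_PASSWORD = '00299570-246d-11d0-a768-00aa006e0529';

// Map one pre-parsed ACE (hypothetical shape) to zero or more typed edges.
function aceToEdges(ace) {
  if (ace.type !== 'allow') return []; // this sketch ignores deny ACEs entirely
  const kinds = [];
  if ((ace.mask & DS_FULL_CONTROL) === DS_FULL_CONTROL) {
    kinds.push('GenericAll'); // full control subsumes the narrower edges
  } else {
    if (ace.mask & WRITE_DAC) kinds.push('WriteDacl');
    if (ace.mask & WRITE_OWNER) kinds.push('WriteOwner');
    if (ace.mask & DS_CONTROL_ACCESS) {
      if (!ace.objectType) kinds.push('AllExtendedRights');
      else if (ace.objectType.toLowerCase() === FORCE_CHANGE_PASSWORD) kinds.push('ForceChangePassword');
    }
  }
  return kinds.map((kind) => ({ from: ace.trustee, to: ace.target, kind }));
}

// Hypothetical sample ACEs.
const sampleAces = [
  { trustee: 'bob', target: 'carol', type: 'allow', mask: DS_CONTROL_ACCESS, objectType: FORCE_CHANGE_PASSWORD },
  { trustee: 'helpdesk', target: 'it-admins', type: 'allow', mask: WRITE_DAC | WRITE_OWNER },
  { trustee: 'domain-admins', target: 'carol', type: 'allow', mask: DS_FULL_CONTROL },
  { trustee: 'guest', target: 'carol', type: 'deny', mask: DS_FULL_CONTROL },
];
console.log(sampleAces.flatMap((ace) => aceToEdges(ace)));
```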

SharpHound's enumeration calls are visibly descended from PowerView. The November 2020 SpecterOps blog announcing BloodHound 4.0 acknowledges the lineage explicitly, naming Schroeder's joint authorship of both projects and crediting PowerView as the data-collection precursor [@specterops-2020-bloodhound-4]. PowerView's August 2014 release was the substrate that made BloodHound's August 2016 synthesis possible. The chain is unbroken: enumeration in 2014, framing in April 2015, graph synthesis in August 2016.

The five-days-of-whiteboard-work figure comes from the SpecterOps team's own internal benchmark from the original DEF CON 24 talk. The 200 ms query latency is the typical 2024 figure on a mid-size enterprise forest of roughly $10^5$ nodes and $10^6$ edges, retained as the company's marketing framing across subsequent blog posts.

Key idea: BloodHound's breakthrough was neither an algorithm nor an architecture. It was the decision to interpret Active Directory's access-control data as a typed graph and ship that interpretation as a tool the operator community could actually run.

Crucially, the choice to use Neo4j's free shortestPath() function rather than building a custom path-finder was a delivery decision as much as a technical one. Neo4j already did breadth-first shortest paths. The team did not need to invent anything. The hard work was in the edge taxonomy and the collector, not in the graph database.

The talk made a promise the rest of the article must now cash: that there is a real algorithm under the hood, that the algorithm has a name, and that the name is not Dijkstra.

6. The algorithmic core -- property graphs, Cypher, and shortest paths

If you have never written a Cypher query in your life, the next ninety seconds is the entirety of the syntax you need.

Property graph: a graph in which both nodes and edges are typed and can carry key-value properties. A node might have type `User` and properties `name='BOB@CORP.LOCAL'`, `enabled=true`, `pwdlastset=1719234234`. An edge might have type `GenericWrite` and properties `source='ACE'`, `isacl=true`. This is the data model Neo4j implements and the model BloodHound uses.

A Cypher pattern is a parenthesised node, a bracketed edge, and a parenthesised node, with arrows showing direction. `(u:User)-[:MemberOf]->(g:Group)` reads "find a node u of type User connected by a MemberOf edge to a node g of type Group." A full query has three parts: MATCH for the pattern, an optional WHERE for filters, and RETURN for the output. That's it.

Cypher patterns visually mimic ASCII graph drawings: parentheses are nodes, square brackets are edges, and the arrow direction matches the edge direction. The syntax was deliberately designed to look like the diagram you would sketch on a whiteboard.

Cypher: the pattern-matching query language originally created for the Neo4j graph database. Cypher's syntax is declarative: you describe the shape of the data you want, and the engine plans the traversal. Since April 2024 Cypher has been the basis of the ISO/IEC 39075:2024 GQL standard -- the first ISO-standardised graph query language [@iso-39075-2024-gql] [@opencypher-home].

Here is the canonical BloodHound query, with annotations:

```cypher
MATCH p = shortestPath(
  (u:User {name:'BOB@CORP.LOCAL'})             // start node: Bob
    -[*1..]->                                  // one or more edges, of any type
  (g:Group {name:'DOMAIN ADMINS@CORP.LOCAL'})  // end node: Domain Admins
)
RETURN p
```

The shortestPath() wrapper tells Neo4j to short-circuit at the first solution. The variable-length quantifier [*1..] says "one or more edges of any type." The p = binds the entire matched path to the variable p, which the RETURN then emits as a sequence of node-edge-node triples that the BloodHound frontend renders as an SVG.

What does shortestPath() actually run? Here is where the misconception that BloodHound uses Dijkstra needs to die. The current Neo4j Cypher manual is explicit that shortestPath() runs an unweighted bidirectional traversal -- BFS in the classical sense -- between the source and target nodes [@neo4j-cypher-shortest-paths]. Not Dijkstra. Not A*. BFS.

Breadth-first search (BFS): a graph-traversal algorithm that explores all nodes at the current depth before proceeding to the next depth. Starting from the source, it visits all 1-hop neighbours, then all 2-hop neighbours, and so on. For unweighted graphs, BFS is guaranteed to find a shortest path (in number of edges) the first time it reaches the target. Worst-case time is $O(V + E)$, where $V$ is the node count and $E$ is the edge count -- the trivial information-theoretic lower bound for any algorithm that must read the input.

Cypher does support weighted shortest paths via the Neo4j Graph Data Science library, but the BloodHound CE distribution does not enable it. There is no natural cost metric on Active Directory privilege edges in 2026; every edge is treated as one hop.

Why BFS rather than Dijkstra? Dijkstra is BFS's generalisation to weighted graphs. If your edges have natural costs -- road distances, link latencies, dollar prices -- Dijkstra (with a Fibonacci heap, $O(E + V\log V)$) gives you shortest paths under that cost metric. Active Directory privilege edges do not have a natural cost metric. MemberOf, GenericAll, and CanRDP are all "the attacker can take this step." Some are easier than others, but quantifying how much easier is itself an unsolved problem (see Section 11). Treating every edge as one hop is the load-bearing simplification that makes the model tractable.
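The practical difference shows up even on a toy graph. In the JavaScript sketch below, the edge costs are invented for illustration, since nobody has an agreed cost model for AD edges, which is the point of the paragraph above. The Dijkstra here is the simple array-scan version with $O(V^2)$ cost, not the Fibonacci-heap variant. BFS picks the two-hop route; Dijkstra picks the three-hop route because it is cheaper under the made-up weights.

```javascript
// Toy graph with invented per-edge costs (hypothetical; AD edges have no agreed cost model).
const graph = {
  helpdesk: [
    { to: 'svc_sql', via: 'Kerberoast', cost: 10 }, // hypothetical edge: crack a service-account password
    { to: 'carol', via: 'ForceChangePassword', cost: 1 },
  ],
  svc_sql: [{ to: 'domain-admins', via: 'MemberOf', cost: 1 }],
  carol: [{ to: 'it-admins', via: 'MemberOf', cost: 1 }],
  'it-admins': [{ to: 'domain-admins', via: 'AddMember', cost: 1 }],
  'domain-admins': [],
};

// Unweighted BFS: fewest hops, costs ignored.
function bfsPath(g, start, goal) {
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const node = path[path.length - 1];
    if (node === goal) return path;
    for (const e of g[node] || []) {
      if (!seen.has(e.to)) {
        seen.add(e.to);
        queue.push([...path, e.to]);
      }
    }
  }
  return null;
}

// Dijkstra with a linear scan for the next node: O(V^2), non-negative costs only.
function dijkstraPath(g, start, goal) {
  const dist = new Map([[start, 0]]);
  const prev = new Map();
  const done = new Set();
  for (;;) {
    let node = null;
    for (const [n, d] of dist) {
      if (!done.has(n) && (node === null || d < dist.get(node))) node = n;
    }
    if (node === null) return null; // every reachable node settled; goal unreachable
    if (node === goal) break;
    done.add(node);
    for (const e of g[node] || []) {
      const alt = dist.get(node) + e.cost;
      if (!dist.has(e.to) || alt < dist.get(e.to)) {
        dist.set(e.to, alt);
        prev.set(e.to, node);
      }
    }
  }
  const path = [goal];
  while (path[0] !== start) path.unshift(prev.get(path[0]));
  return { path, cost: dist.get(goal) };
}

console.log('BFS:', bfsPath(graph, 'helpdesk', 'domain-admins').join(' -> '));
const best = dijkstraPath(graph, 'helpdesk', 'domain-admins');
console.log('Dijkstra:', best.path.join(' -> '), 'cost', best.cost);
```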

Transitive closure: for a binary relation $R$, the relation $R^{+}$ that contains the pair $(a, c)$ whenever there is a chain $a \mathrel{R} b \mathrel{R} \cdots \mathrel{R} c$. For a graph, the transitive closure tells you, for every pair of nodes, whether one is reachable from the other. BloodHound's `shortestPath()` queries can be thought of as on-demand evaluation of the transitive closure restricted to one source-target pair.
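The contrast between on-demand queries and a precomputed closure is easier to see with the closure written out. The JavaScript sketch below applies Warshall's algorithm to the five-node Bob-to-krbtgt example from Section 2. It costs $O(V^3)$ time and $O(V^2)$ memory, which is why a tool working on graphs of $10^5$ nodes answers one source-target pair at a time instead. The names are illustrative.

```javascript
// Warshall's algorithm: reach[i][j] ends up true iff a path of one or more
// edges leads from node i to node j.
function transitiveClosure(nodes, edgeList) {
  const index = new Map(nodes.map((name, i) => [name, i]));
  const n = nodes.length;
  const reach = Array.from({ length: n }, () => new Array(n).fill(false));
  for (const [from, to] of edgeList) reach[index.get(from)][index.get(to)] = true;
  for (let k = 0; k < n; k++) { // allow node k as an intermediate hop
    for (let i = 0; i < n; i++) {
      if (!reach[i][k]) continue;
      for (let j = 0; j < n; j++) {
        if (reach[k][j]) reach[i][j] = true;
      }
    }
  }
  return (a, b) => reach[index.get(a)][index.get(b)];
}

const nodes = ['bob', 'carol', 'server-operators', 'dc01', 'krbtgt'];
const edgeList = [
  ['bob', 'carol'],
  ['carol', 'server-operators'],
  ['server-operators', 'dc01'],
  ['dc01', 'krbtgt'],
];
const reachable = transitiveClosure(nodes, edgeList);
console.log('bob -> krbtgt:', reachable('bob', 'krbtgt'));
console.log('krbtgt -> bob:', reachable('krbtgt', 'bob'));
```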

The per-query complexity is $O(V + E)$, the standard BFS bound from any algorithms textbook. On a mid-size enterprise forest -- roughly $10^5$ nodes and $10^6$ edges -- a single user-to-group shortest path returns in sub-second wall-clock time.

The variable-length quantifier [*1..N] for general path enumeration is a different matter. Cyclic graphs admit exponentially many paths in N, and Neo4j's documentation explicitly warns that quantified path patterns can return exponentially many results in the worst case [@neo4j-cypher-variable-length]. The shortestPath() short-circuit avoids this by returning on the first hit; allShortestPaths() enumerates only paths tied for shortest; unbounded enumeration is intractable on any non-trivial graph.
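How quickly "exponentially many" bites is easy to check. The JavaScript sketch below counts the simple paths (no repeated node) between two fixed nodes of a complete directed graph, where every ordered pair of distinct nodes has an edge. Simple paths are a lower bound on what an unrestricted variable-length match can return, because Cypher's default matching forbids repeated relationships rather than repeated nodes. The graph is a worst-case toy, not a model of a real forest.

```javascript
// Count simple paths (no repeated node) from node `src` to node `dst` in the
// complete directed graph on n nodes, where every ordered pair of distinct
// nodes is joined by an edge.
function countSimplePaths(n, src = 0, dst = 1) {
  const visited = new Array(n).fill(false);
  function extend(node) {
    if (node === dst) return 1; // a path ends the first time it reaches dst
    visited[node] = true;
    let total = 0;
    for (let next = 0; next < n; next++) {
      if (!visited[next]) total += extend(next);
    }
    visited[node] = false; // backtrack so other branches may use this node
    return total;
  }
  return extend(src);
}

for (let n = 4; n <= 10; n++) {
  console.log(`n=${n}: ${countSimplePaths(n)} simple paths from node 0 to node 1`);
}
```

The counts grow roughly like $e \cdot (n-2)!$, faster than any fixed exponential, which is why the `shortestPath()` short-circuit matters.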

A concrete demonstration is in order. The runnable snippet below is a short implementation of unweighted BFS over a six-node toy graph. It returns the shortest path from a helpdesk user to the domain-admin group. It is the single-direction form of the BFS that Neo4j's shortestPath() runs (Neo4j expands from both ends at once), and its numerical answer ("path of length 5") is the same number BloodHound would report on the same graph.

```javascript
// Edges modelled as a typed adjacency list.
const edges = {
  helpdesk: [{ to: 'carol', via: 'ForceChangePassword' }],
  carol: [{ to: 'serverops', via: 'MemberOf' }],
  serverops: [{ to: 'dc01', via: 'CanRDP' }],
  dc01: [{ to: 'krbtgt', via: 'DCSync' }],
  krbtgt: [{ to: 'domain-admin', via: 'GoldenTicket' }],
  'domain-admin': [],
};

function shortestPath(start, goal) {
  const queue = [[start]];       // queue holds candidate paths
  const seen = new Set([start]); // each node enqueued at most once

  while (queue.length) {
    const path = queue.shift();  // BFS = FIFO
    const node = path[path.length - 1];
    if (node === goal) return path;
    for (const e of edges[node] || []) {
      if (!seen.has(e.to)) {
        seen.add(e.to);
        queue.push([...path, e.to]);
      }
    }
  }
  return null; // unreachable
}

const path = shortestPath('helpdesk', 'domain-admin');
console.log('Path:', path.join(' -> '));
console.log('Length:', path.length - 1, 'hops');
```

The algorithm fits on a single screen. The hard work of BloodHound is not in this loop. The hard work is in deciding which edges to insert into the graph in the first place. That decision -- the edge taxonomy -- is what makes BloodHound a security tool rather than a graph-database demo.

```mermaid
sequenceDiagram
  participant Client as Cypher client
  participant Planner as Cypher planner
  participant Engine as BFS engine
  participant Graph as Property graph
  Client->>Planner: MATCH shortestPath(u to g)
  Planner->>Engine: plan(start=u, end=g, bidirectional=true)
  Engine->>Graph: expand 1-hop frontier from u
  Engine->>Graph: expand 1-hop frontier from g
  Graph-->>Engine: neighbours of u, neighbours of g
  Engine->>Graph: expand 2-hop frontier (both sides)
  Engine->>Graph: expand 3-hop frontier (both sides)
  Graph-->>Engine: frontiers intersect at node m
  Engine-->>Planner: reconstruct path u to m to g
  Planner-->>Client: return path
```

7. The edge taxonomy -- what Active Directory actually looks like as a graph

The BloodHound graph is not every privilege relationship in Active Directory. It is the set of relationships that the SpecterOps team has decided -- by iterative discovery, often in response to specific community-reported abuse primitives -- to model. The taxonomy has grown roughly monotonically since 2016; the rate has accelerated since the BloodHound CE 5.0 reboot in August 2023 [@specterops-2023-bloodhound-ce].

Eight families dominate the 2026 graph. They map, family by family, onto the substrates of a modern hybrid enterprise.

1. Group membership. MemberOf is the simplest edge. Cypher's variable-length quantifier ([:MemberOf*1..]) walks transitive memberships in one expression, which is why nested-group reachability is a one-liner.

2. ACL write-equivalents. GenericAll, GenericWrite, WriteDacl, WriteOwner, Owns, AllExtendedRights, ForceChangePassword, AddSelf, AddMember, and AddKeyCredentialLink (the shadow-credentials primitive). Each names a specific dangerous-right pattern on a directory object's security descriptor. SharpHound's interpreter scans the nTSecurityDescriptor attribute and emits one typed edge per matching ACE.

3. Sessions. HasSession is the dynamic edge that goes stale fastest. SharpHound enumerates active sessions via NetSessionEnum and SAMR; the resulting edges describe "user U is currently logged into computer C." The graph is whatever the most recent collection captured.

4. Remote execution rights. AdminTo, CanRDP, ExecuteDCOM, CanPSRemote, SQLAdmin. Each describes a code-execution primitive granted to a principal on a computer object.

5. Kerberos delegation. AllowedToDelegate (constrained delegation), AllowedToAct (resource-based constrained delegation, via msDS-AllowedToActOnBehalfOfOtherIdentity), unconstrained delegation surfaced as node properties on Computer objects, and -- from BloodHound CE v6.3 in December 2024 [@bloodhound-v6-3-release] -- a CoerceToTGT edge that replaces the older UnconstrainedDelegation finding for BHE customers.

6. ADCS edges (early-access January 2024). ADCSESC1 through ADCSESC10, plus the CoerceAndRelayNTLMToADCS edge for ESC8 [@specterops-2024-adcs-bloodhound]. These edges land in BloodHound roughly thirty months after Will Schroeder and Lee Christensen first published the ESC1 to ESC8 catalog in Certified Pre-Owned: Abusing Active Directory Certificate Services [@specterops-2021-certified-preowned]. The ADCS edges are the most complex in the taxonomy because each one is composed from multiple raw facts (see the traversable / non-traversable discussion below).

7. Azure / Entra ID edges (via AzureHound, November 20, 2020) [@specterops-2020-bloodhound-4]. AZGlobalAdmin, AZRoleAssignment, AZContains, AZOwns, AZUserAccessAdministrator, AZAddSecret, AZMGAddOwner, plus AzureRM-side resource roles. Microsoft Entra Privileged Identity Management (PIM) role coverage was added in BloodHound v8.0 in July 2025 [@specterops-2025-opengraph].

8. OpenGraph custom edges (v8.0, July 29, 2025). User-defined edges for arbitrary substrates: GitHub, Snowflake, Microsoft SQL Server, ServiceNow, Tailscale, Duo. The schema is intentionally generic so that a community contributor can ship edges for any system whose privilege model can be drawn as a graph [@bloodhound-opengraph-library].
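As a minimal illustration of the MemberOf point above, the Python sketch below computes what a pattern such as `(p)-[:MemberOf*1..]->(g)` with `RETURN DISTINCT g` returns: every group reachable through one or more membership hops. The principal and group names are hypothetical, and the membership cycle is included deliberately to show why a visited set is needed.

```python
from collections import deque

# Hypothetical directory: principal -> groups it is a direct member of (MemberOf edges).
MEMBER_OF = {
    "alice": ["helpdesk"],
    "helpdesk": ["it-staff"],
    "it-staff": ["server-admins", "helpdesk"],  # deliberate cycle back to helpdesk
    "server-admins": [],
}

def transitive_groups(principal: str, member_of: dict[str, list[str]]) -> set[str]:
    """Return every group reachable from `principal` via one or more MemberOf hops."""
    seen: set[str] = set()
    queue = deque(member_of.get(principal, []))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # already expanded; this is also what breaks membership cycles
        seen.add(group)
        queue.extend(member_of.get(group, []))
    return seen

print(sorted(transitive_groups("alice", MEMBER_OF)))
# ['helpdesk', 'it-staff', 'server-admins']
```

Each group is expanded once, so the cost is linear in the number of membership edges regardless of nesting depth.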

| Family | Representative edges | Underlying mechanism | What it gives the attacker |
|---|---|---|---|
| Group membership | `MemberOf` | `member` attribute on group object | Inherit all permissions held by the group |
| ACL write-equivalents | `GenericAll`, `GenericWrite`, `WriteDacl`, `WriteOwner`, `ForceChangePassword`, `AddKeyCredentialLink` | Specific dangerous-right ACE patterns in `nTSecurityDescriptor` | Take control of the target principal (reset password, modify object, plant shadow credentials) |
| Sessions | `HasSession` | `NetSessionEnum` and `SAMR` enumeration on member computers | Pivot via credential theft from the logged-in user's memory |
| Remote execution | `AdminTo`, `CanRDP`, `ExecuteDCOM`, `CanPSRemote`, `SQLAdmin` | Local-admin membership, RDP / DCOM / WinRM / SQL group rights | Run arbitrary code as the target principal on the target host |
| Kerberos delegation | `AllowedToDelegate`, `AllowedToAct`, `CoerceToTGT` | Constrained and resource-based delegation attributes | Forge service tickets and impersonate other accounts |
| ADCS composite | `ADCSESC1` through `ADCSESC10`, `CoerceAndRelayNTLMToADCS` | Certificate template misconfigurations plus CA trust plus enrollment ACEs | Obtain a certificate usable for authentication as a privileged account |
| Azure / Entra | `AZGlobalAdmin`, `AZRoleAssignment`, `AZAddSecret`, `AZOwns`, `AZMGAddOwner` | Entra role assignments, AzureRM RBAC | Cross the on-prem to cloud boundary; pivot via tenant or subscription privileges |
| OpenGraph | User-defined | Any substrate the contributor models | Anything the contributed schema encodes |
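The ACL family's rule of emitting a typed edge for each matching ACE can be sketched as a lookup from (access right, object type) patterns to edge kinds. This is a deliberately simplified Python model, not SharpHound's parsing logic: the right names are symbolic stand-ins for access-mask bits, object types are written as schema names rather than GUIDs, and the rule table covers only a handful of the edges listed above.

```python
from typing import NamedTuple, Optional

class Ace(NamedTuple):
    trustee: str                # principal granted the right
    right: str                  # symbolic right name, not a real access-mask value
    object_type: Optional[str]  # attribute or extended right the ACE is scoped to, if any

# Hypothetical, simplified pattern table: (right, object_type) -> edge kind.
# object_type None means the ACE applies to the whole object.
EDGE_RULES = {
    ("GenericAll", None): "GenericAll",
    ("GenericWrite", None): "GenericWrite",
    ("WriteDacl", None): "WriteDacl",
    ("WriteOwner", None): "WriteOwner",
    ("ExtendedRight", "User-Force-Change-Password"): "ForceChangePassword",
    ("WriteProperty", "member"): "AddMember",
    ("WriteProperty", "msDS-KeyCredentialLink"): "AddKeyCredentialLink",
}

def edges_from_acl(target: str, aces: list[Ace]) -> set[tuple[str, str, str]]:
    """Emit a (source, edge_kind, target) triple for each ACE matching a known pattern."""
    edges = set()  # a set, so duplicate ACEs collapse into one edge
    for ace in aces:
        kind = EDGE_RULES.get((ace.right, ace.object_type))
        if kind is not None:
            edges.add((ace.trustee, kind, target))
    return edges

acl = [
    Ace("bob", "GenericWrite", None),
    Ace("helpdesk", "ExtendedRight", "User-Force-Change-Password"),
    Ace("everyone", "ReadProperty", None),  # read-only: matches no rule, so no edge
]
for edge in sorted(edges_from_acl("carol", acl)):
    print(edge)
# ('bob', 'GenericWrite', 'carol')
# ('helpdesk', 'ForceChangePassword', 'carol')
```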

The ADCS family deserves a closer look because it introduced an important new modelling vocabulary.

A *traversable* edge is one the shortest-path query can step through directly: `MemberOf`, `ForceChangePassword`, `CanRDP`. A *non-traversable* edge is a precondition relationship that is only exploitable when several others appear together. A certificate template's `Enroll` ACE is non-traversable on its own; combined with eight other facts about the template, the issuing CA, and the domain's trust posture, it composes into `ADCSESC1`. The post-processor scans for the full pattern and synthesises a single traversable edge that the BFS can then treat as one hop [@specterops-2024-adcs-bloodhound].

For ESC1 the pattern has nine numbered prerequisites: six template and CA requirements, two enterprise-CA trust facts, and one implicit constraint. None of the nine raw facts is exploitable in isolation. All nine together are. The post-processor's job is to walk the candidate sub-graphs, check every requirement, and write the composed ADCSESC1 edge when the pattern holds.

This is a non-trivial graph-modelling contribution because it gives the field a vocabulary for "an edge that is real only as a join over several facts." It also generalises beyond ADCS: any future attack primitive composed from a fixed pattern of raw facts can be modelled the same way.
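The composition step can be sketched as a conjunction over raw facts. This is a minimal Python illustration under loud assumptions: the four boolean fields are hypothetical simplifications standing in for the nine real ESC1 prerequisites, and only the `ADCSESC1` edge name comes from BloodHound. What it shows is the shape of the post-processor: raw facts stay non-traversable, and a single traversable edge is written only when every prerequisite holds.

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    name: str
    # Hypothetical stand-ins for the real ESC1 prerequisites (there are nine).
    enrollee_supplies_subject: bool
    authentication_eku: bool
    manager_approval_required: bool
    published_by_trusted_ca: bool
    enrollers: set[str] = field(default_factory=set)  # principals holding an Enroll ACE

def esc1_edges(templates: list[Template], domain: str) -> list[tuple[str, str, str]]:
    """Synthesise (principal, 'ADCSESC1', domain) only when every prerequisite holds."""
    composed = []
    for t in templates:
        pattern_holds = (
            t.enrollee_supplies_subject
            and t.authentication_eku
            and not t.manager_approval_required
            and t.published_by_trusted_ca
        )
        if not pattern_holds:
            continue  # the raw facts stay non-traversable; no edge is written
        for principal in sorted(t.enrollers):
            composed.append((principal, "ADCSESC1", domain))
    return composed

vulnerable = Template("UserSAN", True, True, False, True, {"domain-users"})
hardened = Template("UserNoSAN", False, True, False, True, {"domain-users"})
print(esc1_edges([vulnerable, hardened], "CORP.LOCAL"))
# [('domain-users', 'ADCSESC1', 'CORP.LOCAL')]
```

Because the check is a pure conjunction, modelling another primitive means adding another predicate and another edge name; the traversal that later walks the graph does not change.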

ESC8 -- the NTLM relay primitive against an HTTP-enrollment certificate authority -- is the most delicate case, and the one most commonly mis-modelled in early secondary writeups.

Note: The CoerceAndRelayNTLMToADCS edge is a Group(Authenticated Users) to Computer(coerced target) edge, not a Computer to Computer edge as some early secondary writeups described. The relay-target CA and the certificate template are carried as edge metadata, not as additional graph nodes. The canonical edge documentation is explicit: Source: Authenticated Users [Group] / Destination: Computer / Traversable: Yes [@bloodhound-coerce-relay-edge].

The schema correction matters because the source-principal choice affects every shortest-path query that crosses ESC8. If the edge is mis-modelled as Computer to Computer, queries that begin from a low-privilege user account miss the path entirely. The Group-to-Computer schema correctly captures that any authenticated principal can coerce.
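The consequence of the source-principal choice is easy to demonstrate with a plain BFS. In this hypothetical Python sketch, a low-privilege user reaches the Authenticated Users group through an explicit MemberOf edge (an assumption standing in for BloodHound's implicit-membership post-processing), and the ESC8 edge is modelled both ways. Only the Group-to-Computer version yields a path from the user.

```python
from collections import deque

def reachable(graph: dict[str, list[str]], start: str, goal: str) -> bool:
    """Unweighted BFS: can `goal` be reached from `start` along directed edges?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical names. The MemberOf edge is shared by both models.
base = {"lowpriv-user": ["AUTHENTICATED USERS"]}

# Group-to-Computer schema: Authenticated Users -> coerced computer.
group_source = {**base, "AUTHENTICATED USERS": ["DC01"]}

# Mis-modelled Computer-to-Computer schema: some other computer -> coerced computer.
computer_source = {**base, "WS042": ["DC01"]}

print(reachable(group_source, "lowpriv-user", "DC01"))     # True
print(reachable(computer_source, "lowpriv-user", "DC01"))  # False
```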

What the graph does not yet model is also worth naming. The SpecterOps/TierZeroTable README states it verbatim: "DISCLAIMER: The table does not include all Tier Zero assets yet." [@specterops-tier-zero-table] Several edge classes remain partially or fully out of scope; the full enumeration appears in Section 11 (open problems). Coverage expansion is iterative and community-fed; OpenGraph (Section 9) is the structural answer to "where does the graph end?"

Figure: The four edge substrates (on-prem Active Directory, Entra ID and AzureRM, ADCS composite edges, and OpenGraph user-defined edges) all feed a single Cypher query layer.

If this is what one community modelled, the natural question is: what did Microsoft model? And when?

8. The defender adoption -- Microsoft catches up, 2018 to 2024

The vendor whose product BloodHound was mapping is also a defender vendor with graph products of its own. Three of them, in fact, shipped in three different years for three different substrates. They are easy to confuse; even press releases sometimes confuse them.

The first arrived on November 27, 2018, when Tali Ash (then a Program Manager on the Azure Advanced Threat Protection team) announced a preview feature called Lateral Movement Paths (LMPs) in a Microsoft tech-community post [@ms-azure-atp-lmp-2018]. LMPs were a graph-shaped visualisation, but a constrained one: restricted to "sensitive accounts" (a configurable set defaulting to Domain Admins and similar) plus non-sensitive accounts that had shared a session on the same host as a sensitive account.

The portal rendered one- and two-hop credential-theft pivots as a static SVG. There was no Cypher equivalent, no LMP-export API, and no way to write a custom query. Azure ATP was rebranded Microsoft Defender for Identity in 2020, and the LMP feature came along under the new name [@ms-mdi-lmp-docs].

Several secondary sources date Lateral Movement Paths to "June 2019," which corresponds to the general-availability window rather than the original preview announcement. The primary Microsoft tech-community post is dated November 27, 2018; treat third-party dates for this feature with caution and prefer the November 2018 preview as the canonical first ship.

The second arrived in October 2022, when Microsoft Defender for Cloud's Defender CSPM plan added a cloud security graph with attack-path analysis (public preview at Ignite October 2022; generally available March 28, 2023) [@ms-defender-cloud-attack-path]. This product is a cloud attack-path graph: Azure plus AWS plus GCP asset inventory, with inferred edges for permissions, network reachability, vulnerability presence, and internet exposure. It is explicitly not the Active Directory identity graph; it covers the multi-cloud workload surface.

A common mistake conflates this 2022-2023 product (Defender for Cloud) with Microsoft Defender for Identity (the LMP product from November 2018). The substrate, the team, and the year are all different. It is worth flagging because secondary writeups repeat the confusion often.

The Microsoft Defender XDR product family is a naming minefield. *Defender for Identity* (MDI) is the on-prem AD identity-threat product; LMPs are its graph view. *Defender for Cloud* (MDC) is the multi-cloud workload-protection product; its CSPM plan ships cloud-security-graph attack-path analysis. *Defender for Endpoint* (MDE) is the EDR product; it does not ship its own attack-path graph but feeds telemetry into MSEM. *Microsoft Security Exposure Management* (MSEM, GA November 19, 2024) is the unified exposure-graph layer that subsumes the others. Four products, four substrates, four ship dates. The naming overlap is unfortunate but the distinctions are real.

The third arrived on November 19, 2024, at Ignite 2024 in Chicago, where Satya Nadella announced in the opening keynote that Microsoft Security Exposure Management (MSEM) had reached general availability [@ms-ignite-2024-msem]. MSEM is the product whose attack-path model is structurally equivalent to BloodHound's: cross-substrate (identity plus endpoint plus multi-cloud), first-class attack-path objects with choke-point and blast-radius dashboards, and continuous data feed via the Defender XDR signal plane.

Microsoft Security Exposure Management (MSEM): Microsoft's unified exposure-graph product, generally available November 19, 2024, at Ignite 2024 in Chicago. MSEM ingests telemetry from Defender for Endpoint, Defender for Identity, Defender for Cloud, Entra ID, and the Defender XDR plane into a single graph. Attack paths are first-class objects with three dashboard views: an attack-path list, choke-point analysis (small sets of nodes whose compromise enables disproportionately many downstream paths), and blast-radius (downstream reach of a selected node) [@ms-msem-attack-paths].

The MSEM docs page introduces the model verbatim: "Attack paths in Microsoft Security Exposure Management help you to proactively identify and visualize potential routes that attackers can exploit using vulnerabilities, gaps, and misconfigurations across endpoints, cloud environments, and hybrid infrastructures." And, on choke points: "By focusing on these choke points, you can reduce risk by addressing high-impact assets."

The query interface is the Defender XDR portal plus KQL (Kusto Query Language), not Cypher. The graph engine is proprietary; Microsoft does not publish per-query latency numbers or the underlying algorithms. But the model -- nodes, typed edges, attack paths as the unit of analysis, choke-point and blast-radius views -- is the model BloodHound shipped at DEF CON 24 in August 2016.

The arc takes eight years. From the August 6, 2016 BloodHound talk to the November 19, 2024 MSEM general-availability announcement is eight years and three months. The vendor whose directory the original BloodHound was mapping now ships a defender product whose attack-path model is structurally equivalent to the one a red-team consultancy shipped in a conference talk eight years earlier.

Microsoft adopted the model. The community kept extending it. By 2026 the frontier is no longer "does the graph exist?" It is "how do we make the graph weighted, complete, and substrate-independent?" That is the state of the art.

9. State of the art -- Tier Zero, ADCS edges, and OpenGraph, 2023 to 2026

By the time MSEM shipped, SpecterOps had already moved past "is there a graph?" and was asking three sharper questions. Where does the graph end? How do we model attack primitives that compose from raw facts? And does the AD-specific schema even matter?

The first question is what "Tier Zero" means. On June 22, 2023, Jonas Bülow Knudsen, Elad Shamir, and Justin Kohler at SpecterOps published *What is Tier Zero -- Part 1*, which reframed Microsoft's tiered-administration concept as a property of the graph [@specterops-2023-tier-zero]. Microsoft had introduced that concept in its 2012-2014 Securing Privileged Access guidance and, in December 2020, renamed it the Enterprise Access Model with "Control Plane" vocabulary [@ms-enterprise-access-model].

The Tier Zero asset definition (see Definition below) reframes Microsoft's tiered concept from the set of things in the high-privilege tier to the set of things from which the high-privilege tier is reachable. Microsoft's own Tier 0 definition -- "Direct Control of enterprise identities... and all the assets in it" -- becomes a graph property. The two formulations are equivalent if and only if the graph is complete. If the graph is incomplete (which it is), the Tier Zero set computed from the graph is the floor, not the ceiling.

Tier Zero asset: any node in the attack-path graph whose compromise lets an attacker reach an administrative privilege in the forest. The companion `SpecterOps/TierZeroTable` GitHub project is the community-maintained inventory; the README discloses that the table is the floor, not the ceiling [@specterops-tier-zero-table].

The Tier Zero definition is the answer to "shortest path to what?" -- the target side of every BloodHound shortest-path query. Without a defined Tier Zero set, the question has no endpoint.
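The graph-property reading of Tier Zero amounts to a reverse-reachability computation: walk every edge backwards from the administrative targets and collect everything reached. The Python sketch below uses hypothetical node names and also illustrates the floor-not-ceiling point: adding an edge the collector had missed can only grow the computed set, never shrink it.

```python
from collections import deque

def tier_zero(edges: list[tuple[str, str]], admin_targets: set[str]) -> set[str]:
    """Every node from which some admin target is reachable (the targets included)."""
    reverse: dict[str, list[str]] = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)
    found = set(admin_targets)
    queue = deque(admin_targets)
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in found:
                found.add(pred)
                queue.append(pred)
    return found

collected = [("helpdesk", "svc-backup"), ("svc-backup", "Domain Admins")]
# Hypothetical edge the collector does not model (e.g. fine-grained GPO delegation).
missed = ("gpo-editors", "helpdesk")

floor = tier_zero(collected, {"Domain Admins"})
fuller = tier_zero(collected + [missed], {"Domain Admins"})
print(sorted(floor))    # ['Domain Admins', 'helpdesk', 'svc-backup']
print(floor <= fuller)  # True: more edges can only add Tier Zero members
```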

The second question is how to model attack primitives that compose. The January 24, 2024 SpecterOps blog by Knudsen formalised this with the traversable / non-traversable edge distinction discussed in Section 7. The mechanism generalises: any attack primitive whose exploitability is a conjunction of raw facts can be encoded as a composed edge that the post-processor synthesises when the pattern is present.

ESC1, walked through in Section 7, is the canonical example: nine numbered prerequisites that the post-processor checks before writing the composed ADCSESC1 edge [@specterops-2024-adcs-bloodhound]. Subsequent posts in the series extended the same machinery to ESC3, ESC4, ESC6, ESC7, ESC8 (the CoerceAndRelayNTLMToADCS edge), ESC9, ESC10, and -- on February 14, 2024 -- ESC13 [@specterops-2024-esc13].

In December 2024 BloodHound v6.3 introduced an early-access "improved analysis algorithm" internally referred to as Butterfly [@bloodhound-v6-3-release]. Butterfly is the first production attempt at bi-directional impact analysis. Pre-v6.3 BloodHound Enterprise quantified risk as "who can reach this node?" (incoming attack-path count). v6.3 also quantifies "who can this node reach if compromised?" (outgoing blast radius).

The release notes describe the outcome but not the algorithm: "Improve risk scoring fidelity for all finding types... Measure risk at each individual finding... Support the inclusion of hybrid paths in risk scoring (Azure assets will now contribute to measured risk in AD and vice versa)."
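Because the Butterfly algorithm is not published, what follows is not it. It is a minimal Python sketch of the two quantities the release notes distinguish, computed naively with a forward and a reverse BFS over a hypothetical graph: how many nodes can reach a given node (inbound exposure) and how many nodes it can reach once compromised (outbound blast radius).

```python
from collections import deque

def reach(adjacency: dict[str, list[str]], start: str) -> set[str]:
    """Nodes reachable from `start` by BFS, excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

def impact(edges: list[tuple[str, str]], node: str) -> tuple[int, int]:
    """Return (inbound exposure, outbound blast radius) for `node`."""
    forward: dict[str, list[str]] = {}
    backward: dict[str, list[str]] = {}
    for src, dst in edges:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
    return len(reach(backward, node)), len(reach(forward, node))

edges = [
    ("alice", "helpdesk"), ("bob", "helpdesk"),  # two principals lead into helpdesk
    ("helpdesk", "ws01"), ("helpdesk", "ws02"),  # helpdesk administers two hosts
    ("ws02", "svc-sql"),
]
print(impact(edges, "helpdesk"))  # (2, 3)
```

A production system would need something far cheaper than two traversals per node on a large forest; the sketch only pins down what the two numbers mean.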

The same release also announced that BloodHound Enterprise had begun migrating off Neo4j onto PostgreSQL as the graph database, with the release notes reporting ">50% improvement in the time it takes to perform post-processing during the Analysis process." Cypher continues to be the query language; the engine underneath changed.

The third question -- whether the AD-specific schema matters -- got its answer on July 29, 2025, when SpecterOps released BloodHound v8.0 with OpenGraph [@specterops-2025-opengraph]. OpenGraph decouples the graph engine from the AD-specific schema. Users (and SpecterOps partners) define their own node and edge kinds and ingest attack-path data from arbitrary substrates. The initial release included GitHub organisations, Snowflake role hierarchies, Microsoft SQL Server logins, ServiceNow groups, and Tailscale ACLs. Subsequent community contributions extended the library.

BloodHound OpenGraph is a foundational shift toward... identity risk management across the entire enterprise. -- Justin Kohler, SpecterOps Chief Product Officer, July 29, 2025.

OpenGraph is the closing observation of an arc that began with Phillips and Swiler in 1998: the model is the abstraction; the substrate is whatever your enterprise runs. The same shortestPath() that finds Active Directory attack paths now finds attack paths over a GitHub organisation, a Snowflake role hierarchy, or a Microsoft SQL Server login graph, with no engine change. The 2026 BloodHound release (v9.1.0, May 6, 2026, per the public release-notes index [@bloodhound-release-notes-index]) extends OpenGraph and adds incremental edge updates -- the first step toward a streaming graph rather than a snapshot.
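The engine-independence claim can be sketched in a few lines: a traversal that only knows node identifiers and a set of traversable edge kinds does not care which substrate produced them. The JSON below is a simplified, illustrative shape with hypothetical GitHub-flavoured kinds, not the official OpenGraph ingest schema; consult the OpenGraph documentation for the real format.

```python
import json
from collections import deque
from typing import Optional

# Illustrative payload only: a simplified shape with hypothetical node ids and edge kinds.
PAYLOAD = """
{
  "nodes": [
    {"id": "octo-dev", "kinds": ["GHUser"]},
    {"id": "release-team", "kinds": ["GHTeam"]},
    {"id": "deploy-repo", "kinds": ["GHRepository"]}
  ],
  "edges": [
    {"start": "octo-dev", "end": "release-team", "kind": "GHMemberOf"},
    {"start": "release-team", "end": "deploy-repo", "kind": "GHCanPush"}
  ]
}
"""

def shortest_path(doc: dict, source: str, target: str, traversable: set[str]) -> Optional[list[str]]:
    """BFS over whichever edge kinds the caller marks traversable; node kinds are never inspected."""
    adjacency: dict[str, list[str]] = {}
    for edge in doc["edges"]:
        if edge["kind"] in traversable:
            adjacency.setdefault(edge["start"], []).append(edge["end"])
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            cur: Optional[str] = node
            while cur is not None:  # walk parent links back to the source
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

graph = json.loads(PAYLOAD)
print(shortest_path(graph, "octo-dev", "deploy-repo", {"GHMemberOf", "GHCanPush"}))
# ['octo-dev', 'release-team', 'deploy-repo']
```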

10. Competing approaches -- BloodHound versus Microsoft versus the alternatives

In 2026 no single product covers every substrate. The field is plural.

A practitioner choosing among attack-path tools answers four questions, in order. What substrate do you need to cover? Self-host or SaaS? Snapshot or continuous? Open query language or vendor portal? The table below assembles the answers on the dimensions a 2026 practitioner actually uses.

| Tool | Substrate | Engine | Query language | Deployment | Licensing | Edge weighting | ADCS coverage | Best fit |
|---|---|---|---|---|---|---|---|---|
| BloodHound CE 9.x | AD + Entra + AzureRM + OpenGraph | Neo4j + Postgres app DB | Cypher | Self-hosted Docker Compose | Apache-2.0 | No (unweighted BFS) | Yes -- ESC1 through ESC10 + ESC8 composite | Authorised offensive testing + DIY blue team |
| BloodHound Enterprise | Same as CE | PostgreSQL-as-graph (in-progress migration off Neo4j) | Cypher | SaaS | Commercial | Bi-directional (Butterfly v6.3+); weighting function not public | Yes | Continuous AD/Entra attack-surface management at enterprise scale |
| Adalanche | AD (on-prem; LDIF or live LDAP) | In-memory Go | AQL (GQL-like) | Single Go binary | Open source | No | Yes (per README) | Offline / air-gapped analysis from LDIF |
| Microsoft Security Exposure Management | Defender XDR signal: identity + endpoint + multi-cloud + Entra | Proprietary | KQL + portal | SaaS | Microsoft licensing | Implicit (filter against exploitability oracle) | Indirect via MDI signals | Hybrid Microsoft-substrate unified exposure |
| MDI Lateral Movement Paths | On-prem AD (sensitive-account paths only) | Proprietary | None -- portal only | SaaS | Microsoft licensing | n/a | Implicit via separate MDI alerts | Default-on credential-hopping detection |
| Defender for Cloud CSPM attack-path analysis | Multi-cloud (Azure + AWS + GCP) | Proprietary | Cloud Security Explorer + KQL | SaaS | Microsoft licensing | Implicit | n/a | Multi-cloud workload protection |
| PingCastle / Semperis DSP / ADAudit Plus | On-prem AD (+ limited Entra) | None -- list-of-findings | None -- HTML / portal | Self-hosted or SaaS | Commercial / mixed | n/a | Single-finding hygiene only | Compliance auditing and change tracking |

A few rows deserve commentary.

Note: BloodHound CE ships under Apache-2.0 per the current repository [@bloodhound-ce-repo]. The GPL-3.0 license you may see in older treatments applies only to the deprecated BloodHound Legacy v4 repository [@bloodhound-legacy-repo], which was last updated in 2023 and is no longer maintained. The licensing difference is material: GPL-3.0 is copyleft, Apache-2.0 is permissive. Downstream use cases that need permissive licensing should rely on the current CE.

Older blog posts and conference talks frequently call BloodHound CE GPL-3.0. The CE-Legacy LICENSE block does carry the GPL-3.0 copyright header, which is the source of the confusion. The current CE codebase at github.com/SpecterOps/BloodHound is Apache-2.0; the GPL-3.0 LICENSE applies only to the deprecated Legacy v4 repository.

Adalanche, by Lars Karlslund, is the load-bearing counter-example to the claim that "the graph model requires Neo4j" [@adalanche-repo]. Adalanche reads AD data from an LDIF dump or live LDAP, builds the graph entirely in process memory, and exposes a web GUI plus an Adalanche Query Language (AQL) -- described in the README as "a GQL-like language that allows for complex queries."

The README's headline claim is verbatim: "Adalanche gives instant results, showing you what permissions users and groups have in an Active Directory." The trade is no continuous monitoring, no multi-user web app, and a smaller community in exchange for zero deployment friction. The model is identical; the engine is replaceable.

MSEM is the closest Microsoft analogue to BloodHound (see Section 8 for substrate and query interface). Reasonable defenders run both MSEM and BloodHound (CE or Enterprise) on the same forest. The tools are complementary rather than substitutionary: MSEM brings the EDR plus workload-protection telemetry that BloodHound does not natively ingest, while BloodHound brings the precise AD edge semantics that SpecterOps's research community has validated. Running both is not double-counting.

The hygiene scanners -- PingCastle [@pingcastle], Semperis DSP [@semperis-dsp], ManageEngine ADAudit Plus [@manageengine-adaudit-plus] -- are the surviving descendants of the per-object ACL-inspection generation, with risk-scoring layered on top. They are valuable for compliance auditing and change tracking. They do not expose a queryable attack-path graph. The compliance auditor and the attack-path analyst are different personas with different tools.

If the field is plural and every tool has a gap, what is the shape of the problem that no tool yet solves? The next section is the honest answer.

11. Theoretical limits and open problems

Some of the gaps in attack-path analysis are engineering gaps. Others are not.

The single most consequential open problem is edge weighting. BloodHound's BFS treats every edge as one hop. In reality, MemberOf is effectively free; ForceChangePassword requires the attacker to log in as the changed principal afterwards; AddKeyCredentialLink requires shadow-credential infrastructure; CoerceAndRelayNTLMToADCS requires an active SMB-coercion primitive, NTLM relay tooling, and an ESC8-vulnerable certificate authority. A shortest-hop path is not in general the shortest-exploitation-cost path.

BloodHound Enterprise v6.3 shipped the Butterfly analysis as the first production attempt to relax this assumption. As noted in Section 9, the v6.3 release notes describe Butterfly's outcome but not its algorithm, so its weighting function is not publicly documented [@bloodhound-v6-3-release].

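Academic intuition suggests weighting each edge $e$ by an exploitation-success probability $p_e \in (0, 1]$ and computing the most-likely-exploited path via shortest paths under negative-log-probability weights. Assuming edges succeed independently, a path $P$ succeeds with probability $\prod_{e \in P} p_e$; because $-\log$ is monotonically decreasing, maximising that product is the same as minimising $\sum_{e \in P} -\log p_e$, and every such weight is non-negative. The complexity ceiling is well-known: Dijkstra (with non-negative weights) runs in $O(E + V\log V)$ time with a Fibonacci heap; Bellman-Ford handles negative weights at $O(VE)$, which matters only if a weighting scheme admits negative weights. Either fits comfortably inside the per-query budget BloodHound already operates within.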
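A minimal Python sketch of that intuition follows. The success probabilities are invented placeholders (no calibrated values are published), the node names are hypothetical, and multiplying probabilities assumes edges succeed independently. Under those assumptions the fewest-hop path and the most-likely path diverge: a two-hop route through a low-probability relay edge loses to a three-hop route of near-certain steps.

```python
import heapq
import math
from collections import deque

# (source, target, edge kind, assumed success probability) -- placeholder numbers only.
EDGES = [
    ("user", "AUTH-USERS", "MemberOf", 1.0),
    ("AUTH-USERS", "DC01", "CoerceAndRelayNTLMToADCS", 0.2),
    ("user", "helpdesk", "MemberOf", 1.0),
    ("helpdesk", "svc-backup", "ForceChangePassword", 0.9),
    ("svc-backup", "DC01", "AdminTo", 0.95),
]

def _unwind(parent, node):
    """Follow parent links back to the source and return the path in order."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def fewest_hops(edges, source, target):
    """Plain BFS on hop count, ignoring probabilities (the unweighted view)."""
    graph = {}
    for src, dst, _kind, _p in edges:
        graph.setdefault(src, []).append(dst)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return _unwind(parent, node)
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def most_likely_path(edges, source, target):
    """Dijkstra on -log(p): the minimum summed weight is the maximum product of probabilities."""
    graph = {}
    for src, dst, _kind, p in edges:
        graph.setdefault(src, []).append((dst, -math.log(p)))  # p in (0, 1] => weight >= 0
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return _unwind(parent, node), math.exp(-cost)
        if cost > best[node]:
            continue  # stale heap entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None, 0.0

print(fewest_hops(EDGES, "user", "DC01"))  # ['user', 'AUTH-USERS', 'DC01']
path, prob = most_likely_path(EDGES, "user", "DC01")
print(path, round(prob, 3))  # ['user', 'helpdesk', 'svc-backup', 'DC01'] 0.855
```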

The unsolved part is the empirical calibration: what numerical weight is the right weight on a ForceChangePassword edge versus a CoerceAndRelayNTLMToADCS edge? There is no published peer-reviewed answer.

The second open problem is coverage. The TierZeroTable README is the authoritative self-disclosure: file-system ACLs on member servers, fine-grained GPO delegation, on-host service-account permissions, some Entra conditional-access logic, and cross-tenant Entra B2B trust paths remain partially or fully out of scope [@specterops-tier-zero-table]. This is an engineering problem -- more collectors, more edge definitions -- rather than an algorithmic one. OpenGraph is the structural answer: shift coverage from "what edges has SpecterOps modelled?" to "what edges has the community contributed to the shared library?" [@bloodhound-opengraph-library].

The third is graph privacy. A continuously-collected, complete AD privilege graph shipped to a third-party SaaS backend is, in adversarial hands, a pre-computed attack plan for the customer's forest. Tenant isolation, encryption at rest, SOC 2 and FedRAMP attestation, and customer-managed key encryption do not eliminate the structural risk: a compromised SaaS backend yields the customer's graph regardless of compliance posture.

Cryptographic approaches -- homomorphic graph queries, secure multi-party computation for path enumeration -- exist in the theoretical literature but are not in production attack-path products at time of writing. Adalanche and self-hosted BloodHound CE remain the privacy-preserving options at the cost of forgoing continuous monitoring.

The fourth is the "graph is alive" problem. Session edges (HasSession) go stale in hours. New ACEs, new group memberships, new sessions appear continuously. SharpHound's snapshot model is yesterday's view; continuous collectors (BloodHound Enterprise, MSEM agent streams) trade stealth for freshness. The May 6, 2026 release notes describe BloodHound CE's first move toward a streaming model: "incremental edge updates that reduce unnecessary writes during post-processing" [@bloodhound-release-notes-index]. No production attack-path tool yet ships a fully streaming graph.
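One generic way to avoid rewriting an unchanged graph is to diff consecutive collections and apply only the changes. The Python sketch below shows that idea over hypothetical snapshots; it is not BloodHound's implementation, which the release notes do not describe in detail.

```python
Edge = tuple[str, str, str]  # (source, edge kind, target)

def edge_delta(previous: set[Edge], current: set[Edge]) -> tuple[set[Edge], set[Edge]]:
    """Return (edges to add, edges to delete) that turn `previous` into `current`."""
    return current - previous, previous - current

monday = {
    ("ws01", "HasSession", "alice"),
    ("bob", "GenericWrite", "carol"),
    ("helpdesk", "AdminTo", "ws01"),
}
tuesday = {
    ("ws02", "HasSession", "alice"),  # alice has since logged on elsewhere
    ("bob", "GenericWrite", "carol"),
    ("helpdesk", "AdminTo", "ws01"),
}

to_add, to_delete = edge_delta(monday, tuesday)
print(sorted(to_add))     # [('ws02', 'HasSession', 'alice')]
print(sorted(to_delete))  # [('ws01', 'HasSession', 'alice')]
```

Two writes replace a full reload of three edges here; on a real forest, where most ACL and membership edges are stable between collections, the saving is the point.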

The fifth is combinatorial intractability. These are not engineering gaps; they are complexity-theoretic facts.

Counting all attack paths is #P-complete. Leslie Valiant's 1979 result on the complexity of counting solutions to combinatorial problems applies directly: counting the simple paths between two nodes in a general graph cannot be done in polynomial time unless P = #P [@valiant-1979-permanent]. BloodHound's path-count UI is necessarily an approximation or a length-truncation; this is the theoretical reason why.

The minimum "defender hardening" problem is NP-hard. Choose the smallest set of atomic attacks (each of which may label many edges) whose removal disconnects the attacker from the goal. Sheyner et al. 2002 proved the problem NP-hard and showed it admits an $O(\log n)$ approximation [@sheyner-et-al-2002-attack-graphs]. Removing individual edges for a single attacker and a single goal is an ordinary minimum s-t cut, solvable in polynomial time by max-flow; the hardness enters when each remediation removes a whole class of edges at once, as in Sheyner's atomic-attack formulation. BHE choke-point ranking and MSEM choke-point analysis necessarily implement heuristic approximations.

Finding regular simple paths is NP-complete. Mendelzon and Wood 1995 proved that finding a simple path matching a regular expression over edge labels in a graph database is NP-complete [@mendelzon-wood-1995-dblp]. Cypher's shortestPath() stays in P because, for the single repeated relationship pattern it accepts, a shortest path never needs to revisit a node, so BFS finds it without checking any simple-path constraint. General variable-length matching is a different regime: Cypher's default DIFFERENT RELATIONSHIPS semantics, documented in the current Cypher manual [@neo4j-cypher-variable-length], forbids reusing a relationship within a path (trail semantics rather than simple-path semantics), and evaluating regular path queries under trail semantics is likewise NP-complete in general.
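The gap between counting paths and finding one is easy to see on a small synthetic graph. The Python sketch below chains k "diamonds" (each offering two parallel routes), so the number of simple source-to-target paths is 2^k while a BFS that finds one shortest path touches each node once. The graph is synthetic and only illustrates growth; it is not a proof of #P-completeness.

```python
from collections import deque

def diamond_chain(k: int) -> dict[str, list[str]]:
    """k diamonds in series: n_i -> {a_i, b_i} -> n_(i+1). Nodes: 3k + 1, edges: 4k."""
    graph: dict[str, list[str]] = {}
    for i in range(k):
        tail = f"n{i + 1}"
        graph[f"n{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [tail]
        graph[f"b{i}"] = [tail]
    return graph

def count_simple_paths(graph, node, target, on_path=None) -> int:
    """Exhaustive DFS over simple paths: the work grows with the number of paths."""
    on_path = set() if on_path is None else on_path
    if node == target:
        return 1
    on_path.add(node)
    total = sum(
        count_simple_paths(graph, nxt, target, on_path)
        for nxt in graph.get(node, [])
        if nxt not in on_path
    )
    on_path.remove(node)
    return total

def bfs_hops(graph, source, target) -> int:
    """Length of one shortest path: linear in nodes plus edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist.get(target, -1)

for k in (5, 10, 15):
    g = diamond_chain(k)
    print(k, count_simple_paths(g, "n0", f"n{k}"), bfs_hops(g, "n0", f"n{k}"))
# 5 32 10
# 10 1024 20
# 15 32768 30
```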

Key idea: BloodHound's per-query algorithm (BFS, $O(V + E)$) is optimal up to constants. The frontier of the field is no longer the algorithm. It is what question we ask the algorithm: weighted? regular-path? simple-path? bi-directional? cross-substrate? Each open question is a different model, not a different implementation.

Three further honest framings deserve a mention.

First, the standardisation of OpenGraph edge taxonomies is unsettled. Without community convergence on edge naming, different contributors may model the same substrate with incompatible schemas. Historical precedent (MITRE ATT&CK technique IDs, CVE identifiers) suggests that convergence happens when a single high-trust curator becomes the de-facto registry; whether SpecterOps will operate OpenGraph as a vendor-neutral standards body or as a SpecterOps-owned artefact is a governance question, not an algorithmic one.

Second, the adversarial robustness of the collector is an open question: SharpHound runs as an authenticated principal, and an attacker with prior compromise can poison the collection. There is no closed-form defence.

Third, the absence of any public head-to-head benchmark of BloodHound CE versus BHE versus Adalanche versus MSEM on the same forest under controlled conditions is structural: Microsoft does not publish per-query latency, SpecterOps publishes only relative improvement claims, and the academic line uses 2005-era hardware figures that are not comparable.

12. Practical guide -- running BloodHound today

If the previous sections sold you on the model, the next few paragraphs are the minimum you need to stand it up.

Stand up BloodHound CE with the Docker Compose file in the repository [@bloodhound-ce-repo]. The stack is four containers: a PostgreSQL application database (users, roles, sessions, audit logs, saved queries); a Neo4j graph database holding the property graph; a Go REST API; and a React plus Sigma.js single-page frontend. Five minutes to first boot on a developer laptop. The repository README is the authoritative deployment reference.

Run SharpHound on a domain-joined Windows host as the collection identity. A typical full-collection invocation, `SharpHound.exe --CollectionMethods all,GPOLocalGroup`, enumerates every group membership, every recognised ACL pattern, every active session, every local-admin relationship, and every Kerberos delegation. Run AzureHound with appropriate Entra ID credentials for Entra and AzureRM coverage. Both emit JSON dumps in the same envelope; the BloodHound CE upload tab in the web UI ingests both.

Open the web UI. The stock pre-built queries are a reasonable starting palette: Find Shortest Paths to Domain Admins, Find Principals with DCSync Rights, Find Computers with Unsupported Operating Systems, and Shortest Paths from Owned Principals to High-Value Targets (after marking some accounts as owned). Custom Cypher goes in the Cypher tab at the top right; the Section 6 query is a good template.

The most important interpretation discipline: treat the result as a risk register, not a vulnerability list. A finding is "Bob can reach Domain Admins via a four-hop path." The edge is "Bob has GenericWrite on Carol." Closing the edge breaks the finding and every other path that passed through it. Edges are the unit of remediation, not findings. SpecterOps's own 2021 customer-anonymised essay Active Directory Attack Paths -- Is It Always This Bad? reports findings from hundreds of engagements, and the recurring observation across forests is that a small number of high-blast-radius edges explain most of the discovered paths [@specterops-2021-ad-attack-paths].
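The edges-as-remediation-unit discipline can be sketched as a brute-force ranking: remove each edge in turn and count how many (source, Tier Zero target) findings disappear. The Python below does this naively over a hypothetical graph. It costs one full reachability pass per edge, so treat it as a teaching aid, not a scalable choke-point algorithm.

```python
from collections import deque

def reachable_from(edges, start):
    """All nodes reachable from `start` (including itself) by BFS."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def findings(edges, sources, tier_zero):
    """All (source, target) pairs where a Tier Zero target is reachable from the source."""
    pairs = set()
    for s in sources:
        reach = reachable_from(edges, s)
        pairs |= {(s, t) for t in tier_zero if t in reach}
    return pairs

def rank_edges(edges, sources, tier_zero):
    """Rank edges by how many findings vanish when that single edge is removed."""
    baseline = findings(edges, sources, tier_zero)
    scores = []
    for edge in edges:
        remaining = [e for e in edges if e != edge]
        scores.append((len(baseline - findings(remaining, sources, tier_zero)), edge))
    return sorted(scores, reverse=True)

EDGES = [
    ("alice", "helpdesk"), ("bob", "helpdesk"), ("carol", "helpdesk"),
    ("helpdesk", "svc-backup"),       # shared hop that three findings cross
    ("svc-backup", "Domain Admins"),
    ("dave", "Domain Admins"),        # an independent one-hop finding
]
SOURCES = ["alice", "bob", "carol", "dave"]
TIER_ZERO = {"Domain Admins"}

for lost, edge in rank_edges(EDGES, SOURCES, TIER_ZERO)[:2]:
    print(lost, edge)
# 3 ('svc-backup', 'Domain Admins')
# 3 ('helpdesk', 'svc-backup')
```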

A few pitfalls are worth naming. HasSession collection generates measurable LDAP and SAMR traffic that Microsoft Defender for Identity alerts on; coordinate with the blue team or expect detections. The stealth collection mode trades coverage for lower traffic volume.

Unconstrained variable-length Cypher queries (MATCH p=(a)-[*]->(b)) can pin Neo4j's heap; CE's "protected Cypher" cost limits help but do not eliminate the problem, so prefer shortestPath() or bound the path length explicitly. The wildcard-principal post-processing for Authenticated Users and Everyone requires v6.0 or later to be correct; older versions miscount these edges [@bloodhound-v6-0-release]. And the CoerceAndRelayNTLMToADCS edge is Group to Computer, not Computer to Computer, as discussed in Section 7.

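```cypher
// Collect every node tagged Tier Zero (nodes with null system_tags fail the test).
MATCH (n)
WHERE n.system_tags CONTAINS 'admin_tier_0'
WITH collect(n) AS tier_zero
// Bounded shortest path from each enabled user to any Tier Zero node.
MATCH p = shortestPath((u:User {enabled: true})-[*1..6]->(t))
WHERE t IN tier_zero
  // coalesce() keeps users with no system_tags; NOT null would silently drop them.
  AND NOT coalesce(u.system_tags, '') CONTAINS 'admin_tier_0'
RETURN u.name AS source, t.name AS target, length(p) AS hops
ORDER BY hops ASC
LIMIT 25
```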

This returns up to 25 source-target pairs, each pairing an enabled non-Tier-Zero user with a Tier Zero node it can reach, shortest paths first. The [*1..6] bound prevents the pathological cyclic-graph cost explosion. Bound length aggressively until you have indexed your graph.

BloodHound is dual-use. Authorised defensive use on your own forest, or contracted penetration testing within written scope, is the standard legal posture. Running it against a directory you are not authorised to assess is unlawful in most jurisdictions: the Computer Fraud and Abuse Act in the United States; the Computer Misuse Act 1990 in the United Kingdom; equivalents in most EU jurisdictions. The dual-use posture is fundamental to the tool; the legal posture depends on you.

The tool is the easy part. The hard part is what you do with the answer.

13. Frequently asked questions

The misconceptions worth disposing of, in order of how often they recur.

No. BloodHound models a SpecterOps-maintained, evolving set of edge families, not every possible path. File-system ACLs on member servers, fine-grained GPO delegation, on-host service-account permissions, and some Entra conditional-access logic remain partially or fully out of scope. The `SpecterOps/TierZeroTable` README is explicit about this limitation [@specterops-tier-zero-table]. Coverage expansion is iterative and community-fed; OpenGraph is the structural answer to scope generalisation.

The original BloodHound (2016) shipped with Neo4j only. The modern BloodHound CE uses *both*: PostgreSQL as the application database (users, roles, sessions, audit logs) and Neo4j as the graph layer [@bloodhound-ce-repo]. BloodHound Enterprise has begun migrating entirely off Neo4j onto PostgreSQL-as-graph (announced in v6.3, December 2024) [@bloodhound-v6-3-release]; Cypher continues to be the query language on the new backend. The model is engine-independent; Adalanche proves the same point by doing it all in process memory in Go [@adalanche-repo].

Authorised defensive use on your own forest, yes. Contracted penetration testing within written scope, yes. Running it against a directory you are not authorised to assess is unlawful in most jurisdictions: the Computer Fraud and Abuse Act in the United States, the Computer Misuse Act 1990 in the United Kingdom, and equivalents in most EU jurisdictions. The dual-use posture is fundamental to the tool; legal compliance is the operator's responsibility.

Epilogue

The 2014 analyst with the whiteboard and the 2024 analyst with the Cypher query are doing the same work. The unit of analysis has shifted, and once the unit shifts, the field does not go back.

John Lambert diagnosed it in two sentences in April 2015 [@lambert-2015-defenders-lists-attackers-graphs]. Andy Robbins, Rohan Vazarkar, and Will Schroeder shipped it as BloodHound in August 2016 [@bloodhound-legacy-repo]. SpecterOps extended it through AzureHound in 2020, the CE 5.0 web architecture in 2023, the Tier Zero formalisation in 2023, ADCS composed edges in 2024, Butterfly bi-directional analysis in 2024, and OpenGraph in 2025 [@specterops-2025-opengraph].

Microsoft validated the model with Lateral Movement Paths in 2018, cloud security graph attack-path analysis in 2022 to 2023, and Microsoft Security Exposure Management at Ignite in 2024 [@ms-msem-attack-paths]. The community that shipped the graph won; the community that kept shipping lists is selling compliance reports to the auditors.

The frontier in 2026 is not whether to model attacks as a graph -- that argument is settled. The frontier is how to make the graph weighted (so the shortest path approximates the easiest), how to make it complete (so the unmodelled edges shrink toward zero), and how to make it substrate-independent (so the next enterprise primitive worth modelling -- whatever it turns out to be -- can be ingested without changing the engine). Each of these is a research direction with its own asymptotic ceiling, its own engineering practice, and its own community of contributors.

What started as a sentence in a 1,100-word essay on a personal GitHub repository is now an ISO-standardised query language [@iso-39075-2024-gql], a shipped Microsoft product family, an open-source repository with hundreds of thousands of downloads, and a discipline taught at most major security conferences. The graph wins because the graph is the right model. The right model wins because, eventually, the right model always does.

Privileged Identity Management: How a Two-State Role Assignment Retired Standing Admin

noreply@paragmali.com (Parag Mali) — Mon, 25 May 2026 00:00:00 GMT

**Standing Global Administrator was never a design choice. It was the only posture a single-state role-assignment object could produce.** Microsoft Entra PIM added one field to that object -- `type: eligible | active` -- and everything downstream (activation policies, audit logs, access reviews, six PIM Alerts, PIM-for-Groups, PIM-for-Azure-Resources, GDAP, Lighthouse, PIM with Conditional Access) is a structural consequence of that single change. The pattern works for human users. The open boundary in 2026 is application identities -- service principals, managed identities, OAuth consent grants -- which route around PIM entirely via the Azure Instance Metadata Service endpoint at `169.254.169.254`, the bypass class that Andy Robbins documented in June 2022 and that MITRE ATT&CK now maps to T1078.004 (Valid Accounts: Cloud Accounts).

1. The Tenant with Zero Standing Global Administrators

At 14:03:01 on a Tuesday in 2026, alice@contoso.com became Global Administrator of her company's Microsoft Entra tenant. At 15:03:01 the same day, she stopped being one. In between, she restored a deleted user, exported an audit log, and produced a single PIM record: Justification reads "incident MSRC-2026-PIM-12345, ticket SNOW-INC-987654"; Approver reads "bob@contoso.com (decided 14:02:17)"; ActivatedAt and ExpiredAt differ by exactly PT1H. The SOC 2 auditor signed off on it without follow-up questions.

The 2015-vintage version of the same tenant looked nothing like this. Twelve standing Global Administrators. No multifactor challenge at privilege use. No approval workflow. No justification field. No audit trail beyond ordinary sign-in logs. A single phish of any one of those twelve identities was tenant takeover. The math required no sophistication: the attack surface for "Global Administrator of contoso.com" equalled the union of twelve personal attack surfaces, indefinitely.
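The union argument can be made concrete with a back-of-the-envelope model. The model is our assumption, not a figure from the source: each standing admin is phished independently with the same probability `p` over some period, so the chance that at least one of them falls is one minus the chance that none do. The value `p = 0.05` is purely hypothetical.

```javascript
// Assumption: each standing admin is compromised independently with probability p.
function probabilityAnyCompromised(p, admins) {
  // P(at least one compromised) = 1 - P(none compromised) = 1 - (1 - p)^admins
  return 1 - Math.pow(1 - p, admins);
}

for (const admins of [1, 3, 12]) {
  console.log(admins, probabilityAnyCompromised(0.05, admins).toFixed(3));
}
```

Under these assumptions a 5 percent per-person risk becomes roughly a 46 percent tenant-takeover risk with twelve standing Global Administrators. The real inputs are unknowable; the shape of the curve is the point.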

What changed between the two tenants is not a habit, not a policy, not a culture shift. It is a single field on a single object inside Microsoft Entra ID.

Key idea: Standing admin was never a deliberate design decision. It was the only deployment posture a single-state role-assignment object could produce. Once Microsoft made the role-assignment object two-state, JIT admin became expressible -- and standing admin became visibly the anti-pattern it had been since 1975.

To explain that field, and to explain why it took four decades to ship, we start where the principle did: a 1975 paper by two MIT researchers who knew what privilege should look like but had no mechanism to enforce it.

2. The Default Wasn't a Decision

Who designed the standing Domain Admin pattern? No one. It was the only assignment category Active Directory shipped with.

A forty-year deployment posture with no author. That is the first thing to internalize. Standing admin is what happens when a data model offers exactly one assignment category and operators still have real work to do. Every later "best practice" was an attempt to talk operators out of the one tool they had been given.

1975: The principle without a mechanism

In September 1975, Jerome Saltzer and Michael Schroeder published The Protection of Information in Computer Systems in the Proceedings of the IEEE [@saltzer-schroeder-1975]. The paper is a survey of secure-systems design, organized around eight named design principles that the authors crystallized from work on Multics and other early protected operating systems. Both authors were affiliated with MIT's Project MAC and the Department of Electrical Engineering and Computer Science [@saltzer-mit-meta].

The sixth principle, named Least Privilege, is the one every later JIT-admin product cites:

Every program and every user of the system should operate using the least set of privileges necessary to complete the job. -- Saltzer & Schroeder, *The Protection of Information in Computer Systems*, 1975, Design Principle (f), the sixth of eight [@saltzer-schroeder-1975]

The principle is correct, parsimonious, and -- for four decades after publication -- mechanically unenforceable for the temporal case. Static enforcement (ACLs, capability lists, ring boundaries) was tractable in 1975; bounding the time interval during which a privilege is held was not.

Read the principle carefully. It does not say "every user should hold the least set of privileges." It says they should operate using the least set of privileges. The two formulations look identical until you ask what a person does between bursts of administrative work. A user who holds the privilege "permanently active" is operating using it permanently, whether they touch the system or not. The 1975 paper points at the temporal dimension and walks past it. The worked examples cover static mechanisms -- protection rings, access control lists, capability tickets -- not time-bounded ones. The principle was correct. The mechanism did not yet exist.

For the next forty years, every approximation tried to compensate. UNIX sudo (1980) bound elevation to a single command. Kerberos delegation (1988) bound impersonation to a ticket. Windows DACLs and Active Directory groups (1993 and 2000) bound access to a static membership list. None made temporal least privilege a first-class data-model property. None let an operator say "I am eligible to be Domain Admin, but I am not Domain Admin right now."

Microsoft's 2014 *Mitigating Pass-the-Hash v2* whitepaper introduced a three-tier administrative model. Tier 0 is identity-system-critical: domain controllers, ADFS, PKI, anything whose compromise gives forest-wide privilege. Tier 1 is enterprise servers and business-critical applications. Tier 2 is user workstations and end users. The enforcement rule is one sentence: an administrator credential for Tier N must never be exposed to a system in a less-trusted (numerically larger) tier. Microsoft has progressively retired this framing in favour of the Enterprise Access Model, which we revisit in section 6.

2000-2013: Group membership as a boolean

When Active Directory shipped with Windows 2000 on February 17, 2000 [@ms-news-windows-2000-launch], privileged access was structurally a boolean property of the principal. A user was either a member of BUILTIN\Administrators, Domain Admins, Enterprise Admins, or Schema Admins, or they were not. The membership lived in the directory as the member attribute on the group object (and the memberOf back-link on the user). It was set when assignment was made, unset when an administrator manually revoked it. No third state. No attribute could hold one.

Standing admin: a privileged identity whose role assignment is active and permanent. The role's permissions are granted continuously, regardless of whether the principal is currently exercising the privilege. Standing admin is the default state of any pre-PIM tenant and the deployed-reality state of most AD-only environments through 2026.

Kerberos's Privilege Attribute Certificate -- the PAC -- carried the user's group SIDs forward into every Kerberos ticket the user obtained. A ticket's lifetime was bounded; the SID set inside it was not. There was no per-membership TTL anywhere in the system. If you wanted "Alice is Domain Admin between 14:00 and 15:00 today and not otherwise," the directory had no machinery to express it. Alice was Domain Admin permanently, or not at all.

The Privilege Attribute Certificate is the data structure inside a Kerberos ticket that lists the user's group SIDs. Pre-2016 Active Directory had no per-membership TTL metadata in the PAC. There was nowhere in the existing schema to put an expiry timestamp, which is why on-prem JIT membership later required a separate forest rather than an in-directory mechanism.

Twenty years of deployment matched the data model exactly. A typical 2010-vintage enterprise ran ten to thirty standing Domain Administrators across business units, because manually adding and removing membership for each task was untenable at human scale. The data model did not punish standing membership; the operator chose the only category the directory offered.

December 2012: Microsoft names the failure mode

In December 2012, Patrick Jungles, Mark Simos, Aaron Margosis, Roger Grimes, Laura Robinson and the Microsoft Trustworthy Computing team published Mitigating Pass-the-Hash and Other Credential Theft, Version 1 [@pth-download-center], [@berkouwer-pth-2013]. It is the first formal Microsoft acknowledgment that credential-theft propagation through Active Directory was not a software defect to be patched but a structural property of standing admin membership.

The argument is direct. If twelve Domain Admins exist, the attack surface of "Domain Admin of contoso.local" is the union of those twelve people's personal attack surfaces. Any one of them gets phished, or has their password hash extracted from a Tier-1 server they accidentally signed into, and the attacker has Domain Admin permanently. The MIM PAM documentation later restated the failure in one sentence: "Today, it's too easy for attackers to obtain Domain Admins account credentials, and it's too hard to discover these attacks after the fact" [@ms-learn-mim-pam-overview].

2014: The tier model arrives, the mechanism does not

The 2014 update -- Mitigating Pass-the-Hash, Version 2 [@pth-download-center] -- generalized the threat model and introduced the Tier-0 / Tier-1 / Tier-2 framing as a structural mitigation. v2 said two things clearly that v1 had only implied. First, standing membership in Tier-0 groups was the root cause, not a downstream defect. Second, the mitigation pattern -- isolate tiers, reduce the standing count, use dedicated Privileged Access Workstations -- was guidance, not a mechanism. Microsoft Trustworthy Computing did not yet have a product that could mechanically time-bound group membership in Active Directory.

v2 named the problem, drew the threat model, and recommended the structural fix. What it could not do was ship a mechanism. The mechanism would come, but on the wrong side of the cloud boundary.

3. The On-Prem Detour: MIM 2016 PAM, Bastion Forests, and Shadow Principals

Microsoft's first mechanical JIT-admin product was not in the cloud. It was on-premises, and it required a separate Active Directory forest.

Stop and re-read that. To bound the duration of a group membership in pre-2016 Active Directory, Microsoft had to build a different directory and inject SIDs from one into the other across a trust. The reason was the data model. The production forest's member attribute had no TTL field. Adding one meant changing the AD schema. Changing the schema meant a Windows Server release. So while the schema change was in flight, Microsoft shipped the on-prem JIT-admin product on a different architecture: ask the operator to stand up a second forest whose only job was to issue time-bounded SIDs into the first.

August 6, 2015: MIM 2016 ships PAM

On August 6, 2015, Microsoft Identity Manager 2016 reached general availability and shipped a new capability named Privileged Access Management [@ms-learn-mim-pam-overview]. The architecture is the interesting part. MIM PAM uses three primitives that, together, give Active Directory a mechanically time-bounded group membership for the first time:

A bastion forest -- an entirely separate Active Directory forest, sometimes called the "red" forest or "admin" forest, where privileged accounts live.
A one-way PAM trust from the production forest to the bastion forest, configured for selective authentication.
Shadow principal objects in the bastion forest, each carrying a SID that names a real privileged group in the production forest.

Bastion forest: a separate Active Directory forest dedicated to housing privileged accounts. In MIM 2016 PAM the bastion forest holds shadow-principal objects whose SIDs point at production-forest privileged groups; a one-way PAM trust lets the production forest accept those SIDs in incoming Kerberos tickets for a bounded duration.

Shadow principal: an Active Directory object (schema class `msDS-ShadowPrincipal`, introduced in Windows Server 2016) that represents a foreign user, group, or computer in the bastion forest and carries an `msDS-ShadowPrincipalSid` attribute populated with the SID of a production-forest privileged group. Membership in a shadow principal results in that production-forest SID being added to the requesting user's Kerberos PAC for the membership TTL.

The activation flow is direct. A user in the bastion forest requests privilege through the MIM Portal. An approver decides. MIM writes a TTL-bounded membership in the appropriate shadow principal, with the TTL enforced by the Windows Server 2016 temporal-group-membership feature [@teal-esae3]. The bastion KDC injects the production-forest SID into the user's Kerberos PAC. The production forest accepts that SID across the PAM trust. After the TTL expires, subsequent ticket renewals exclude the privileged SID, and the user no longer holds the privilege.

```mermaid
flowchart LR
  subgraph BASTION["CORP-PRIV bastion forest"]
    A["Privileged user account"]
    SP["Shadow principal (msDS-ShadowPrincipal) carries production SID, TTL"]
    BKDC["Bastion KDC"]
    A -->|"Time-bound membership"| SP
    SP --> BKDC
  end
  subgraph PROD["CORP production forest"]
    DA["Domain Admins"]
    PKDC["Production KDC"]
  end
  BKDC -->|"Kerberos ticket carries injected SID via PAM trust"| PKDC
  PKDC -->|"SID in PAC grants membership for TTL only"| DA
```
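The activation flow above can be modelled as a small data structure: a shadow principal holds a production SID plus a per-user membership expiry, and the SID set placed in a ticket depends on the issue time. This is a minimal sketch with hypothetical names and SIDs (the `-512` suffix is the well-known Domain Admins RID; the domain part is invented). The ticket-lifetime cap, where a ticket never outlives the shortest remaining membership TTL, is our reading of how the Windows Server 2016 temporal-membership feature behaves and should be treated as an assumption.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical shadow principal in the bastion forest. productionSid names a
// production-forest privileged group; members maps user -> membership expiry (ms).
function makeShadowPrincipal(name, productionSid) {
  return { name, productionSid, members: new Map() };
}

function addTimeBoundMember(shadow, user, now, ttlMs) {
  shadow.members.set(user, now + ttlMs);
}

// Production SIDs a bastion KDC would place in a ticket issued at `now`:
// only memberships whose TTL has not yet expired contribute a SID.
function pacSidsAt(shadows, user, now) {
  return shadows
    .filter((s) => s.members.has(user) && now < s.members.get(user))
    .map((s) => s.productionSid);
}

// Assumption: ticket lifetime is capped by the shortest remaining membership TTL,
// so a ticket never outlives the membership whose SID it carries.
function ticketLifetimeMs(shadows, user, now, defaultLifetimeMs) {
  let lifetime = defaultLifetimeMs;
  for (const s of shadows) {
    if (s.members.has(user) && now < s.members.get(user)) {
      lifetime = Math.min(lifetime, s.members.get(user) - now);
    }
  }
  return lifetime;
}

const domainAdmins = makeShadowPrincipal('PROD Domain Admins', 'S-1-5-21-1000-2000-3000-512');
const t0 = Date.UTC(2026, 0, 6, 14, 0, 0); // hypothetical activation instant
addTimeBoundMember(domainAdmins, 'alice', t0, HOUR_MS);

console.log(pacSidsAt([domainAdmins], 'alice', t0));           // one SID
console.log(pacSidsAt([domainAdmins], 'alice', t0 + HOUR_MS)); // no SIDs
console.log(ticketLifetimeMs([domainAdmins], 'alice', t0, 10 * HOUR_MS) / HOUR_MS); // 1
```

The contrast with the pre-2016 model is the `now < expiry` check: a plain group membership has no expiry to compare against, so the SID would be included in every ticket indefinitely.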

October 15, 2016: Windows Server 2016 makes the mechanism real

For the first fourteen months of MIM 2016's life, the full feature did not work. The temporal-group-membership and shadow-principal schema classes that MIM PAM depends on are AD primitives that arrived only with Windows Server 2016, which reached general availability on October 15, 2016 [@ms-learn-lifecycle-ws2016]. Microsoft Learn states the requirement directly: "With Windows Server 2016, PAM features of time-limited group memberships and shadow principal groups are built into Windows Server Active Directory" [@ms-learn-raise-bastion], and "All domain controllers in the bastion environment for the PRIV forest must be Windows Server 2016 or later" [@ms-learn-raise-bastion].

The PAM trust is technically a forest trust with selective authentication enabled. The selective authentication flag is what prevents the bastion forest's privileged identities from being usable for anything other than the explicit shadow-principal SID injection -- without it, the bastion forest would itself become a sprawling privileged-access surface.

This is the moment AD itself gains a temporal least-privilege primitive, forty-one years after Saltzer and Schroeder published the principle. The mechanism is real, but the operational profile is brutal.

Three reasons it did not generalize

MIM PAM solved exactly one problem and could not be extended to the next. Three structural constraints kept it confined to a niche.

First, it was on-premises only. A bastion forest is an Active Directory artifact. Microsoft Entra ID, Office 365, and Azure RBAC role assignments live in a different identity system, with no concept of a forest, no PAM trust target, and no place to plug a shadow-principal object. MIM PAM had no cloud story, and by 2015 the cloud was already where most new Microsoft privileged-access surfaces were being deployed.

Second, the operational complexity filtered out everyone except the most security-mature shops. A bastion forest is a separate Active Directory forest, with its own domain controllers, replication, backup, disaster recovery, and PKI implications. The deployment also requires MIM Service, MIM Portal, MIM Web Service, and SQL Server. Auditing the PAM trust correctly is itself non-trivial work. Microsoft Learn now positions MIM PAM as appropriate only for isolated, non-Internet-connected deployments [@ms-learn-mim-pam-overview]; the verbatim positioning and the MIM 2016 lifecycle details are in the Callout below.

Note: Microsoft Learn states MIM PAM is "not recommended for new deployments in Internet-connected environments" and positions it for "isolated AD environments where Internet access is not available" [@ms-learn-mim-pam-overview]. MIM 2016 itself remains in extended support through January 9, 2029 [@ms-learn-mim-2016], and Microsoft has shipped SP3 compatibility updates for SharePoint Subscription Edition, Exchange SE, and SQL Server 2022 -- but the cloud-first Entra PIM path is the canonical answer for new tenants.

Third, the forest-functional-level dependency delayed real deployment by more than a year. Shadow principals were not usable until Windows Server 2016 reached GA in October 2016. MIM 2016 had been generally available since August 2015. For its first fourteen months in market, the headline JIT-admin feature could not be configured at full fidelity. By the time Windows Server 2016 shipped, Microsoft was already operating its cloud PIM in production.

What the on-prem detour reveals about the cloud's shape

MIM PAM mechanically bounds membership in groups via shadow principals in a separate forest. The cloud has no concept of a forest. So the cloud-native mechanical bound must attach to the assignment object directly, not to the group object indirected through a separate forest. The cloud needed a new assignment-category type, not a new forest topology.

The cloud does not have a forest. It has a role-assignment object. What if that object grew a second state?

4. The Breakthrough: A Two-State Role-Assignment Object

In August 2015, the same month MIM 2016 PAM reached general availability for the on-premises case, the Microsoft Identity Division was already previewing something different for the cloud: a role-assignment object with one new field. That field changed everything that came after it.

The 2015 preview

Alex Simons's August 27, 2015 capability-update post on the CloudBlogs (now migrated to Microsoft Tech Community) is the first public articulation of what Azure AD PIM was building [@simons-2015-aug]. It introduced four surfaces: an eligible assignment category distinct from active, multifactor authentication required at activation, security alerts that watched for privileged-role anomalies, and what the post called Security Reviews -- the precursor to access reviews. The architecture under those four surfaces is the load-bearing part: a single new field on the role-assignment object.

On September 15, 2016, Azure AD Premium P2 reached general availability and carried the first generally-available cloud-native PIM, attributed to Joy Chik (then Corporate Vice President of the Identity Division) and the Identity engineering team [@techcommunity-p2-ga]. Eligible-versus-active was now a billable, supported, production-grade feature.

The one-function spine

Read this carefully. It is the article's central claim.

Key idea: Standing admin was the default not because anyone thought it was secure, but because the role-assignment object had only one state. PIM's contribution is to add a second state -- eligible -- and to make the transition from eligible to active a gated, audited, time-bounded operation that is by definition mediated by PIM.

The principle was Saltzer and Schroeder, 1975. The recognition that standing admin was the failure mode was Mitigating Pass-the-Hash, 2012 and 2014. The on-premises mechanism was MIM 2016 PAM. The cloud answer is a different shape entirely: not a new directory and a SID-injection trust, but a single field on the assignment object itself.

Microsoft Learn documents the resulting terminology in the PIM overview. A principal -- user, group, service principal, or managed identity -- can be eligible or active for a role, and either assignment can be permanent or time-bound [@ms-learn-pim-configure]. The same page elevates a forty-year-old phrase into a product term: "principle of least privilege access -- A recommended security practice in which every user is provided with only the minimum privileges needed to accomplish the tasks they're authorized to perform" [@ms-learn-pim-configure]. The 1975 sentence is now a glossary entry inside a 2026 product, and the product has a mechanism that makes the sentence enforceable.

The formal tuple

Concretely, a PIM-managed role assignment is a 5-tuple. Let $A = (p, r, s, t, d)$ where $p$ is the principal, $r$ is the role, $s$ is the scope, $t \in \{\text{eligible}, \text{active}\}$, and $d \in \{\text{permanent}, \text{time-bound}[s_0, e_0]\}$. The activation transition is

$$\text{activate}: A_{t=\text{eligible}} \longrightarrow A_{t=\text{active},\ d=\text{time-bound}[\text{now},\ \text{now}+\Delta]}$$

subject to the per-role activation policy. The interesting part is what the tuple makes expressible:

RoleAssignment = {
    principal:  user | group | service principal | managed identity,
    role:       Entra directory role | Azure RBAC role | group membership | group ownership,
    scope:      directory | management-group | subscription | resource-group | resource | group,
    type:       eligible | active,
    duration:   permanent | time-bound[start, end]
}

activate: eligible_assignment -> active_assignment   // PIM-mediated, gated, audited

Eligible assignment: a PIM-managed role assignment that grants no privilege until the principal invokes `activate()`. The eligible assignment is the standing relationship between principal and role; the active assignment is the time-bounded materialization that follows when the activation policy is satisfied [@ms-learn-pim-configure].

Active assignment: a PIM-managed role assignment that grants the role's permissions for the duration of the assignment. Active assignments are either permanent (the legacy pre-PIM posture, or an explicit permanent-active PIM assignment) or time-bound (the result of an `activate()` call on an eligible assignment) [@ms-learn-pim-configure].

```mermaid
flowchart TD
  subgraph Permanent["Permanent duration"]
    PE["Permanent eligible -- standing eligibility, no privilege held"]
    PA["Permanent active -- legacy standing admin"]
  end
  subgraph TimeBound["Time-bound duration"]
    TE["Time-bound eligible -- standing eligibility with end date"]
    TA["Time-bound active -- JIT admin after activate()"]
  end
  PE -->|"activate()"| TA
  TE -->|"activate()"| TA
  TA -->|"expire or deactivate()"| PE
  PA -->|"legacy posture being retired"| PE
```

The grid has only four cells. Permanent active is the pre-PIM world, the standing-admin posture every later best practice has been trying to retire. Time-bound active is the JIT-admin state, materialized only at the moment of work and expired shortly after. The two eligible states -- permanent or time-bound -- are the standing relationships between a principal and a role that grant no privilege at rest. The expressive change is small. The deployment consequences are total.
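The tuple $A = (p, r, s, t, d)$ and its `activate` transition fit in a few functions. This is a minimal sketch: the field names follow the pseudocode above and are illustrative only, not the Microsoft Graph schema, and the rule that an eligibility window must be open at activation time is our assumption about sensible behaviour rather than a documented detail.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Sketch of A = (p, r, s, t, d). Illustrative field names, not the Graph schema.
function makeEligible(principal, role, scope, duration = { kind: 'permanent' }) {
  return { principal, role, scope, type: 'eligible', duration };
}

// Is a duration window open at instant `now`? Time-bound windows are [start, end).
function inWindow(duration, now) {
  return duration.kind === 'permanent' || (duration.start <= now && now < duration.end);
}

// activate: eligible -> active, time-bound [now, now + deltaMs].
// Returns a new active assignment; the eligible assignment itself is unchanged.
function activate(eligible, now, deltaMs) {
  if (eligible.type !== 'eligible') throw new Error('only eligible assignments can be activated');
  if (!inWindow(eligible.duration, now)) throw new Error('eligibility is not in effect');
  return {
    ...eligible,
    type: 'active',
    duration: { kind: 'time-bound', start: now, end: now + deltaMs },
  };
}

// Only an active assignment whose window is open grants the role's permissions.
function grantsPermissions(assignment, now) {
  return assignment.type === 'active' && inWindow(assignment.duration, now);
}

const t0 = Date.UTC(2026, 0, 6, 14, 0, 0); // hypothetical activation instant
const eligibleGA = makeEligible('alice@contoso.com', 'Global Administrator', 'directory');
const activeGA = activate(eligibleGA, t0, HOUR_MS);

console.log(grantsPermissions(eligibleGA, t0));             // false: eligible grants nothing
console.log(grantsPermissions(activeGA, t0 + HOUR_MS / 2)); // true: inside the window
console.log(grantsPermissions(activeGA, t0 + HOUR_MS));     // false: expired
```

The permanent-active cell is still representable (an object with `type: 'active'` and `duration.kind === 'permanent'` grants at every instant), which is exactly why it reads as the anti-pattern once the other three cells exist.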

PIM did not add eight features. It added one field, and everything else is downstream.

This is Aha #1. The reader who came in believing standing admin persisted for forty years because operators lacked discipline now sees it differently. Operator discipline was a fragile workaround for a missing data-model field. The 1975 principle was correct. The 2012-2014 PtH whitepapers were correct. The operators were not the problem. The role-assignment object had one state to be in, and the deployment matched the data model exactly. The fix was a structural change to the data model.

The next nine years of PIM history are about extending that two-state primitive: to Azure RBAC, to security groups, to partner tenants, to the conditional-access plane, and to a detection layer that flags people who try to skip activation entirely. We walk each extension in turn. First, the mechanism itself.

5. Anatomy of an Activation

We have seen what changed. Walk through what happens, end to end, when alice@contoso.com clicks "Activate" on her eligible Global Administrator assignment at 14:00:00 on a Tuesday.

The activation flow, step by step

Six things happen, in order, and each writes audit-log evidence:

The eligible assignment already exists. Alice has been a permanent-eligible Global Administrator since she was hired. The PIM directory object records principal alice@contoso.com, role Global Administrator, scope directory, type=eligible, duration=permanent. Today she holds zero of the role's permissions.
The activation request lands on PIM. Alice clicks Activate in the Entra admin centre, or fires the equivalent Microsoft Graph call. PIM pulls the activation policy for (role=Global Administrator, scope=directory) and prepares to evaluate the gates [@ms-learn-pim-change-default-settings].
The policy gates evaluate. This is the load-bearing part, and the place readers most often misread the docs. The gates are per-role configurable, not universal. Microsoft Learn documents five gates the tenant can independently switch on or off [@ms-learn-pim-change-default-settings]:
- Multifactor authentication at activation if requires_mfa is set.
- Approval routing to named approvers or an approver group if requires_approval is set.
- Justification text capture if requires_justification is set.
- Ticket number capture, optionally tagged with a ticketing-system identifier, if requires_ticket is set.
- Activation duration validation against the per-role configurable maximum -- one to twenty-four hours, with one hour the default for the highest-privileged Entra roles such as Global Administrator and Privileged Role Administrator [@ms-learn-pim-change-default-settings].
PIM materializes the active assignment. Microsoft Learn states the latency directly: "Microsoft Entra PIM creates active assignment (assigns user to a role) within seconds" [@ms-learn-pim-activate]. A new token Alice obtains after this moment will carry the activated role's claims.
The PIM audit log records the entire transaction. A new entry captures the request, the approver's decision and decision time, the justification text, the ticket reference, the activation start, and the planned expiry. The audit log is retained for thirty days by default and can be routed to Azure Monitor for longer retention [@ms-learn-pim-audit-log].
Auto-deactivation fires at the duration boundary. At 15:00:00 -- one hour after activation -- PIM deactivates the assignment within seconds [@ms-learn-pim-activate]. Alice can also call deactivate() explicitly to return early.

```mermaid
sequenceDiagram
  autonumber
  participant User as alice
  participant PIM
  participant MFA
  participant Approver as bob
  participant Graph as Microsoft Graph
  participant Audit as PIM audit log
  User->>PIM: Activate Global Administrator
  PIM->>MFA: Require MFA challenge
  MFA-->>PIM: MFA passed
  PIM->>Approver: Route approval request
  Approver-->>PIM: Approve with justification context
  PIM->>Graph: Materialize active assignment within seconds
  PIM->>Audit: Write request, decision, materialization records
  Note over PIM,Audit: Token issued with activated role claims
  Note over PIM,Graph: One-hour TTL begins
  PIM->>Graph: Auto-deactivate at expiry within seconds
  PIM->>Audit: Write deactivation record
```
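Step 2 of the flow mentions the equivalent Microsoft Graph call. The sketch below only builds the request, it does not send it, and it is hedged accordingly: the endpoint and property names follow the Graph `unifiedRoleAssignmentScheduleRequest` resource as we understand it (verify against the current Graph reference before relying on it), the principal ID is a placeholder, and the ticket values are borrowed from the hypothetical scenario in section 1. `62e90394-69f5-4237-9190-012177145e10` is the built-in Global Administrator role template ID. Sending it would additionally require an authenticated call, which is not shown.

```javascript
// Builds (but does not send) a self-activation request for a directory role.
function buildSelfActivationRequest({ principalId, roleDefinitionId, hours, justification, ticketNumber, ticketSystem }) {
  return {
    method: 'POST',
    url: 'https://graph.microsoft.com/v1.0/roleManagement/directory/roleAssignmentScheduleRequests',
    body: {
      action: 'selfActivate',
      principalId,
      roleDefinitionId,
      directoryScopeId: '/', // tenant-wide ("directory") scope
      justification,
      scheduleInfo: {
        startDateTime: new Date().toISOString(),
        expiration: { type: 'AfterDuration', duration: `PT${hours}H` }, // ISO 8601 duration
      },
      ticketInfo: { ticketNumber, ticketSystem },
    },
  };
}

const request = buildSelfActivationRequest({
  principalId: '00000000-0000-0000-0000-000000000000', // placeholder user object ID
  roleDefinitionId: '62e90394-69f5-4237-9190-012177145e10', // Global Administrator
  hours: 1,
  justification: 'incident MSRC-2026-PIM-12345',
  ticketNumber: 'SNOW-INC-987654',
  ticketSystem: 'ServiceNow', // hypothetical ticketing-system label
});

console.log(JSON.stringify(request.body, null, 2));
```

Whether this request succeeds immediately, waits for approval, or is rejected is decided by the role's activation policy, not by anything in the body.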

Activation policies are configured, not assumed

Two of the most common misreadings of the documentation concern this configurability. First, MFA at activation is not universally required by PIM. The role's activation policy must be set to require it. Second, the activation maximum is configurable per role per scope inside a one-to-twenty-four-hour range, with the default for Global Administrator and Privileged Role Administrator at one hour [@ms-learn-pim-change-default-settings]. A "PIM tenant" where one role requires MFA and approval and another role requires only justification text is a perfectly valid configuration; both roles are PIM-gated, but their gate sets differ.
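The "both roles are PIM-gated, but their gate sets differ" point can be shown as two policy objects. This is a minimal sketch: the flag names reuse the hypothetical ones from the activation-gate function in this section, the role-to-policy pairing is invented, and the only range check is the documented one-to-twenty-four-hour maximum.

```javascript
// Two hypothetical per-role activation policies with different gate sets.
// Both are valid: each gate flag is independent of the others.
const policies = {
  'Global Administrator': {
    requires_mfa: true, requires_approval: true, requires_justification: true,
    requires_ticket: true, max_duration_hours: 1,
  },
  'Reports Reader': {
    requires_mfa: false, requires_approval: false, requires_justification: true,
    requires_ticket: false, max_duration_hours: 8,
  },
};

const GATES = ['requires_mfa', 'requires_approval', 'requires_justification', 'requires_ticket'];

// The activation maximum must fall within one to twenty-four hours.
function isValidPolicy(policy) {
  const h = policy.max_duration_hours;
  return typeof h === 'number' && h >= 1 && h <= 24;
}

// Which gates an activation of this role must satisfy.
function enabledGates(policy) {
  return GATES.filter((gate) => policy[gate] === true);
}

for (const [role, policy] of Object.entries(policies)) {
  console.log(role, isValidPolicy(policy), enabledGates(policy));
}
```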

Activation policy: a per-role-per-scope configuration of which gates an activation must satisfy: MFA at activation, approval, justification, ticket number, and the activation maximum duration. PIM evaluates the policy at activation time. The gates are independent flags; any combination can be required [@ms-learn-pim-change-default-settings].

Note: PIM's activation maximum duration is configurable per role per scope in the one-to-twenty-four-hour range. The default value for the highest-privileged Entra directory roles -- Global Administrator and Privileged Role Administrator -- is one hour [@ms-learn-pim-change-default-settings]. Other roles default to higher values. Tighten the duration where you can; the activation cost is small, the standing-active surface saving is large.
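The "standing-active surface saving" can be put in rough numbers by counting the hours per year during which a role is actually held. Every input here is hypothetical (twelve admins, three one-hour activations per admin per week), and held-hours are a proxy for exposure, not a risk measurement.

```javascript
// Rough privileged-exposure arithmetic: hours per year during which a role is held.
const HOURS_PER_YEAR = 365 * 24; // 8,760

function standingHours(admins) {
  return admins * HOURS_PER_YEAR; // role held continuously, all year
}

function jitHours(admins, activationsPerWeek, hoursPerActivation, weeks = 52) {
  return admins * activationsPerWeek * weeks * hoursPerActivation;
}

const standing = standingHours(12); // 105,120 hours
const jit = jitHours(12, 3, 1);     // 1,872 hours
console.log(standing, jit, (100 * jit / standing).toFixed(1) + '%');
```

Under these assumptions JIT holds the role for under 2 percent of the hours that standing assignment does; lengthening the per-role maximum moves `hoursPerActivation` and erodes that ratio directly.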

Authentication context: gating activation, not sign-in

Conditional Access has gated sign-in since 2014. Until 2023, it had no way to gate the activation event itself. The integration between PIM and Conditional Access closes that gap by attaching an authentication context label to the activation. Conditional Access can target that label the same way it targets any other authentication. Microsoft Learn documents the activation policy option "On activation, require Microsoft Entra Conditional Access authentication context" [@ms-learn-pim-change-default-settings].

Authentication context: A label that PIM attaches to the activation event so that Conditional Access policies can target the activation itself, not just the sign-in. This makes policies such as "activation of Global Administrator requires a compliant device and an MFA challenge issued within the last five minutes" expressible without bolting on a third-party stack [@ms-learn-pim-change-default-settings].
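To make the authentication-context idea concrete, here is a minimal sketch that builds the body of a Conditional Access policy targeting an authentication context instead of a cloud app. It assumes the Microsoft Graph `conditionalAccessPolicy` resource shape (`conditions.applications.includeAuthenticationContextClassReferences`, `grantControls.builtInControls`). It also assumes context IDs take the form `c1`, `c2`, and so on. The helper name `buildActivationPolicy`, the ID `c1`, and the display name are hypothetical placeholders. The sketch only constructs the JSON. It does not send the policy, bind the context to a role's activation setting in PIM, or model the five-minute MFA freshness requirement.

```javascript
// Hypothetical helper: builds a Conditional Access policy body that targets an
// authentication context. Field names follow the Microsoft Graph
// conditionalAccessPolicy resource; verify against current Graph documentation.
function buildActivationPolicy({ contextId, displayName, reportOnly = true }) {
  // Assumed ID format for authentication contexts: "c" followed by digits.
  if (!/^c\d+$/.test(contextId)) {
    throw new Error(`unexpected authentication context id: ${contextId}`);
  }
  return {
    displayName,
    // Start in report-only mode so the policy can be observed before enforcing it.
    state: reportOnly ? 'enabledForReportingButNotEnforced' : 'enabled',
    conditions: {
      users: { includeUsers: ['All'] },
      // Target the authentication context that PIM attaches to the activation.
      applications: { includeAuthenticationContextClassReferences: [contextId] },
      clientAppTypes: ['all'],
    },
    grantControls: {
      // Both controls must be satisfied: an MFA challenge AND a compliant device.
      operator: 'AND',
      builtInControls: ['mfa', 'compliantDevice'],
    },
  };
}

const body = buildActivationPolicy({
  contextId: 'c1', // placeholder context ID
  displayName: 'Require compliant device + MFA for privileged activation',
});
console.log(JSON.stringify(body, null, 2));
```

Defaulting to report-only is a deliberate choice. An activation-gating policy that is wrong can lock every administrator out of elevation, so observing it first is cheaper than rolling it back.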

The activation gate, as code

To make the gate-composition idea concrete, here is a simplified model of the activation gate as a small JavaScript function. It illustrates the logic and is not PIM's implementation. Change the policy or the request and run it again.

```javascript
function activate(request, policy) {
  // Policy gates are independent; any combination can be required.
  if (policy.requires_mfa && !request.mfa_passed) {
    return { ok: false, reason: 'MFA challenge failed or absent' };
  }
  // Require an explicit 'approve': a 'deny' decision is truthy and must not pass.
  if (policy.requires_approval && request.approval_decision !== 'approve') {
    return { ok: false, reason: 'Approval pending or denied' };
  }
  if (policy.requires_justification && !request.justification) {
    return { ok: false, reason: 'Justification text missing' };
  }
  if (policy.requires_ticket && !request.ticket_number) {
    return { ok: false, reason: 'Ticket number missing' };
  }
  if (!(request.duration_hours > 0)) {
    return { ok: false, reason: 'Requested duration must be positive' };
  }
  if (request.duration_hours > policy.max_duration_hours) {
    return { ok: false, reason: 'Requested duration exceeds policy maximum' };
  }
  // Activation succeeds: materialize a time-bound active assignment.
  const start = new Date();
  const end = new Date(start.getTime() + request.duration_hours * 3600 * 1000);
  return {
    ok: true,
    active_assignment: {
      principal: request.principal,
      role: request.role,
      scope: request.scope,
      type: 'active',
      duration: { kind: 'time-bound', start, end },
    },
  };
}

const policy = {
  requires_mfa: true,
  requires_approval: true,
  requires_justification: true,
  requires_ticket: true,
  max_duration_hours: 1,
};
const request = {
  principal: 'alice@contoso.com',
  role: 'Global Administrator',
  scope: 'directory',
  mfa_passed: true,
  approval_decision: 'approve',
  justification: 'MSRC-2026-PIM-12345',
  ticket_number: 'SNOW-INC-987654',
  duration_hours: 1,
};
console.log(activate(request, policy));
```

The function is mechanical and short for a reason. Every PIM gate is independently expressible, the policy is a record, the request is a record, and the active-assignment output is itself a record the system can audit. The complexity of PIM, such as it is, lives in the surrounding infrastructure -- the directory, the audit log, Conditional Access, the alert engine -- not in the gate itself.

The Azure-resource five-minute floor

One operational detail belongs here. Azure resource role assignments under PIM-for-Azure-Resources carry an additional latency floor: an Azure resource role assignment cannot be made for a duration of less than five minutes and cannot be removed within five minutes of being created [@ms-learn-pim-resource-roles]. This is the rare place where the cloud control plane exposes a hard minimum-time bound in its assignment-state machine, and it sets the lower limit of any tightening strategy on Azure RBAC scopes.
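Tooling that requests Azure resource role assignments can encode the five-minute floor as a pre-flight check. The sketch below is a minimal version that assumes times are JavaScript millisecond timestamps. The names `checkAssignmentDuration`, `earliestRemoval`, and `canRemove` are hypothetical, and nothing here calls an Azure SDK. It also assumes removal becomes possible at exactly the five-minute mark.

```javascript
// Five-minute floor documented for PIM for Azure Resources assignments.
const FLOOR_MS = 5 * 60 * 1000;

// Hypothetical pre-flight check: a window shorter than five minutes would be
// rejected by the service, so flag it before submitting the request.
function checkAssignmentDuration(startMs, endMs) {
  const durationMs = endMs - startMs;
  if (durationMs < FLOOR_MS) {
    return { ok: false, reason: `duration ${durationMs / 60000} min is below the 5-minute floor` };
  }
  return { ok: true };
}

// Earliest time (ms) at which removal can be attempted, assuming the floor
// ends exactly five minutes after creation.
function earliestRemoval(createdMs) {
  return createdMs + FLOOR_MS;
}

function canRemove(createdMs, nowMs) {
  return nowMs >= earliestRemoval(createdMs);
}

const t0 = Date.parse('2026-05-30T09:00:00Z');
console.log(checkAssignmentDuration(t0, t0 + 3 * 60 * 1000)); // 3-minute window: rejected
console.log(canRemove(t0, t0 + 4 * 60 * 1000)); // false: still inside the floor
```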

Activation is the per-event control. But what about the standing posture across the tenant -- the eligibility surface, the drift you did not notice, the assignment configuration in places PIM does not reach by default? For that, you need access reviews, and you need to push the eligible/active primitive beyond the original twenty-eight built-in directory roles.

6. Beyond Directory Roles: Extending Eligible and Active Across Four Boundaries

PIM at GA in September 2016 covered roughly twenty-eight built-in Entra directory roles. Everything else -- Azure RBAC, security groups, and partner-tenant delegation -- was still single-state and permanent-active. Conditional Access, meanwhile, had no way to target the activation event at all. The next nine years of PIM history are the story of closing those four boundaries, one at a time.

flowchart TD
  Core["Two-state assignment object, 2016"]
  Core --> Azure["PIM for Azure Resources, 2017-2019, RBAC at four scopes"]
  Core --> Groups["PIM for Groups, GA October 2023, membership and ownership"]
  Core --> Partner["GDAP May 2022 plus Azure Lighthouse eligible authorizations"]
  Core --> CA["PIM with Conditional Access authentication context, GA October 2023"]

Boundary 1: PIM for Azure Resources

Between 2017 and 2019, Microsoft extended the eligible-versus-active model from Entra directory roles to Azure RBAC. The extension covers four scopes -- management group, subscription, resource group, and individual resource -- and supports both built-in roles (Owner, Contributor, User Access Administrator, and the security roles) and custom roles [@ms-learn-pim-resource-roles].

The non-obvious operational property of PIM-for-Azure-Resources is that role settings do not inherit down the RBAC hierarchy. A policy you tighten on Owner at the management-group scope does not automatically flow down to Owner on subscriptions, resource groups, or resources beneath it. Each (role, scope) pair is its own policy slot, and each must be configured.

Note: Configure activation policies per role per scope explicitly across the management-group, subscription, resource-group, and resource hierarchy. A tightening at the management-group scope does not flow to subscriptions beneath it. The most common operational defect in mature PIM tenants is the unconfigured policy at a downstream scope, leaving a wide-open activation surface under what looked like a hardened parent.
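Each (role, scope) pair is its own policy slot, so finding the downstream scopes nobody configured is a tree walk. The sketch below assumes a hypothetical in-memory scope tree and a map of explicitly configured policies keyed by `role|scope`. In practice you would populate both from your own export of the tenant, and the names and scope IDs here are illustrative.

```javascript
// Hypothetical scope tree: management group -> subscription -> resource group.
const scopeTree = {
  id: '/providers/Microsoft.Management/managementGroups/root',
  children: [
    {
      id: '/subscriptions/sub-a',
      children: [{ id: '/subscriptions/sub-a/resourceGroups/rg-1', children: [] }],
    },
  ],
};

// Policies the operator has explicitly tightened, keyed by "role|scope".
const configured = new Map([
  ['Owner|/providers/Microsoft.Management/managementGroups/root', { max_duration_hours: 1 }],
]);

// Settings do NOT inherit down the hierarchy, so every scope needs its own
// entry. Returns scopes (pre-order) still left at defaults for the given role.
function unconfiguredScopes(role, node, policies, out = []) {
  if (!policies.has(`${role}|${node.id}`)) out.push(node.id);
  for (const child of node.children) unconfiguredScopes(role, child, policies, out);
  return out;
}

// Lists sub-a and rg-1: the hardened management-group policy did not flow down.
console.log(unconfiguredScopes('Owner', scopeTree, configured));
```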

Boundary 2: PIM for Groups

The PIM-for-Groups timeline is three distinct events. In August 2020, Microsoft previewed the feature under its original name, "Privileged Access Groups," and limited the preview scope to role-assignable security groups [@simons-2020-aug]. In January 2023, Microsoft renamed the feature to "Privileged Identity Management for Groups" in the Entra admin centre; the underlying eligible/active model was unchanged [@ms-learn-pim-for-groups]. In October 2023, more than three years after the preview, PIM for Groups reached general availability with a broader scope: role-assignable security groups (carried forward), non-role-assignable security groups (newly supported), and Microsoft 365 groups (newly supported). GA also brought JIT for both membership and ownership [@ms-techcommunity-pim-groups-ca-ga-2023], [@ms-learn-pim-for-groups], [@ms-learn-pim-groups-role-settings]. The scope is broad: any Entra security group and any Microsoft 365 group, except dynamic-membership groups and on-premises-synced groups, can be PIM-enabled [@ms-learn-pim-for-groups].

The three events span more than three years and should not be conflated. August 2020: preview of "Privileged Access Groups," role-assignable security groups only [@simons-2020-aug]. January 2023: rename to "PIM for Groups"; same scope and model [@ms-learn-pim-for-groups]. October 2023: general availability with the broader scope (non-role-assignable security groups plus M365 groups), and JIT for both membership and ownership [@ms-techcommunity-pim-groups-ca-ga-2023]. Two structural exclusions persist throughout: dynamic-membership groups and groups synchronized from on-premises Active Directory [@ms-learn-pim-for-groups].

The interesting design choice is that PIM-for-Groups gates two distinct surfaces per group: membership and ownership. The two surfaces each get their own activation policy [@ms-learn-pim-groups-role-settings].

PIM for Groups: The extension of PIM eligible/active assignment to Entra security groups and Microsoft 365 groups. It was previewed in August 2020 as "Privileged Access Groups" (role-assignable security groups only) [@simons-2020-aug] and renamed to "PIM for Groups" in January 2023 [@ms-learn-pim-for-groups]. It reached general availability in October 2023 with the broader scope (role-assignable security groups, non-role-assignable security groups, and M365 groups) and with JIT for both membership and ownership [@ms-techcommunity-pim-groups-ca-ga-2023]. It excludes dynamic-membership groups and groups synchronized from on-premises environments [@ms-learn-pim-for-groups], [@ms-learn-pim-groups-role-settings].

A group owner can add members. A privileged access group whose membership is PIM-gated but whose ownership is permanent-active therefore offers an unmediated elevation path: a compromised owner adds themselves as a member and skips the membership activation they would otherwise have had to perform. PIM-for-Groups gates both surfaces because gating membership without gating ownership leaves a one-step bypass to elevation. The two policies are independent, and both must be set.
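The owner-bypass argument can be checked mechanically. The sketch below is minimal and assumes a simplified per-group configuration with two booleans. The function `elevationPaths` and its field names are hypothetical and are not PIM setting names. It lists every way a principal could hold membership without passing through a PIM membership activation.

```javascript
// Hypothetical, simplified model of a PIM-enabled group's two surfaces.
// Each surface is either gated by a PIM activation policy or permanently active.
function elevationPaths(group) {
  const paths = [];
  if (!group.membershipGated) {
    paths.push('standing member: privileges already active');
  }
  if (!group.ownershipGated) {
    // A standing owner can add members, including themselves, so the
    // membership gate is bypassed in one step.
    paths.push('standing owner adds self as member: bypasses membership gate');
  }
  return paths;
}

// Membership gated, ownership left permanent-active: one bypass path remains.
console.log(elevationPaths({ membershipGated: true, ownershipGated: false }));
// Both surfaces gated: no unmediated path in this model.
console.log(elevationPaths({ membershipGated: true, ownershipGated: true }));
```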

Boundary 3: Partner tenants -- GDAP and Azure Lighthouse

Until 2022, the Microsoft partner channel -- Cloud Solution Providers and Managed Service Providers -- worked through a model called Delegated Admin Privileges (DAP), in which the partner held standing Global Administrator on every customer tenant they touched. The Nobelium supply-chain attack campaign of 2020-2021 made the structural risk of that posture impossible to ignore [@cisa-aa20-352a]: one compromised partner credential meant Global Administrator across hundreds or thousands of customer tenants simultaneously.

In May 2022, Microsoft introduced Granular Delegated Admin Privileges (GDAP) [@ms-learn-gdap], [@crayon-gdap]. GDAP replaces the standing-GA pattern with time-bound (one to seven-hundred-thirty days) and role-scoped delegation between partner and customer tenants. Microsoft Learn's framing makes the design explicit: "GDAP is a security feature that provides partners with least-privileged access following the Zero Trust cybersecurity protocol. It lets partners configure granular and time-bound access to their customers' workloads in production and sandbox environments. Customers must explicitly grant the least-privileged access to their partners" [@ms-learn-gdap].

Granular Delegated Admin Privileges (GDAP): The May 2022 Microsoft Partner Center capability that replaces legacy DAP's pattern of standing Global Administrator on every customer tenant with time-bound (one to seven-hundred-thirty days) and role-scoped delegation between partner and customer tenants. GDAP is the partner-tenant analogue of PIM eligible assignment [@ms-learn-gdap].
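The one-to-730-day window is easy to enforce in partner tooling before a relationship request is submitted. The sketch below assumes the duration arrives as an ISO 8601 day-form string such as `P730D`, which is how the Microsoft Graph `delegatedAdminRelationship` resource is understood to express it. Confirm the format against current Partner Center and Graph documentation. `validateGdapDuration` is a hypothetical helper that only parses the `P<n>D` form and rejects everything else instead of guessing.

```javascript
// GDAP relationships are time-bound: one to 730 days.
const GDAP_MIN_DAYS = 1;
const GDAP_MAX_DAYS = 730;

// Hypothetical validator. Assumes an ISO 8601 day-form string such as "P90D";
// other ISO 8601 forms (P2Y, PT12H, ...) are rejected rather than parsed.
function validateGdapDuration(iso) {
  const m = /^P(\d+)D$/.exec(iso);
  if (!m) return { ok: false, reason: `unsupported duration format: ${iso}` };
  const days = Number(m[1]);
  if (days < GDAP_MIN_DAYS || days > GDAP_MAX_DAYS) {
    return { ok: false, reason: `${days} days is outside ${GDAP_MIN_DAYS}-${GDAP_MAX_DAYS}` };
  }
  return { ok: true, days };
}

console.log(validateGdapDuration('P90D'));  // { ok: true, days: 90 }
console.log(validateGdapDuration('P731D')); // rejected: outside the window
```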

The Azure plane has a parallel construct. Azure Lighthouse eligible authorizations, introduced alongside GDAP, extend PIM-for-Azure-Resources eligibility across the tenant boundary [@ms-learn-lighthouse-eligible]. The customer, not the partner, controls the PIM policy on the delegated authorization. One important exception applies: service principals cannot use eligible authorizations, because there is currently no way for a service principal to elevate its access [@ms-learn-lighthouse-eligible]. The application-identity gap discussed in section 9 extends into Lighthouse too.

Boundary 4: PIM and Conditional Access authentication context

The October 2023 GA wave closed the activation-gate-versus-sign-in-gate gap. Before October 2023, Conditional Access could gate sign-in into the tenant, but it could not gate the activation event itself. After October 2023, an authentication-context-tagged Conditional Access policy can target activation specifically [@ms-techcommunity-pim-groups-ca-ga-2023]. A policy of the form "activation of any control-plane role requires a compliant device and a fresh MFA challenge" becomes expressible without third-party tooling [@ms-learn-pim-change-default-settings].

The retirement of Tier-0, Tier-1, Tier-2

The umbrella framing has also shifted. Microsoft's 2014 Tier-0 / Tier-1 / Tier-2 model is being progressively retired in favour of the Enterprise Access Model (EAM), which uses control plane, management plane, and data/workload plane as the structural divisions [@ms-learn-eam]. EAM is cloud-native where Tier-0/1/2 was on-premises-centric. Microsoft Learn states the mapping: "Tier 0 expands to become the control plane and addresses all aspects of access control", and "what was tier 1 is now split into the following areas: Management plane ... Data/Workload plane" [@ms-learn-eam].

Enterprise Access Model (EAM): The post-2021 Microsoft reference architecture that replaces the Tier-0/Tier-1/Tier-2 administrative model with a plane-based division: control plane, management plane, and data/workload plane. EAM is cloud-native and zero-trust-friendly, whereas Tier-0/1/2 was on-premises-centric [@ms-learn-eam]. Microsoft's RaMP -- the Rapid Modernization Plan -- is the deployment roadmap that operationalizes EAM [@ms-docs-github-ramp].

The retirement is partial. The practitioner audience still uses Tier-0/1/2 more often than EAM in day-to-day language. The Microsoft Learn page for Securing Privileged Access explicitly cross-references both [@ms-learn-spa-overview].

Coverage is one half of the story. The other half is detection. What does PIM do when someone in the Privileged Role Administrator role simply assigns Global Administrator to a user directly through Microsoft Graph, bypassing the activation workflow entirely?

7. The Detection Layer: Six PIM Alerts and the Assignment-Bypass Class

PIM gates activation. The first question every adversary thinks of, and every architect should think of next, is: what about the assignment itself? What happens when someone in the Privileged Role Administrator role just creates a permanent-active Global Administrator assignment directly, skipping the eligible-to-active workflow entirely?

The answer is the article's second aha moment, and it is deliberately surprising.

The six PIM Alerts

Microsoft Learn documents seven named alerts in the PIM Alerts surface for Microsoft Entra roles [@ms-learn-pim-alerts]. Six of them are behavioural detections. The seventh is a licensing-precondition alert that fires when the tenant lacks the appropriate license.

The seventh alert, named "The organization doesn't have Microsoft Entra ID P2 or Microsoft Entra ID Governance," is a low-severity licensing-precondition alert. The "six PIM Alerts" framing in this article refers to the six behavioural alerts; the licensing alert is structurally distinct.

The six behavioural alerts, with the canonical names verbatim from the documentation, are:

#	Alert (verbatim)	Severity	What it detects	Configurable threshold
1	There are too many Global Administrators	Low	Tenant exceeds a tunable count and percentage of standing GAs	Minimum count 2-100 and percentage 0-100%
2	Roles are being assigned outside of Privileged Identity Management	High	A privileged role assignment was created via Microsoft Graph or the classic admin centre without going through PIM	None (binary)
3	Roles are being activated too frequently	Low	Post-hoc activation-frequency anomaly	Activation count and time window
4	Administrators aren't using their privileged roles	Low	Staleness on activation; eligible assignment unused	0-100 day threshold
5	Roles don't require multifactor authentication for activation	Low	Configuration drift on the per-role activation policy	None (binary on role policy)
6	Potential stale accounts in a privileged role	Medium	Sign-in staleness on a privileged principal	1-365 day threshold

Row 2 -- "Roles are being assigned outside of Privileged Identity Management" -- is the load-bearing one. Microsoft Learn rates it High severity because it fires when somebody routes around PIM entirely [@ms-learn-pim-alerts]. The verbatim documentation reads: "Privileged role assignments made outside of Privileged Identity Management aren't properly monitored and might indicate an active attack" [@ms-learn-pim-alerts].

Assignment-bypass alert: The High-severity PIM Alert "Roles are being assigned outside of Privileged Identity Management." It fires when a privileged role is assigned via a path other than PIM, typically Microsoft Graph, the classic admin centre assignment surface, or PowerShell. The alert is detective: it fires after the assignment is created [@ms-learn-pim-alerts].
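Because the alert is detective, the equivalent logic is a filter over audit events that have already happened. The sketch below models that after-the-fact check over a hypothetical, simplified event shape (`activity`, `role`, `principal`, `viaPim`, `time`). The real Entra audit-log schema uses different field names and activity strings, so treat this as an illustration of the logic, not a query you can run against a tenant. The privileged-role list is also an assumption.

```javascript
// Hypothetical privileged-role list; extend for your tenant.
const PRIVILEGED_ROLES = new Set(['Global Administrator', 'Privileged Role Administrator']);

// Hypothetical, simplified audit events. Every event here has already happened:
// the check can only report, never prevent.
const events = [
  { time: '2026-05-30T09:00:00Z', activity: 'role-assigned', role: 'Global Administrator', principal: 'alice', viaPim: true },
  { time: '2026-05-30T09:05:00Z', activity: 'role-assigned', role: 'Global Administrator', principal: 'mallory', viaPim: false },
  { time: '2026-05-30T09:06:00Z', activity: 'role-assigned', role: 'Reports Reader', principal: 'carol', viaPim: false },
];

// Flag privileged assignments that did not go through the PIM workflow.
function assignmentsOutsidePim(log) {
  return log
    .filter((e) => e.activity === 'role-assigned')
    .filter((e) => PRIVILEGED_ROLES.has(e.role) && !e.viaPim)
    .map((e) => ({ severity: 'High', principal: e.principal, role: e.role, time: e.time }));
}

console.log(assignmentsOutsidePim(events)); // flags only mallory's assignment
```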

Detective, not preventive -- and why

Read the definition again. The alert fires after the assignment is created. PIM does not block direct assignments outside its workflow.

For most architects this lands hard. The reasonable next thought is "if PIM does not block the bypass, what is the point?" Sit with that thought, then read the design rationale.

The Microsoft Graph endpoints that allow direct role assignment are the integration surface every legitimate administrative tool uses. Identity Governance products use them. CI/CD identity provisioning scripts use them. Break-glass automations use them. Microsoft's own admin centres use them in some configurations. The customer-side tools that scan, audit, remediate, and provision against the tenant use them. A preventive block on direct assignment would break every one of those integrations. It would also break PIM itself; the eligible-to-active materialization step is a write to the same assignment surface.

Note: PIM does not block direct role assignments outside its workflow, because blocking would break the Microsoft Graph integration surface every legitimate administrative tool uses. The High-severity assignment-bypass alert is detective: it fires after the assignment is created. Customers who need preventive blocking can layer one of three controls on top of PIM: a separate Conditional Access policy on the Graph endpoint, an Azure Policy at the management-group scope, or an entitlement-management workflow. Azure Policy applies to Azure RBAC assignments, not to Entra directory roles.

This is Aha #2. The reader who walked in expecting PIM to be a "deny direct assignments" product walks out understanding why the design says "alert loudly via High severity, then let the customer layer preventive controls based on their tooling estate." The trade-off is named, not hidden.

The 1000-notification ceiling and the SIEM-side correlation

One operational footnote and one wider observation. The notification fan-out has a hard cap: "The maximum number of notifications sent per one event is 1000. If the number of recipients exceeds 1000, only the first 1000 recipients will receive an email notification" [@ms-learn-pim-alerts]. Very large tenants whose privileged groups exceed the cap should not rely on email-notification fan-out alone.

The detection layer beyond PIM Alerts is Microsoft Sentinel UEBA. It builds dynamic behavioural profiles for users, hosts, IP addresses, applications, and other entities, and emits anomaly scores against AuditLogs operations, including role-eligibility additions and activations [@ms-learn-sentinel-ueba]. Sentinel UEBA is the closest 2026 Microsoft-shipped activation-anomaly-scoring surface. It is detective SIEM correlation, not synchronous gating.

The wider observation is that the PIM detection layer is one piece of a larger pipeline. PIM Alerts give you the High-severity assignment-bypass detection. Microsoft Sentinel UEBA gives you per-user behavioural-anomaly scoring against the audit-log events [@ms-learn-sentinel-ueba]. Entra ID Protection gives you sign-in-risk and user-risk classifications for the principal whose token was used. The mature 2026 deployment correlates all three; the assignment-bypass alert is the floor of that pipeline, not the ceiling.

Microsoft solved the JIT-admin problem with a two-state assignment object, four extension surfaces, and a six-alert detection layer. Did the rest of the industry agree? Look at what AWS and Google bet on, and at the third-party vault market that predates both.

8. Competing Architectures: AWS Sessions, GCP Bindings, and the Vault Model

Microsoft bet on a two-state assignment object. The rest of the industry placed different bets.

AWS bet on the session credential. Google bet on the conditional binding. The third-party PAM market bet on the vault. HashiCorp bet on the ephemeral credential. Each architecture is a different answer to one question: what should be the bounded unit of privilege? PIM bounds the assignment state; AWS bounds the session; GCP bounds the binding; CyberArk and Vault bound the credential. The methods are architecturally distinct, and they coexist in real estates more often than they compete.

AWS: bound the session

AWS IAM Identity Center plus the Security Token Service AssumeRole API bound the session, not the assignment. Permanent role-bindings -- permission sets attached to identities -- are themselves standing. The temporary part is the session that materializes when the identity calls AssumeRole. AWS documents this directly: "Temporary security credentials are short-term, as the name implies. They can be configured to last for anywhere from a few minutes to several hours. After the credentials expire, AWS no longer recognizes them or allows any kind of access from API requests made with them" [@aws-temp-creds].

The session lifecycle is concrete. AssumeRole returns an access key, a secret key, and a session token, with a minimum fifteen-minute and a maximum twelve-hour session duration; the API operation default is one hour [@aws-roles-use]. IAM Identity Center permission sets ship with a one-hour default and a one-to-twelve-hour configurable range [@aws-sessionduration].
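The duration arithmetic is worth pinning down because the bounds come from two places: the `DurationSeconds` parameter passed to `AssumeRole` and the role's own `MaxSessionDuration` setting, both in seconds. The sketch also encodes the one-hour cap AWS applies to role chaining, where one role's session is used to assume another. `resolveSessionSeconds` is a hypothetical helper that mirrors those documented bounds. It is a minimal sketch and makes no STS calls.

```javascript
// Documented AssumeRole bounds, in seconds.
const MIN_SECONDS = 900;           // 15 minutes
const DEFAULT_SECONDS = 3600;      // API default when DurationSeconds is omitted
const ROLE_MAX_CEILING = 43200;    // a role's MaxSessionDuration can be 1-12 hours
const CHAINED_MAX_SECONDS = 3600;  // role chaining caps the session at 1 hour

// Hypothetical helper: returns the session length AssumeRole would grant,
// or the reason the request would be rejected.
function resolveSessionSeconds({ durationSeconds, roleMaxSessionDuration = 3600, chained = false } = {}) {
  if (roleMaxSessionDuration < 3600 || roleMaxSessionDuration > ROLE_MAX_CEILING) {
    return { ok: false, reason: 'role MaxSessionDuration must be between 1 and 12 hours' };
  }
  const requested = durationSeconds ?? DEFAULT_SECONDS;
  const ceiling = chained ? CHAINED_MAX_SECONDS : roleMaxSessionDuration;
  if (requested < MIN_SECONDS || requested > ceiling) {
    return { ok: false, reason: `DurationSeconds must be between ${MIN_SECONDS} and ${ceiling}` };
  }
  return { ok: true, seconds: requested };
}

console.log(resolveSessionSeconds({})); // { ok: true, seconds: 3600 }
console.log(resolveSessionSeconds({ durationSeconds: 43200, roleMaxSessionDuration: 43200 }).ok); // true
console.log(resolveSessionSeconds({ durationSeconds: 7200, roleMaxSessionDuration: 43200, chained: true }).ok); // false
```

The role-binding that makes any of these calls possible is untouched by the arithmetic. Only the session is bounded.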

AssumeRole: The AWS Security Token Service API by which a principal materializes a time-bounded session credential (access key, secret key, session token) from a permanent role-binding. The session is the ephemeral artifact; the binding is permanent [@aws-temp-creds], [@aws-roles-use].

The AWS approach has clear strengths in multi-account AWS Organizations and in programmatic access. It is also the natural fit for any workload that needs short-lived credentials. The gaps relative to PIM: no built-in approval workflow, no equivalent of the PIM Alerts surface, and no eligible-versus-active distinction on the role-binding itself. A standing AssumeRole grant is, structurally, standing privilege; what is bounded is the session that consumes it.

Google Cloud: bound the binding

Google Cloud IAM took a different route. IAM Conditional Bindings let an allow policy include a Common Expression Language predicate that is evaluated at request time. The canonical temporal pattern is `request.time < timestamp(...)`, which expires the binding at a wall-clock instant [@gcp-conditions]. There is a practical ceiling of one hundred conditional bindings per allow policy.

On top of conditional bindings, Google launched Privileged Access Manager (PAM) in public preview in May 2024 [@gcp-iam-release-notes], [@gcp-pam]. PAM adds the entitlement-and-grant workflow that PIM ships natively: eligible principals, eligible roles, max duration, justification, approvers, and notifications, with grant duration enforced by the underlying conditional binding revocation. Audit-event correlation is documented in a separate page [@gcp-pam-audit].

IAM Conditional Binding: A Google Cloud IAM role binding that includes a Common Expression Language predicate evaluated at request time. The most common temporal pattern, `request.time < timestamp(...)`, expires the binding at a wall-clock instant. Google Cloud Privileged Access Manager layers an entitlement-and-grant workflow on top [@gcp-conditions], [@gcp-pam].
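Here is a minimal sketch of the temporal pattern. It assumes the allow-policy binding shape `{ role, members, condition: { title, expression } }`. The helper names are hypothetical. `isBindingActive` is a deliberately narrow stand-in that understands only the single `request.time < timestamp(...)` form. Real evaluation happens in IAM's CEL engine, not in client code.

```javascript
// Build a conditional role binding that expires at a wall-clock instant.
function timeBoundBinding(role, member, expiresAtIso) {
  return {
    role,
    members: [member],
    condition: {
      title: `expires ${expiresAtIso}`,
      expression: `request.time < timestamp("${expiresAtIso}")`,
    },
  };
}

// Narrow stand-in for CEL evaluation: handles only the expiry pattern above.
function isBindingActive(binding, requestTime) {
  const m = /^request\.time < timestamp\("([^"]+)"\)$/.exec(binding.condition.expression);
  if (!m) throw new Error('unsupported expression for this sketch');
  return requestTime.getTime() < Date.parse(m[1]);
}

const b = timeBoundBinding('roles/storage.admin', 'user:alice@example.com', '2026-06-01T00:00:00Z');
console.log(b.condition.expression);
console.log(isBindingActive(b, new Date('2026-05-31T23:59:59Z'))); // true
console.log(isBindingActive(b, new Date('2026-06-01T00:00:00Z'))); // false: strict "<"
```

Allow policies that contain conditional bindings must use policy version 3. A client that writes the binding with an older policy version risks having the condition rejected or dropped.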

The GCP approach is the closest hyperscaler analogue to PIM's eligible/active model in architecture, but the PAM productization shipped in preview in May 2024 [@gcp-iam-release-notes] -- nearly a decade after Azure AD PIM's 2016 GA -- and the alert and detection surfaces are correspondingly less mature.

The third-party vault: CyberArk, BeyondTrust, Delinea

The longest-standing answer is the one the third-party PAM market built. CyberArk, BeyondTrust, and Delinea -- all three 2024 Gartner Magic Quadrant Leaders for Privileged Access Management [@cyberark-press-2024], [@beyondtrust-press-2024], [@delinea-press-2024] -- bound the credential, not the assignment or the session. The credential exists permanently in the vault; access to the credential is bounded by session brokering, periodic password rotation, and full session recording.

The vault model has structural strengths PIM's role-assignment-state model cannot match. The vault covers heterogeneous estates that include Windows, Linux, network devices, databases, mainframes, and OT/SCADA appliances -- every system whose credentials cannot be re-architected to a cloud-IAM eligible-active object. Vault-and-broker products provide session recording for SOX and PCI-DSS evidence collection, and they integrate with credential-rotation workflows for legacy vendor appliances whose hard-coded credentials cannot be eliminated.

Most large enterprises run both Entra PIM (for Entra and Azure role assignments) and a third-party PAM product (for SSH, on-premises service accounts, database passwords, network devices). The two markets are complements more than substitutes.

HashiCorp Vault and OpenBao: bound the credential's lifetime

HashiCorp Vault took the credential-bounded idea and made it ephemeral through dynamic secrets. A dynamic secret is a credential that Vault materializes on demand for a configured backend (a database, a cloud IAM, a PKI), returns with a lease and TTL, and revokes at the backend when the lease expires [@vault-databases]. The OpenBao fork, governed under the Linux Foundation, preserves the same dynamic-credential semantics [@openbao].

OpenBao was created in late 2023 after HashiCorp moved Vault from the open-source MPL to the Business Source License. The Linux Foundation announced on April 30, 2024 that OpenBao would join LF Edge as one of four new projects (alongside EdgeLake, InfiniEdgeAI, and InstantX) at the Open Networking and Edge (ONE) Summit [@lfedge-openbao-2024]. The dynamic-secret primitive -- "create a credential, hand it out, revoke it at lease expiry" -- is preserved on both code lines.

Dynamic secret: A credential materialized by Vault on demand for a configured backend (database, cloud IAM, or PKI) and returned with a lease ID and TTL. At lease expiry, Vault revokes the credential at the backend. It is the canonical 2026 open-source primitive for replacing hard-coded application credentials [@vault-databases].
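The lease lifecycle is the part worth seeing. The sketch below is a toy, in-memory model of it, not Vault's API. `LeaseManager`, `issue`, and `expire` are hypothetical names. The "backend" is a JavaScript `Set` standing in for a database's user table, and credentials are random strings from Node's built-in `crypto` module.

```javascript
const crypto = require('crypto');

// Toy model of dynamic secrets: each credential exists at the backend only for
// the life of its lease. Not Vault's API; names are illustrative.
class LeaseManager {
  constructor(backendUsers) {
    this.backend = backendUsers; // stand-in for the database's user table
    this.leases = new Map();     // leaseId -> { username, expiresAt }
  }

  // Create a credential on demand and record a lease with a TTL.
  issue(ttlMs, nowMs) {
    const username = `v-app-${crypto.randomBytes(4).toString('hex')}`;
    const password = crypto.randomBytes(16).toString('base64url');
    const leaseId = crypto.randomUUID();
    this.backend.add(username);
    this.leases.set(leaseId, { username, expiresAt: nowMs + ttlMs });
    return { leaseId, username, password, ttlMs };
  }

  // Revoke, at the backend, every credential whose lease has expired.
  expire(nowMs) {
    const revoked = [];
    for (const [leaseId, lease] of this.leases) {
      if (nowMs >= lease.expiresAt) {
        this.backend.delete(lease.username);
        this.leases.delete(leaseId);
        revoked.push(lease.username);
      }
    }
    return revoked;
  }
}

const db = new Set();
const mgr = new LeaseManager(db);
const start = Date.now();
const cred = mgr.issue(60 * 60 * 1000, start); // one-hour TTL
console.log(db.has(cred.username)); // true
mgr.expire(start + 61 * 60 * 1000);
console.log(db.has(cred.username)); // false: revoked at the backend
```

The design point is that revocation happens at the backend, not just in the broker's bookkeeping. A leaked credential stops working when its lease ends, whether or not anyone notices the leak.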

The Vault story matters for our purposes because it is the strongest 2026 coverage of the application-identity surface -- dynamic database credentials, Kubernetes service-account tokens, cloud-IAM short-lived credentials. PIM does not cover that surface today; Vault does. This previews the open boundary in section 9.

What is bound, in one comparison table

Method	What is bound	Mechanism	Default duration	Approval workflow	Detection layer	Partner tenant	Application identities	License
Entra PIM	Assignment state	eligible -> active transition with policy gates	1h (Global Admin)	Built-in approver routing	Six behavioural PIM Alerts plus Sentinel UEBA	GDAP + Lighthouse	Not yet (open boundary)	Entra ID P2 or Entra ID Governance
AWS IAM Identity Center + STS	Session credential	AssumeRole returns access/secret/session token	1h	Not built-in	Not equivalent to PIM Alerts	Not directly comparable	Strong (short-lived creds native)	Included in AWS
GCP IAM + PAM	Policy binding	CEL predicate plus entitlement-and-grant	Per entitlement	Built-in via PAM	Audit events plus Cloud Audit Logs	Cross-org via folders	Service-account impersonation	Included in GCP
CyberArk/BeyondTrust/Delinea	Credential knowledge	Vault stores, broker hands out, rotates	Per session policy	Built-in approver routing	Session recording, full SIEM integration	Per-tenant deployment	Coverage via shared accounts	Per-seat commercial
HashiCorp Vault / OpenBao	Credential lifetime	Lease-based revocation, dynamic secrets	Per backend, per lease	Optional plugins	Audit log; lease events	N/A	Strong (dynamic secrets)	Open source / commercial

The five methods occupy four positions on the "what is bound" axis: assignment-state (PIM), session-credential (AWS), policy-binding (GCP), and knowledge-of-credential (CyberArk and Vault). The methods are architecturally distinct, and the right enterprise answer in heterogeneous estates is some composition of more than one.

PIM is the most mature JIT-admin product in the cloud, and it has the most complete coverage of the user-principal surface. The remaining gaps are not about catching up to the competitors; they are about a class of identity the eligible/active model was never designed to gate.

9. What the JIT-Admin Pattern Does NOT Close

For all the architectural elegance of the two-state assignment object, PIM does not close the JIT-admin problem. It closes a sub-problem, very well, and leaves five structural limits an honest treatment must name.

9.1 Standing eligibility is itself standing privilege

PIM bounds the active duration. It does not bound the eligibility duration. A user with a permanent-eligible Global Administrator assignment is one activate() call away from the role's permissions for the next hour. If that user has been phished -- credential plus MFA bypass via a session-cookie capture, say -- the attacker can satisfy the gates. The MFA challenge passes. The justification text is whatever the attacker types. The approval, if required, routes to the legitimate approver, who may approve a legitimate-looking request that actually came from the attacker.

PIM produces an audit-log record of every step. It does not produce a structural impossibility. Eligibility is itself a security-critical property of the identity, and standing eligibility is the modern analogue of standing membership: a long-lived relationship between principal and role that a successful credential compromise can exercise.

9.2 Approver collusion

The approval gate is two-phishee resistant only when the requester and approver are independently compromisable. Two-phishee collusion -- the requester and the approver are the same adversary, or two adversaries cooperating -- defeats the workflow at the mechanism layer. The usual mitigations raise the bar: named approvers rather than approver groups (which can be compromised at the group level), CA-gated approval actions, and four-eyes alternatives. None close the class.
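The mitigations above can be expressed as checks on an approval, and writing them out makes their limit visible. The sketch below runs over a hypothetical approval record (`requester`, `approver`, `approverIsGroup`, `approverPassedCa`); none of these are PIM field names. Every check passes when two cooperating adversaries hold the requester and approver accounts. That is why these mitigations raise the bar without closing the class.

```javascript
// Hypothetical approval checks that raise the bar against a single compromised account.
function approvalChecks(approval) {
  const failures = [];
  if (approval.approver === approval.requester) {
    failures.push('self-approval');
  }
  if (approval.approverIsGroup) {
    // Approver groups can be compromised at the group level; prefer named approvers.
    failures.push('approver is a group, not a named individual');
  }
  if (!approval.approverPassedCa) {
    failures.push('approval action was not Conditional Access-gated');
  }
  return { ok: failures.length === 0, failures };
}

// Two colluding accounts satisfy every check: the mechanism cannot tell them apart.
const colluding = { requester: 'alice', approver: 'bob', approverIsGroup: false, approverPassedCa: true };
console.log(approvalChecks(colluding)); // { ok: true, failures: [] }
```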

9.3 The application-identity gap

This is the article's heaviest limit, and it deserves the most space.

PIM's eligible-active state machine is currently defined over principal in (user | group). Service principals, managed identities, and OAuth consent grants do not flow through PIM activation. Their role assignments are permanent and active by default, and there is no eligible category that applies to them. Microsoft Learn's documentation for Workload ID Premium and Conditional Access for workload identities makes this explicit: ID Protection workload-identity risk detections cover service principals in single-tenant, non-Microsoft SaaS, and multitenant apps, but "Managed Identities aren't currently in scope" [@ms-learn-workload-identity-risk]. Conditional Access for workload identities applies similarly only to service principals owned by the organization, and CA policies "assigned to a group that contains a service principal are not enforced for that service principal" [@ms-learn-ca-workload-identity].

Andy Robbins's three-part Managed Identity Attack Paths series, published June 6-8, 2022 on the SpecterOps blog, is the canonical demonstration of how this gap is exploited [@robbins-mip-part1], [@robbins-mip-part2], [@robbins-mip-part3]. The mechanism is direct. An Azure compute resource -- an Automation Account [@robbins-mip-part1], a Logic App [@robbins-mip-part2], or a Function App [@robbins-mip-part3] -- carries an attached managed identity. The managed identity holds standing role assignments at whatever scope the operator granted, often Owner or Contributor on a subscription.

From inside the resource, any code can fetch an OAuth access token for the managed identity by calling a local token endpoint. On a VM, that is the Azure Instance Metadata Service endpoint at http://169.254.169.254/metadata/identity/oauth2/token. On App Service, Functions, and Automation, it is the endpoint advertised in the IDENTITY_ENDPOINT environment variable. No human in the loop. No MFA challenge. No PIM activation. The audit log records a service-principal token issuance, not an alice-clicked-Activate event.
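Why there is no human in the loop is easiest to see in the request itself. The sketch below only builds the token request a workload would send; it does not send it. It assumes two request forms. On VMs, the documented IMDS form uses `api-version=2018-02-01` and a `Metadata: true` header. On App Service and Functions, the equivalent local endpoint is exposed through the `IDENTITY_ENDPOINT` and `IDENTITY_HEADER` environment variables, with `api-version=2019-08-01` and an `X-IDENTITY-HEADER` header. `buildTokenRequest` is a hypothetical helper. Neither form asks for a password, an MFA response, or an activation: running on the resource is enough.

```javascript
// Build (but do not send) a managed-identity token request for Azure Resource Manager.
function buildTokenRequest(env, resource = 'https://management.azure.com/') {
  const params = new URLSearchParams({ resource });
  if (env.IDENTITY_ENDPOINT && env.IDENTITY_HEADER) {
    // App Service / Functions style: local endpoint plus a per-process header
    // that any code running in the process can read from its environment.
    params.set('api-version', '2019-08-01');
    return {
      url: `${env.IDENTITY_ENDPOINT}?${params}`,
      headers: { 'X-IDENTITY-HEADER': env.IDENTITY_HEADER },
    };
  }
  // VM style: the link-local IMDS address; the only header is Metadata: true.
  params.set('api-version', '2018-02-01');
  return {
    url: `http://169.254.169.254/metadata/identity/oauth2/token?${params}`,
    headers: { Metadata: 'true' },
  };
}

// On a VM (no IDENTITY_* variables set), nothing but network position is needed.
console.log(buildTokenRequest({}));
```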

"Managed Identity assignments are an extremely effective security control... But Managed Identities introduce a new problem: they can quickly create identity-based attack paths in Azure that may lead to escalation of privilege opportunities." -- Andy Robbins, *Managed Identity Attack Paths, Part 1: Automation Accounts*, June 6, 2022 [@robbins-mip-part1]

Managed identity: An Azure-managed service principal whose credentials are issued and rotated by Azure itself. The underlying Azure resource (a VM, App Service, Function App, Logic App, AKS cluster) retrieves the OAuth access token via the Instance Metadata Service endpoint. Managed identities are not currently in scope for PIM activation; their role assignments are permanent and active [@ms-learn-managed-identities-overview].

IMDS endpoint: The Azure Instance Metadata Service endpoint at `http://169.254.169.254/metadata/identity/oauth2/token`, a link-local non-routable address reachable only from inside the Azure resource itself, that returns an OAuth 2.0 access token for the attached managed identity. The address is the credential: any process running on the resource can fetch the token without storing or presenting any secret.

    sequenceDiagram
        autonumber
        participant Attacker
        participant FunctionApp as Compromised Function App
        participant IMDS as IMDS endpoint 169.254.169.254
        participant ARM as Azure Resource Manager
        participant PIMUnused as PIM activation (unused)
        Attacker->>FunctionApp: Code execution via supply-chain or vuln
        FunctionApp->>IMDS: GET /metadata/identity/oauth2/token
        IMDS-->>FunctionApp: OAuth access token for managed identity
        FunctionApp->>ARM: Action as Owner on subscription
        ARM-->>FunctionApp: Action succeeds
        Note over PIMUnused,Attacker: No human, no MFA, no activation, no PIM audit

MITRE ATT&CK maps the class explicitly. T1078.004 -- Valid Accounts: Cloud Accounts cites Robbins's Part 1 as its primary reference for the managed-identity case [@mitre-t1078-004]. The page reads: "In Azure environments, adversaries may target Azure Managed Identities, which allow associated Azure resources to request access tokens. By compromising a resource with an attached Managed Identity, such as an Azure VM, adversaries may be able to Steal Application Access Tokens to move laterally across the cloud environment" [@mitre-t1078-004].

T1548.005 -- Temporary Elevated Cloud Access describes the JIT-access pattern adversaries abuse: "Many cloud environments allow administrators to grant user or service accounts permission to request just-in-time access to roles... Just-in-time access is a mechanism for granting additional roles to cloud accounts in a granular, temporary manner" [@mitre-t1548-005]. The technique lists Microsoft's *Approve just-in-time access requests* documentation as its first citation, recognizing PIM as a canonical implementation of that pattern [@mitre-t1548-005]. Being named in the ATT&CK framework is, in the security domain, about the most explicit acknowledgement an adversary model can give a defensive product.

Note: Three anchors to walk away with: Andy Robbins's June 2022 Managed Identity Attack Paths series [@robbins-mip-part1], [@robbins-mip-part2], [@robbins-mip-part3]; MITRE ATT&CK T1078.004 citing Robbins as primary [@mitre-t1078-004]; the IMDS endpoint at 169.254.169.254 as the technical mechanism [@ms-learn-managed-identities-overview]. If your tenant has any managed identity with Owner or User Access Administrator at a subscription scope, you have an unmediated bypass path around PIM until that role assignment is tightened.

9.4 The assignment-bypass is detective, not preventive

The High-severity assignment-bypass alert documented in §7 is detective by design (see Aha #2). The structural limit it leaves open is that preventive blocking is not the PIM product's default: customers who want it layer a Conditional Access policy on the Microsoft Graph endpoint or an Azure Policy at the management-group scope [@ms-learn-azure-policy], accepting that some legitimate Graph integration may need an exception.

9.5 Customer-owned PIM policy in CSP and Lighthouse scenarios

In the partner-managed case, the customer (not the partner) controls the PIM policy on a delegated authorization [@ms-learn-lighthouse-eligible]. This is the right place to put control, but it is also the place misconfiguration is most common. A customer whose Lighthouse eligible authorization is set with permissive activation policies (no MFA, no approval, large maximum duration) has an unmediated partner activation surface, and the partner cannot tighten the customer-side policy. The MSP-managed case is the operational gotcha most frequently raised at PIM-deployment review boards.

Aha #3: The gap is a data-model problem, not a patchable defect

This is the third aha moment, and it lands differently from the first two.

Key idea: The application-identity gap is not a backlog item. Extending the eligible-active state machine from principal in (user | group) to principal in (user | group | service principal | managed identity | OAuth consent grant) is a data-model extension that would require changes to the role-assignment object schema, the Microsoft Graph role-management endpoints, the PIM evaluation pipeline, the audit-log schema, the Sentinel detection schema, and every downstream IGA tool. The 2024+ Microsoft responses extend some controls to application identities. They do not yet introduce an eligible/active assignment-category type for application principals.
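The data-model point can be sketched as a type. This is a hypothetical, simplified model written for illustration, not the Microsoft Graph role-assignment schema: the principal axis already has five members, but the eligible/active category is only meaningful for two of them.

```typescript
// Hypothetical, simplified model of a role assignment (not the Graph schema).
type PrincipalType =
  | "user"
  | "group"
  | "servicePrincipal"
  | "managedIdentity"
  | "oauthConsentGrant";

type AssignmentCategory = "eligible" | "active";

interface RoleAssignment {
  principalType: PrincipalType;
  roleName: string;
  scope: string;
  category: AssignmentCategory;
}

// Principal types over which the eligible/active state machine is defined today.
const ELIGIBLE_CAPABLE: PrincipalType[] = ["user", "group"];

function supportsEligibility(p: PrincipalType): boolean {
  return ELIGIBLE_CAPABLE.indexOf(p) !== -1;
}

// Returns a description of the problem, or null if the assignment is representable.
// Application principals can only be "active": there is no activate() transition
// defined for them, so an "eligible" one has no meaning in this model.
function modelViolation(a: RoleAssignment): string | null {
  if (a.category === "eligible" && !supportsEligibility(a.principalType)) {
    return a.principalType + " has no eligible category; its assignment can only be active";
  }
  return null;
}
```

In this framing, closing the gap means making `supportsEligibility` true for every member of the union, and every consumer in the list above (Graph endpoints, audit schema, Sentinel, IGA tools) would then have to learn what activation means for those principals.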

Microsoft has shipped partial responses. Entra Workload ID Premium is a separate three-dollar-per-workload-identity-per-month SKU [@ms-entra-workload-id-product] that unlocks Conditional Access for workload identities [@ms-learn-ca-workload-identity] (with the explicit managed-identity exclusion clause) and ID Protection workload-identity risk detections [@ms-learn-workload-identity-risk]. The PIM page on access reviews documents that "Using Access Reviews for Service Principals requires a Microsoft Entra Workload ID Premium plan in addition to a Microsoft Entra ID P2 or Microsoft Entra ID Governance license" [@ms-learn-pim-access-reviews]. Microsoft's flagship Ignite 2025 announcement was Microsoft Entra Agent ID for AI agents [@ms-entra-ignite-2025]; that announcement is identity for AI workloads, not an eligible-active type extension for service-principal role assignments.

Within the 2026 PIM architecture, Robbins's attack class cannot be closed. Closing it requires a new architecture, not a patch.

None of these limits is a defect. Each is a deliberate design boundary, and naming them is the academic honesty the topic deserves. The interesting question: where is active research happening, and what would closing the gap actually look like?

10. Open Problems: Where Active Research Is Happening

The five limits in section 9 are settled architectural boundaries. The open problems are different. Each is something nobody has shipped a complete solution to as of 2026, but each has named partial results and named anchors.

10.1 JIT-gating application identities

The data-model extension previewed in section 9's Aha #3 is the largest open problem in this space, and the one Microsoft is responding to most publicly.

What has been tried. Entra Workload ID Premium at three dollars per workload identity per month [@ms-entra-workload-id-product]. Conditional Access for workload identities, which lets the tenant block service-principal sign-ins based on IP range, ID-Protection risk score, or authentication context [@ms-learn-ca-workload-identity]. ID Protection workload-identity risk detections that flag suspicious sign-ins, leaked credentials, and admin-confirmed compromise for service principals [@ms-learn-workload-identity-risk]. Service-principal access reviews, gated behind Workload ID Premium plus Entra ID P2 or Governance [@ms-learn-pim-access-reviews]. Microsoft Entra Agent ID, the flagship Ignite 2025 announcement, brings first-class identity to AI agents [@ms-entra-ignite-2025] -- parallel to, but not the same as, an eligible-active type extension on application role assignments.

Workload identity: An identity used by a software workload to authenticate to other services. In Microsoft Entra ID the term encompasses application objects, service principals, and managed identities [@ms-learn-workload-identities-overview]. As of 2026, workload identities are not in scope of the eligible/active assignment-category model. The 2024+ Workload ID Premium SKU extends sign-in-time controls and risk detection to service principals, but does not yet introduce an eligible category for service-principal role assignments.

The conjecture. Closing this gap requires extending the role-assignment object's principal axis to include service principals, managed identities, and OAuth consent grants as first-class subjects of the eligible-active state machine. That extension would require defined activate() semantics for non-human principals, which is itself the hard problem, because the canonical user activation flow assumes an interactive MFA challenge.

Microsoft Learn states the difficulty bluntly: workload identities "can't perform multifactor authentication. Often have no formal lifecycle process. Need to store their credentials or secrets somewhere" [@ms-learn-workload-identities-overview]. The non-interactive case requires either programmatic policy gates (request from this caller, from this IP range, against this entitlement) or a delegation model where a human approver supplies the gate-passing event on the workload's behalf.
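A programmatic gate of the first kind might look like the following TypeScript sketch. Everything here is hypothetical: no such workload-activation API ships today, and the policy fields (allowed caller IDs, IPv4 CIDR ranges, a single entitlement string, a duration cap) simply encode the "this caller, this IP range, this entitlement" idea. Only IPv4 is handled.

```typescript
// Hypothetical policy for a non-interactive, workload-initiated activation.
interface WorkloadActivationPolicy {
  allowedCallerIds: string[]; // object IDs permitted to request activation
  allowedCidrs: string[];     // IPv4 ranges, e.g. "10.20.0.0/16"
  entitlement: string;        // the one role-at-scope this policy covers
  maxDurationMinutes: number;
}

interface WorkloadActivationRequest {
  callerId: string;
  sourceIp: string;
  entitlement: string;
  durationMinutes: number;
}

interface GateResult {
  allowed: boolean;
  reason?: string;
}

// Parse a dotted-quad IPv4 address into an unsigned 32-bit integer.
function ipv4ToUint(ip: string): number {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (m === null) throw new Error("invalid IPv4 address: " + ip);
  const octets = [m[1], m[2], m[3], m[4]].map(Number);
  if (octets.some((n) => n > 255)) throw new Error("invalid IPv4 address: " + ip);
  return ((octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]) >>> 0;
}

// True if ip falls inside cidr (e.g. "10.20.0.0/16").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const bits = slash === -1 ? NaN : Number(cidr.slice(slash + 1));
  if (!(bits >= 0 && bits <= 32) || Math.floor(bits) !== bits) {
    throw new Error("invalid CIDR: " + cidr);
  }
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToUint(ip) & mask) >>> 0) === ((ipv4ToUint(cidr.slice(0, slash)) & mask) >>> 0);
}

// Every check is about the request's context; none proves a human is present.
function evaluateWorkloadActivation(
  req: WorkloadActivationRequest,
  policy: WorkloadActivationPolicy
): GateResult {
  if (policy.allowedCallerIds.indexOf(req.callerId) === -1) {
    return { allowed: false, reason: "caller not permitted" };
  }
  if (!policy.allowedCidrs.some((c) => inCidr(req.sourceIp, c))) {
    return { allowed: false, reason: "source IP outside permitted ranges" };
  }
  if (req.entitlement !== policy.entitlement) {
    return { allowed: false, reason: "entitlement not covered by this policy" };
  }
  if (req.durationMinutes > policy.maxDurationMinutes) {
    return { allowed: false, reason: "requested duration exceeds maximum" };
  }
  return { allowed: true };
}
```

Because nothing in this gate proves human presence, a stolen caller identity inside an allowed range passes it; that limitation is why the delegation model, where a human approver supplies the gate-passing event, remains the alternative.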

10.2 Real-time activation-anomaly blocking

The PIM Alert "Roles are being activated too frequently" is post-hoc. It fires after the activation has already occurred and after the count crosses a threshold. The phished-but-still-authentic activation -- the attacker who supplies a valid MFA, a plausible justification, and a real ticket number -- is observationally indistinguishable from a legitimate emergency activation at the mechanism layer. The only signal that distinguishes them must come from behavioural telemetry.

What has been tried. Microsoft Defender for Cloud Apps ships an out-of-the-box user-and-entity behavioural analytics (UEBA) and machine-learning anomaly-detection layer; the documented policy weighs more than thirty risk indicators across eight risk-factor groups (risky IP, login failures, admin activity, inactive accounts, location, impossible travel, device and user agent, activity rate), with a seven-day initial learning period and a June 2025 transition to a dynamic threat-detection model [@ms-learn-dfca-anomaly]. Microsoft Sentinel UEBA scores anomalies post-event against AuditLogs operations including role-eligibility additions and activations [@ms-learn-sentinel-ueba]. Microsoft Defender for Identity correlates on-premises and cloud sign-in patterns for behavioural-anomaly detection. None of the three is a synchronous gate. All are detective layers that fire after the activation event has already created consequences.

The academic upper bound for what character-level and LSTM detectors achieve on adjacent tasks comes from Hendler, Kels, and Rubin's 2019 work on AMSI-based detection of malicious PowerShell code, which reports a true-positive rate of nearly 90% at a false-positive rate of less than 0.1% on the PowerShell-misuse classification problem [@arxiv-hendler-1905]. That is the ceiling a probabilistic activation-anomaly classifier could approach. It is not enough to gate synchronously without false-positive operational pain, which is why the deployed surface is post-hoc UEBA scoring rather than pre-commit blocking.
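The operational-pain argument is a base-rate argument, and a few lines of arithmetic make it concrete. This TypeScript sketch treats the quoted operating point as exactly TPR 0.9 and FPR 0.001; the monthly volumes (10,000 legitimate activations, 10 malicious ones) are hypothetical numbers chosen for illustration, not measurements.

```typescript
interface GateOutcome {
  blockedAttacks: number; // true positives
  blockedAdmins: number;  // false positives: legitimate activations refused
  missedAttacks: number;  // false negatives
  precision: number;      // share of blocks that hit a real attack
}

// Expected counts if a classifier at (tpr, fpr) gated every activation synchronously.
function expectedOutcome(
  legitimate: number,
  malicious: number,
  tpr: number,
  fpr: number
): GateOutcome {
  const blockedAttacks = malicious * tpr;
  const blockedAdmins = legitimate * fpr;
  return {
    blockedAttacks,
    blockedAdmins,
    missedAttacks: malicious - blockedAttacks,
    precision: blockedAttacks / (blockedAttacks + blockedAdmins),
  };
}

// Hypothetical month: 10,000 legitimate activations, 10 malicious ones.
const month = expectedOutcome(10000, 10, 0.9, 0.001);
```

Under these assumptions the gate blocks about 9 attacks and about 10 legitimate administrators, so slightly more than half of all blocks land on real admins (precision 9/19), and roughly one attack in ten still gets through.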

The conjecture. Synchronous gating on behavioural signal at activation time would require Conditional Access (or its successor) to subscribe to an activation-event hook and consume a risk score from ID Protection, Defender for Cloud Apps, or Sentinel UEBA in the few hundred milliseconds before PIM materializes the active assignment. The architectural primitives exist; the synchronous risk-evaluation hook does not yet ship.
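The decision such a hook would have to make can be sketched in TypeScript. This is hypothetical throughout: the risk levels, the latency budget, and the `onUnknown` policy are assumptions, and no shipping PIM or Conditional Access API exposes this hook. The design choice that matters most is what happens when the risk service does not answer within budget.

```typescript
type RiskLevel = "none" | "low" | "medium" | "high";
type Decision = "allow" | "stepUp" | "deny";

// A risk answer as the hypothetical hook would receive it.
interface RiskAnswer {
  level: RiskLevel;
  latencyMs: number; // time the risk service took to respond
}

// Decide an activation request before the active assignment is materialized.
// `answer` is undefined when the risk service did not respond at all.
function decideActivation(
  answer: RiskAnswer | undefined,
  budgetMs: number,
  onUnknown: Decision
): Decision {
  // A missing or late answer counts as "unknown"; the tenant chooses
  // fail-closed ("deny"), fail-soft ("stepUp"), or fail-open ("allow").
  if (answer === undefined || answer.latencyMs > budgetMs) {
    return onUnknown;
  }
  switch (answer.level) {
    case "high":
      return "deny";
    case "medium":
      return "stepUp";
    default:
      return "allow";
  }
}
```

Fail-closed protects Tier-0 but turns a risk-service outage into an administrator lockout; fail-open keeps emergency activations flowing but reopens the window the hook was meant to close.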

10.3 Hybrid-bridge JIT

A single approval workflow spanning the on-premises (MIM PAM / shadow principals) and cloud (Entra PIM) boundaries is not a shipping product. Microsoft has Entra Cloud Sync and Entra Connect for directory synchronization; neither bridges the activation workflow. MIM 2016 is on extended support through January 9, 2029 [@ms-learn-mim-2016]; Microsoft Learn states the path forward is cloud-first PIM with on-prem AD progressively scoped down to the few resources that cannot move [@ms-learn-mim-pam-overview].

MIM 2016 PAM is in extended support, not active development, and Microsoft Learn explicitly states it is "not recommended for new deployments in Internet-connected environments" [@ms-learn-mim-pam-overview]. SP3 ships compatibility updates for SharePoint SE, Exchange SE, and SQL Server 2022 [@ms-learn-mim-2016], but the product line is in maintenance posture. The on-premises half of a hybrid-bridge JIT story requires a different architectural choice than re-investing in MIM.

10.4 Coverage-as-code

How do you evaluate PIM policy coverage in CI/CD for a tenant with two hundred custom Azure roles and fifty directory roles, and gate every PR that touches the role-management policies?

Best partial results. Microsoft Cloud Security Benchmark v3 Privileged Access controls (PA-1, PA-2, ...) give pass/fail evaluation per recommendation [@ms-learn-mcsb-v3-pa] -- close, but Boolean checks rather than composable policy. The PowerShell cmdlets Get-MgPolicyRoleManagementPolicy and Get-MgPolicyRoleManagementPolicyAssignment read role-management policies via Microsoft Graph. Despite the Identity Governance branding, they ship in the Microsoft.Graph.Identity.SignIns module rather than Microsoft.Graph.Identity.Governance, so the line that brings them into scope is Import-Module Microsoft.Graph.Identity.SignIns [@ms-learn-graph-pim-policy-cmdlet]. The EntraOps Privileged EAM community project on GitHub, maintained by Thomas Naunheim, demonstrates the "track changes and history of privileged principals and their assignments as code" idiom against the Enterprise Access Model classification [@entraops-github]. Azure Policy itself operates on Azure resource configurations and does not directly evaluate PIM role-management policy state [@ms-learn-azure-policy]; that data-model gap is what drives the GitOps-flavoured drift-detection pattern in the community.
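A CI gate over exported policy state can be sketched in TypeScript. The sketch assumes you have already exported policies (for example with Get-MgPolicyRoleManagementPolicy and Get-MgPolicyRoleManagementPolicyAssignment) and flattened each one into the hypothetical `PolicySummary` shape below; the flattening step, the field names, and the plane classification are assumptions, not Graph properties. The invariant checked is the control-plane one from this section's conjecture: MFA required and a maximum duration of one hour.

```typescript
// Hypothetical flattened view of one role-management policy assignment.
type Plane = "control" | "management" | "data";

interface PolicySummary {
  roleName: string;
  scopeId: string;
  plane: Plane;             // your own Enterprise Access Model classification
  requiresMfa: boolean;     // MFA required on activation
  maxDurationHours: number; // maximum activation duration
}

interface Violation {
  roleName: string;
  scopeId: string;
  problem: string;
}

// Invariant: every control-plane role requires MFA and allows at most 1 hour.
function checkControlPlaneInvariant(policies: PolicySummary[]): Violation[] {
  const violations: Violation[] = [];
  for (const p of policies) {
    if (p.plane !== "control") continue;
    if (!p.requiresMfa) {
      violations.push({ roleName: p.roleName, scopeId: p.scopeId, problem: "MFA not required on activation" });
    }
    if (p.maxDurationHours > 1) {
      violations.push({
        roleName: p.roleName,
        scopeId: p.scopeId,
        problem: "max duration " + p.maxDurationHours + "h exceeds 1h",
      });
    }
  }
  return violations;
}

// Fail the pipeline (throw) if any invariant is broken.
function assertCoverage(policies: PolicySummary[]): void {
  const violations = checkControlPlaneInvariant(policies);
  if (violations.length > 0) {
    const lines = violations.map((v) => v.roleName + " @ " + v.scopeId + ": " + v.problem);
    throw new Error("PIM coverage invariant violated:\n" + lines.join("\n"));
  }
}
```

Keeping the invariant in code means a PR that loosens a control-plane policy fails before it reaches the tenant. Unlike the platform-level enforcement the conjecture describes, it does not catch drift introduced through the portal between pipeline runs.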

The conjecture. A full coverage-as-code primitive needs Azure Policy (or its successor) to evaluate PIM role-management policy state with the same first-class semantics it applies to Azure resource configuration. That extension would let a tenant declare an invariant -- "every role in the control plane has requires_mfa=true and max_duration_hours <= 1" -- and have the platform enforce it continuously across drift, the way Azure Policy already enforces resource invariants.

10.5 Adaptive-cadence eligibility reviews

Should eligible membership be access-reviewed at a higher cadence than active assignments? Eligible membership is standing privilege; active membership is bounded. The argument for adaptive cadence -- reviewing eligibility more frequently when behavioural signals or organizational events suggest the principal may no longer need the role -- is intuitive, but no shipping mechanism implements it.

Best partial result. The 2024+ ML-based access-review recommendations [@ms-learn-review-recommendations] -- inactive-user 30-day Deny, user-to-group-affiliation Deny -- are within-cycle reviewer-assist features. They help reviewers decide during a configured access review. They are not cross-cycle adaptive-cadence triggers that fire a new review off-schedule when conditions warrant.
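An off-schedule trigger of that kind might look like this TypeScript sketch. The signals (last activation time, manager or department change) and the staleness threshold are hypothetical inputs you would assemble yourself from audit and HR data; nothing here is an access-review API.

```typescript
// Hypothetical signals for one eligible assignment.
interface EligibilitySignals {
  lastActivation: Date | null; // null if the principal has never activated
  managerChanged: boolean;
  departmentChanged: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the reasons to start an off-cycle review;
// an empty array means "wait for the scheduled review".
function offCycleReviewReasons(
  s: EligibilitySignals,
  now: Date,
  staleAfterDays: number
): string[] {
  const reasons: string[] = [];
  if (s.lastActivation === null) {
    reasons.push("eligible but never activated");
  } else if ((now.getTime() - s.lastActivation.getTime()) / MS_PER_DAY > staleAfterDays) {
    reasons.push("not activated in " + staleAfterDays + " days");
  }
  if (s.managerChanged) reasons.push("manager changed");
  if (s.departmentChanged) reasons.push("department changed");
  return reasons;
}
```

The difference from the shipped recommendations is the trigger: those advise a reviewer inside an already-scheduled review, whereas this function decides whether a review should start at all.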

These are research problems. The practitioner does not have the luxury of waiting for them to be solved. What does Monday morning look like for the architect who has read this far and now has to deploy?

11. Practical Guide: Monday Morning for the 2026 Tenant Architect

You have read ten thousand words. You are responsible for a Microsoft 365 tenant that audits against SOX, SOC 2, and ISO 27001. You have a budget for Entra ID P2 (or Entra ID Governance) per privileged user. What do you do on Monday?

Work in this order. The list is ordered by cost-to-impact, with the cheapest, highest-impact items first.

Step 1: Baseline the Tier-0 surface

Every directory role at "Privileged" classification or above should be PIM-eligible-only. The exceptions are the two emergency-access permanent-active Global Administrator accounts (break-glass), which we return to in Step 4.

Activation requires MFA, approval, justification, and ticket number for control-plane and management-plane roles. Maximum activation duration is one hour for Global Administrator and Privileged Role Administrator, and four hours for less-privileged roles. Configure per role per scope; remember that PIM-for-Azure-Resources policies do not inherit.

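# The RoleManagement cmdlets used here ship in Microsoft.Graph.Identity.Governance.
Import-Module Microsoft.Graph.Identity.Governance
Connect-MgGraph -Scopes 'RoleManagement.Read.Directory','User.Read.All'
# Resolve the Global Administrator role definition by display name.
$gaRoleId = (Get-MgRoleManagementDirectoryRoleDefinition `
    -Filter "displayName eq 'Global Administrator'").Id
# Every currently active GA assignment, standing or PIM-activated.
# Groups and service principals have no UPN, so fall back to displayName.
Get-MgRoleManagementDirectoryRoleAssignment `
    -Filter "roleDefinitionId eq '$gaRoleId'" `
    -ExpandProperty Principal -All |
    Select-Object @{n='User';e={
        $p = $_.Principal.AdditionalProperties
        if ($p.userPrincipalName) { $p.userPrincipalName } else { $p.displayName }
    }}, PrincipalId, RoleDefinitionId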

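This lists every active Global Administrator assignment in the tenant, whether standing or currently PIM-activated. Compare the output against your break-glass roster and your current PIM activations. Anything else is standing privilege, and standing privilege is technical debt.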
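The comparison step can be sketched in TypeScript once the query output is exported. The `ActiveGaAssignment` shape and the `activatedViaPim` flag are assumptions; one plausible source for that flag is the `assignmentType` (Activated versus Assigned) on Graph role assignment schedule instances, but the sketch itself does not call Graph.

```typescript
// Hypothetical row exported from the Global Administrator query.
interface ActiveGaAssignment {
  principal: string;        // UPN, or display name for groups and service principals
  activatedViaPim: boolean; // true if this is a time-bound PIM activation
}

// Standing GA holders that are neither break-glass accounts nor current activations.
function unexpectedStandingGAs(
  assignments: ActiveGaAssignment[],
  breakGlassRoster: string[]
): string[] {
  const roster = breakGlassRoster.map((u) => u.toLowerCase());
  return assignments
    .filter((a) => !a.activatedViaPim && roster.indexOf(a.principal.toLowerCase()) === -1)
    .map((a) => a.principal);
}
```

Each name it returns is a candidate for conversion from a permanent-active to an eligible assignment.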

Step 2: Configure access reviews

Quarterly for Tier-0 and control-plane roles. Semi-annually for Tier-1 and management-plane. Annually for Tier-2 and data/workload-plane [@ms-learn-pim-access-reviews]. Turn on the ML-based review recommendations: the 30-day inactive-user Deny recommendation is the reviewer-assist baseline, and the user-to-group-affiliation Deny recommendation helps reviewers spot principals who are organizationally distant from the rest of the group's membership [@ms-learn-review-recommendations].

Step 3: Turn on every PIM Alert and tune the GA-count threshold

Enable all six behavioural PIM Alerts. Tune the "There are too many Global Administrators" alert to a minimum count of two and a percentage of 50% [@ms-learn-pim-alerts]. The expected steady state is fewer than five standing GAs, most of them break-glass accounts; Microsoft Secure Score's "Limit the number of Global Administrators" recommendation uses the same fewer-than-five target as its canonical baseline. The High-severity assignment-bypass alert is non-negotiable: route it to a 24x7 SOC queue with an incident-response runbook.

Step 4: Break-glass discipline

Two emergency-access permanent-active Global Administrator accounts. Not one, not three.

Note: One break-glass account is a single point of failure: if it is locked, lost, or compromised, the tenant has no emergency entry path. Three or more begin to expand the blast radius unnecessarily. Two balances the two failure modes. FIDO2 hardware keys, stored in physical safes, with continuous sign-in alerting.

Note: Conditional Access policies can lock you out. Break-glass accounts must be excluded from every CA policy that could prevent their sign-in. Compensate with continuous sign-in alerting on every break-glass authentication event; alerts are the substitute for the gate you are deliberately removing.

Step 5: Extend PIM to the four boundaries

PIM-for-Groups: gate ownership of every directory-role-assignable group, every privileged-access security group, and every group that grants management-group-level Azure RBAC. Membership alone is insufficient; ownership is a backdoor to membership.

PIM-for-Azure-Resources: gate Owner, User Access Administrator, and Contributor at the management-group scope, then explicitly at every subscription, every resource group, and every resource where the role is assignable. Inheritance does not flow; configure per scope.

GDAP and Lighthouse: every CSP partner authorization must be eligible, not active. Set the customer-side PIM policy explicitly. Audit annually.

PIM with Conditional Access: attach an authentication-context tag to activation policies on the privileged Entra roles. Add a CA policy that requires a compliant device and a fresh MFA challenge on activation. The activation gate becomes structurally tighter than the sign-in gate, which is the correct ordering for high-privilege actions.

Step 6: Continuous detection

Pipe PIM activation events (via Microsoft Graph audit logs, surfaced in the AuditLogs and MicrosoftGraphActivityLogs Azure Monitor tables) to your SIEM. Cross-correlate with Entra ID Protection sign-in risk and Microsoft Sentinel UEBA anomaly signals [@ms-learn-sentinel-ueba]. KQL templates to write: (a) GA activations outside business hours; (b) activations from non-compliant devices; (c) the assignment-bypass alert correlated with the activating principal's recent sign-in risk score; (d) managed-identity token issuance against subscription-scoped Owner.
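The logic behind template (a) is small enough to sketch. The KQL itself depends on your tables, so this is a TypeScript rendering of the same filter over already-exported activation events. The event shape, the 08:00 to 18:00 Monday-to-Friday window, and the single fixed UTC offset (which ignores daylight-saving changes) are all assumptions.

```typescript
// Hypothetical shape of an exported PIM activation event.
interface ActivationEvent {
  principal: string;
  roleName: string;
  timestamp: string; // ISO 8601, e.g. "2026-06-01T03:00:00Z"
}

// True if the instant falls outside Mon-Fri [startHour, endHour)
// in an office time zone with a fixed UTC offset.
function isOutsideBusinessHours(
  iso: string,
  utcOffsetMinutes: number,
  startHour: number = 8,
  endHour: number = 18
): boolean {
  const instant = new Date(iso).getTime();
  if (isNaN(instant)) throw new Error("unparseable timestamp: " + iso);
  // Shift by the offset, then read UTC fields to get local wall-clock values.
  const local = new Date(instant + utcOffsetMinutes * 60 * 1000);
  const day = local.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const hour = local.getUTCHours();
  return day === 0 || day === 6 || hour < startHour || hour >= endHour;
}

function outOfHoursGaActivations(
  events: ActivationEvent[],
  utcOffsetMinutes: number
): ActivationEvent[] {
  return events.filter(
    (e) => e.roleName === "Global Administrator" && isOutsideBusinessHours(e.timestamp, utcOffsetMinutes)
  );
}
```

A production rule would use a real time-zone database and a holiday calendar; the fixed offset keeps the sketch dependency-free.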

Step 7: Mind the application-identity surface

This is the longest-running open item. Inventory every managed identity in the tenant. For each, document the role assignment, the scope, and the resource that holds it.

Apply the "Owner and User Access Administrator at subscription scope is dangerous" rule first; tighten those to Contributor or a custom role wherever possible. Where a managed identity must hold a high-privilege role at a high scope, treat the underlying resource (Function App, Logic App, VM, AKS cluster) as a Tier-0 asset for the purposes of patching, network exposure, and code-review process. Until PIM gates application identities natively, the Tier-0-asset framing is the substitute control.
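Applying that rule to an inventory can be sketched in TypeScript. It assumes you have already exported managed-identity role assignments (for example with `az role assignment list`) into the hypothetical `ManagedIdentityAssignment` shape; the scope patterns follow the usual Azure RBAC scope strings for the tenant root, management groups, and subscriptions.

```typescript
// Hypothetical exported row: one role assignment held by one managed identity.
interface ManagedIdentityAssignment {
  identityName: string;
  hostingResource: string; // resource ID of the VM, Function App, Logic App, or AKS cluster
  roleName: string;
  scope: string;           // Azure RBAC scope string
}

const HIGH_PRIVILEGE_ROLES = ["Owner", "User Access Administrator"];

// Subscription scope or broader: tenant root "/", a management group, or a whole subscription.
function isSubscriptionOrBroader(scope: string): boolean {
  const s = scope.replace(/\/+$/, "").toLowerCase();
  return (
    s === "" ||
    /^\/providers\/microsoft\.management\/managementgroups\/[^/]+$/.test(s) ||
    /^\/subscriptions\/[^/]+$/.test(s)
  );
}

// Hosting resources to treat as Tier-0 assets until their assignments are tightened.
function tier0HostingResources(assignments: ManagedIdentityAssignment[]): string[] {
  const hosts: string[] = [];
  for (const a of assignments) {
    const risky =
      HIGH_PRIVILEGE_ROLES.indexOf(a.roleName) !== -1 && isSubscriptionOrBroader(a.scope);
    if (risky && hosts.indexOf(a.hostingResource) === -1) {
      hosts.push(a.hostingResource);
    }
  }
  return hosts;
}
```

A resource-group-scoped Owner is deliberately not flagged, mirroring the subscription-scope threshold of the rule above, even though it can still be dangerous.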

That is the playbook for the user-principal side of the JIT-admin problem. The application-identity side is still being written. The next iteration of this material will be about the data-model extension that closes Robbins's gap, or the architectural successor that arrives in its place.

12. Frequently Asked Questions and Closing

Three classes of question come up every time this material is taught. The first is conceptual ("what does eligible actually mean?"). The second is operational ("do I need MFA?"). The third is adversarial ("what about managed identities?"). Each appears below.

Does eligible mean my admin rights expire?

No. Eligible assignments are permanent in most tenants -- they are the standing relationship between principal and role -- but they grant no privilege until you activate. Only the *active* state is bounded. Your admin rights still exist; they are simply not exercised continuously [@ms-learn-pim-configure].

Do I need MFA to activate a role?

Only if the role's activation policy is configured to require it. PIM's activation gates -- MFA at activation, approval, justification, ticket number, and activation maximum duration -- are per-role, per-scope flags the tenant sets independently. A role with `requires_mfa=false` and `requires_approval=false` is a valid (if loose) PIM configuration [@ms-learn-pim-change-default-settings].

What maximum activation duration should I set?

One hour for the highest-privileged Entra directory roles, including Global Administrator and Privileged Role Administrator. The configurable range is one to twenty-four hours per role per scope [@ms-learn-pim-change-default-settings]. Tighten where you can: the activation cost is small, and the reduction in standing-active surface is large.

Does Conditional Access make PIM unnecessary?

No. Conditional Access gates the sign-in event; PIM bounds the assignment state. A standing Global Administrator who clears the CA-gated sign-in, legitimately or with stolen credentials, holds GA privileges immediately, because there is no activation gate left to traverse. CA and PIM compose: neither is a substitute for the other.

Does PIM block role assignments made outside PIM?

No. PIM raises the High-severity "Roles are being assigned outside of Privileged Identity Management" alert when a direct assignment happens [@ms-learn-pim-alerts]. The detection is intentional rather than preventive: blocking direct assignment would break the Microsoft Graph integration surface every legitimate administrative tool uses. Preventive controls -- Conditional Access on the Graph endpoint, Azure Policy at the management-group scope, or entitlement-management workflows -- are added separately based on the tenant's tooling estate.

Does PIM cover service principals and managed identities?

No. PIM's eligible/active state machine is defined over user and group principals. Service principals, managed identities, and OAuth consent grants route around PIM activation entirely. Andy Robbins's June 2022 *Managed Identity Attack Paths* series [@robbins-mip-part1], [@robbins-mip-part2], [@robbins-mip-part3] is the canonical demonstration; MITRE ATT&CK T1078.004 [@mitre-t1078-004] cites Robbins as its primary reference. Workload ID Premium plus Conditional Access for workload identities extends sign-in-time controls to service principals (with managed identities still excluded), but does not yet introduce an eligible category for workload-identity role assignments [@ms-learn-ca-workload-identity], [@ms-learn-workload-identity-risk].

Has Microsoft retired the Tier-0/1/2 model?

Microsoft has shifted the framing to the Enterprise Access Model: control plane, management plane, and data/workload plane [@ms-learn-eam]. The retirement of Tier-0/1/2 is partial; the practitioner community still uses the legacy terms day to day. The underlying principle -- privilege boundaries you do not cross with a single credential -- is preserved across both framings.

Closing

Read the section 1 vignette again. The 2026 tenant where alice@contoso.com is Global Administrator for exactly one hour, with an audit log so complete the SOC 2 auditor signs it without questions, is not a configuration choice. It is the visible behaviour of an identity system whose role-assignment object carries one more field than the 2015 version did. Standing admin did not retire because operators got more disciplined. Standing admin retired because the data model grew a second state.

The forty years between Saltzer and Schroeder's 1975 paper and the 2015 Azure AD PIM Preview were not lost time. UNIX sudo, Kerberos delegation, DACLs, AD groups, MIM PAM, Pass-the-Hash v1 and v2, the Securing Privileged Access roadmap -- each built up the structural understanding that least privilege required a temporal mechanism, not just a static one, and that the temporal mechanism had to live on the assignment object itself, not on the group, the credential, the session, or any indirection through a separate forest. The single new field on the role-assignment object is what those forty years were preparing.

What remains undone is the application-identity boundary. The same role-assignment object Microsoft retrofitted to gate user activation does not yet gate the managed identity attached to a Function App. The IMDS endpoint at 169.254.169.254 is the canonical 2026 bypass path that proves it. Closing that gap, when it comes, will not be a patch to the existing eligible/active state machine. It will be the next chapter -- the one where the state machine learns to apply to a principal that cannot perform an interactive MFA challenge, and the activation semantics are reinvented for the non-interactive case.

The story is not finished. But the first chapter -- the chapter where standing admin became visibly the anti-pattern it had always been -- is.