Skip to content

Fix: BFF session conflates SyRF user GUID with IdP subject

Temporary planning doc — delete from docs/planning/ once the fix has shipped. Why-it-broke history belongs in the PR description; this file just guides the implementation.

Problem

Running locally via scripts/setup-local.sh, the first authenticated API request fails:

System.FormatException: Unrecognized Guid format.
   at System.Guid.Parse(String input)
   at SyRF.SharedKernel.MyUtils.ParseToGuid(String val) in MyUtils.cs:line 53
   at SyRF.WebHostConfig.Common.Infrastructure.CurrentUserService.GetUserId in CurrentUserService.cs:line 27

Every controller that derives from SyrfBaseController calls CurrentUserService.GetUserId(), so this 500s the entire authenticated surface area immediately after login.

Root cause

The BFF session (UserSession) uses a single UserId field for two semantically distinct identifiers:

  1. SyRF investigator GUID (Investigator.Id) — the value the rest of the codebase reads via the https://claims.syrf.org.uk/user_id claim and parses as a Guid.
  2. External IdP subject (Investigator.Auth0Id, OpenIddict sub) — strings like oauth2|google-oauth2|123…, auth0|abc, or arbitrary text from mock OIDC.

Two paths populate the session with the wrong value:

Path A — Dev auth (always wrong)

DevAuthController.cs:77:

UserId = investigator.Auth0Id ?? investigator.Id.ToString(),

Prefers Auth0Id over the SyRF GUID. For any investigator with an Auth0Id (i.e. anyone synced from production), the session's UserId is a non-GUID string.

Path B — OIDC callback (depends on IdP claim shape)

BffAuthController.cs:228-229:

var userId = userInfo.GetValueOrDefault(AuthConstants.ClaimTypes.UserId)
             ?? userInfo.GetValueOrDefault("sub");

Falls back to sub if the https://claims.syrf.org.uk/user_id claim isn't surfaced by the IdP's userinfo endpoint. Mock-oauth2-server (used for local dev) doesn't reliably echo namespaced custom claims back through userinfo, so the fallback fires and userId becomes the IdP subject string.

Where it explodes

SessionAuthenticationHandler.cs:70-76 writes session.UserId into three distinct claims:

new(AuthConstants.ClaimTypes.Subject, session.UserId),          // IdP "sub"
new(AuthConstants.ClaimTypes.UserId, session.UserId),           // SyRF GUID claim
new(AuthConstants.ClaimTypes.ExternalSubjectId, session.UserId),// IdP nameidentifier

All three get the same value. Downstream, CurrentUserService.GetUserId reads the SyRF UserId claim and Guid.Parses it → FormatException.

Fix plan

The fix splits the conflated identifier in UserSession and updates every producer/consumer to populate the correct field.

1. Extend UserSession (UserSession.cs)

Add a separate field for the IdP subject. UserId becomes the SyRF investigator GUID only.

public required string UserId { get; set; }                  // SyRF investigator Guid (string-encoded)
public required string ExternalSubjectId { get; set; }       // IdP "sub" / Auth0Id

Both required → forces every producer to set them deliberately.

2. Fix the producers

DevAuthController.LoginAs (DevAuthController.cs:77)

UserId           = investigator.Id.ToString(),
ExternalSubjectId = investigator.Auth0Id ?? investigator.Id.ToString(),

Auth0Id may be null for never-logged-in investigators; falling back to the GUID for ExternalSubjectId keeps audit trails non-empty without breaking any GUID-typed code path.

BffAuthController.Callback (BffAuthController.cs:225-256)

The callback already has access to IApplicationService (via DI) and calls RecordSignInAsync further down. Resolve the SyRF GUID from the IdP subject:

var externalSubjectId = userInfo.GetValueOrDefault("sub")
                        ?? throw ;  // fail closed — no anonymous sessions

// Look up or provision the investigator for this external subject.
// RecordSignInAsync already creates-on-miss; lift it above session creation
// and have it return the resolved investigator GUID.
var syrfUserId = await _applicationService.ResolveInvestigatorIdAsync(
    externalSubjectId, email, name, picture, ip);

var session = new UserSession
{
    SessionId         = sessionId,
    UserId            = syrfUserId.ToString(),
    ExternalSubjectId = externalSubjectId,
    
};

Two changes implied:

  • IApplicationService gains ResolveInvestigatorIdAsync (or rename/refactor RecordSignInAsync to return the GUID). Today RecordSignInAsync is best-effort and called after the session is built, which is exactly why path B works on prod (Auth0 injects the namespaced user_id claim) but not locally.
  • Failure mode flips from "build broken session" to "401 Unauthorized" if the IdP doesn't return a sub. Acceptable — we cannot construct a SyRF session without one.

Sanity check the prod path

Auth0 production uses an Action that injects https://claims.syrf.org.uk/user_id into id_token and userinfo. The new code path is userInfo[user_id] ?? lookup(sub), so behavior is unchanged for prod when the namespaced claim is present. The lookup is the safety net.

3. Fix the consumer

SessionAuthenticationHandler.BuildClaims (SessionAuthenticationHandler.cs:68-76)

new(AuthConstants.ClaimTypes.Subject,           session.ExternalSubjectId),
new(AuthConstants.ClaimTypes.UserId,            session.UserId),
new(AuthConstants.ClaimTypes.ExternalSubjectId, session.ExternalSubjectId),
new(AuthConstants.ClaimTypes.Email,             session.Email),

Now each claim carries semantically correct data.

4. Tests

  • Unit: DevAuthControllerTests — assert UserId claim parses as Guid for an investigator that has an Auth0Id. (Currently this would have caught the bug — likely the test seeds a GUID-shaped Auth0Id by accident.)
  • Unit: SessionAuthenticationHandlerTests — assert UserId and Subject claims diverge.
  • Integration: BffAuthIntegrationTests — mock OIDC callback where userInfo["sub"] is a non-GUID string and the namespaced claim is absent → session built successfully, GET on a [Authorize] endpoint returns 200.
  • Smoke: run scripts/setup-local.sh, log in via mock OIDC, hit /api/projects → 200.

5. Migration

In-flight sessions stored in ISessionStore (Redis/in-memory) at deploy time will fail deserialization due to the new required ExternalSubjectId. Two options:

  • Drop sessions on deploy (simplest; users re-login). In-memory store already does this on restart; Redis sessions die at next deploy.
  • Add [JsonIgnore] string ExternalSubjectId { get; set; } = "" and only require it on new writes (transitional). Adds churn for negligible UX gain — every session in the store today is an Auth0Id-as-UserId session, i.e. broken anyway.

Going with drop sessions on deploy. Note in PR description.

Out of scope

  • Refactoring MyUtils.ParseToGuid to return Guid? instead of throwing. The throw is correct; the bug is upstream calling it with a non-GUID.
  • Renaming Auth0Id on Investigator (it's not Auth0-specific anymore — see ADR on OpenIddict migration). Tracked separately.
  • Mock-OIDC userinfo response shape — no need to touch e2e/mock-oidc/login.html once the BFF callback resolves via DB lookup.

Acceptance criteria

  • setup-local.sh → log in via mock OIDC → first authenticated GET returns 200 (not 500).
  • setup-local.sh → log in via dev-auth list → first authenticated GET returns 200.
  • All existing auth-related unit + integration tests pass.
  • New tests cover: non-GUID sub, dev-auth with non-GUID Auth0Id, claim divergence in BuildClaims.
  • No regression on Auth0 prod path (verified by integration test with namespaced user_id present).