Fix: BFF session conflates SyRF user GUID with IdP subject¶
Temporary planning doc — delete from
docs/planning/once the fix has shipped. Why-it-broke history belongs in the PR description; this file just guides the implementation.
Problem¶
Running locally via scripts/setup-local.sh, the first authenticated API request fails:
System.FormatException: Unrecognized Guid format.
at System.Guid.Parse(String input)
at SyRF.SharedKernel.MyUtils.ParseToGuid(String val) in MyUtils.cs:line 53
at SyRF.WebHostConfig.Common.Infrastructure.CurrentUserService.GetUserId in CurrentUserService.cs:line 27
Every controller that derives from SyrfBaseController calls CurrentUserService.GetUserId(), so this 500s the entire authenticated surface area immediately after login.
Root cause¶
The BFF session (UserSession) uses a single UserId field for two semantically distinct identifiers:
- SyRF investigator GUID (
Investigator.Id) — the value the rest of the codebase reads via thehttps://claims.syrf.org.uk/user_idclaim and parses as aGuid. - External IdP subject (
Investigator.Auth0Id, OpenIddictsub) — strings likeoauth2|google-oauth2|123…,auth0|abc, or arbitrary text from mock OIDC.
Two paths populate the session with the wrong value:
Path A — Dev auth (always wrong)¶
Prefers Auth0Id over the SyRF GUID. For any investigator with an Auth0Id (i.e. anyone synced from production), the session's UserId is a non-GUID string.
Path B — OIDC callback (depends on IdP claim shape)¶
var userId = userInfo.GetValueOrDefault(AuthConstants.ClaimTypes.UserId)
?? userInfo.GetValueOrDefault("sub");
Falls back to sub if the https://claims.syrf.org.uk/user_id claim isn't surfaced by the IdP's userinfo endpoint. Mock-oauth2-server (used for local dev) doesn't reliably echo namespaced custom claims back through userinfo, so the fallback fires and userId becomes the IdP subject string.
Where it explodes¶
SessionAuthenticationHandler.cs:70-76 writes session.UserId into three distinct claims:
new(AuthConstants.ClaimTypes.Subject, session.UserId), // IdP "sub"
new(AuthConstants.ClaimTypes.UserId, session.UserId), // SyRF GUID claim
new(AuthConstants.ClaimTypes.ExternalSubjectId, session.UserId),// IdP nameidentifier
All three get the same value. Downstream, CurrentUserService.GetUserId reads the SyRF UserId claim and Guid.Parses it → FormatException.
Fix plan¶
The fix splits the conflated identifier in UserSession and updates every producer/consumer to populate the correct field.
1. Extend UserSession (UserSession.cs)¶
Add a separate field for the IdP subject. UserId becomes the SyRF investigator GUID only.
public required string UserId { get; set; } // SyRF investigator Guid (string-encoded)
public required string ExternalSubjectId { get; set; } // IdP "sub" / Auth0Id
Both required → forces every producer to set them deliberately.
2. Fix the producers¶
DevAuthController.LoginAs (DevAuthController.cs:77)¶
UserId = investigator.Id.ToString(),
ExternalSubjectId = investigator.Auth0Id ?? investigator.Id.ToString(),
Auth0Id may be null for never-logged-in investigators; falling back to the GUID for ExternalSubjectId keeps audit trails non-empty without breaking any GUID-typed code path.
BffAuthController.Callback (BffAuthController.cs:225-256)¶
The callback already has access to IApplicationService (via DI) and calls RecordSignInAsync further down. Resolve the SyRF GUID from the IdP subject:
var externalSubjectId = userInfo.GetValueOrDefault("sub")
?? throw …; // fail closed — no anonymous sessions
// Look up or provision the investigator for this external subject.
// RecordSignInAsync already creates-on-miss; lift it above session creation
// and have it return the resolved investigator GUID.
var syrfUserId = await _applicationService.ResolveInvestigatorIdAsync(
externalSubjectId, email, name, picture, ip);
var session = new UserSession
{
SessionId = sessionId,
UserId = syrfUserId.ToString(),
ExternalSubjectId = externalSubjectId,
…
};
Two changes implied:
IApplicationServicegainsResolveInvestigatorIdAsync(or rename/refactorRecordSignInAsyncto return the GUID). TodayRecordSignInAsyncis best-effort and called after the session is built, which is exactly why path B works on prod (Auth0 injects the namespaceduser_idclaim) but not locally.- Failure mode flips from "build broken session" to "401 Unauthorized" if the IdP doesn't return a
sub. Acceptable — we cannot construct a SyRF session without one.
Sanity check the prod path¶
Auth0 production uses an Action that injects https://claims.syrf.org.uk/user_id into id_token and userinfo. The new code path is userInfo[user_id] ?? lookup(sub), so behavior is unchanged for prod when the namespaced claim is present. The lookup is the safety net.
3. Fix the consumer¶
SessionAuthenticationHandler.BuildClaims (SessionAuthenticationHandler.cs:68-76)¶
new(AuthConstants.ClaimTypes.Subject, session.ExternalSubjectId),
new(AuthConstants.ClaimTypes.UserId, session.UserId),
new(AuthConstants.ClaimTypes.ExternalSubjectId, session.ExternalSubjectId),
new(AuthConstants.ClaimTypes.Email, session.Email),
Now each claim carries semantically correct data.
4. Tests¶
- Unit:
DevAuthControllerTests— assertUserIdclaim parses asGuidfor an investigator that has anAuth0Id. (Currently this would have caught the bug — likely the test seeds a GUID-shaped Auth0Id by accident.) - Unit:
SessionAuthenticationHandlerTests— assertUserIdandSubjectclaims diverge. - Integration:
BffAuthIntegrationTests— mock OIDC callback whereuserInfo["sub"]is a non-GUID string and the namespaced claim is absent → session built successfully, GET on a[Authorize]endpoint returns 200. - Smoke: run
scripts/setup-local.sh, log in via mock OIDC, hit/api/projects→ 200.
5. Migration¶
In-flight sessions stored in ISessionStore (Redis/in-memory) at deploy time will fail deserialization due to the new required ExternalSubjectId. Two options:
- Drop sessions on deploy (simplest; users re-login). In-memory store already does this on restart; Redis sessions die at next deploy.
- Add
[JsonIgnore] string ExternalSubjectId { get; set; } = ""and only require it on new writes (transitional). Adds churn for negligible UX gain — every session in the store today is an Auth0Id-as-UserId session, i.e. broken anyway.
Going with drop sessions on deploy. Note in PR description.
Out of scope¶
- Refactoring
MyUtils.ParseToGuidto returnGuid?instead of throwing. The throw is correct; the bug is upstream calling it with a non-GUID. - Renaming
Auth0IdonInvestigator(it's not Auth0-specific anymore — see ADR on OpenIddict migration). Tracked separately. - Mock-OIDC userinfo response shape — no need to touch
e2e/mock-oidc/login.htmlonce the BFF callback resolves via DB lookup.
Acceptance criteria¶
-
setup-local.sh→ log in via mock OIDC → first authenticated GET returns 200 (not 500). -
setup-local.sh→ log in via dev-auth list → first authenticated GET returns 200. - All existing auth-related unit + integration tests pass.
- New tests cover: non-GUID
sub, dev-auth with non-GUIDAuth0Id, claim divergence inBuildClaims. - No regression on Auth0 prod path (verified by integration test with namespaced
user_idpresent).