MongoDB Database Reference

This document provides detailed reference information about SyRF's MongoDB database architecture.

Database Instances

Atlas Clusters

SyRF uses two MongoDB Atlas clusters:

Cluster          Environments           MCP Server (for querying)
Prod cluster     Production             mcp__mongodb-syrf-prod__*
Preview cluster  Staging + PR previews  mcp__mongodb-syrf-preview__*

A local standalone container (mcp__mongodb-syrf-local__*) holds a backup snapshot of the prod cluster taken 2026-02-21. It is useful for safe exploration without touching live data.

Environment Connections (Isolated)

Environment        Database      Cluster
Production         syrftest ⚠️   Prod
Staging            syrf_staging  Preview
PR Preview {n}     syrf_pr_{n}   Preview
Local development  Configurable  Local or local container

Warning: The production database is named syrftest. The name suggests a test environment, but it holds all live data, so treat it with care at all times.
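
The mapping in the table above can be sketched as a small lookup. This is an illustration only: databaseNameFor and its environment keys ("production", "staging", "pr-preview") are hypothetical names, not part of the SyRF codebase.

```javascript
// Hypothetical helper mirroring the Environment Connections table.
// Function name and environment keys are illustrative assumptions.
function databaseNameFor(environment, prNumber) {
  switch (environment) {
    case "production":
      return "syrftest"; // production, despite the name
    case "staging":
      return "syrf_staging";
    case "pr-preview":
      if (!Number.isInteger(prNumber) || prNumber <= 0) {
        throw new Error("pr-preview requires a positive integer PR number");
      }
      return `syrf_pr_${prNumber}`;
    default:
      // Local development is configurable, so there is no fixed name to return.
      throw new Error(`No fixed database name for environment: ${environment}`);
  }
}

console.log(databaseNameFor("pr-preview", 123)); // syrf_pr_123
```

Throwing for unknown environments, rather than falling back to a default, keeps a typo from silently resolving to the production database.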

Legacy Databases (Prod Cluster)

Database Name  Purpose
syrftest       PRODUCTION database (all live data)
syrfdev        Old development snapshot, mostly unused

Collection Architecture

Bounded Context Prefixes

Collections are named using a bounded context prefix derived from the entity's namespace:

// From MongoContext.cs
public static string GetBoundedContextCode(string? namespaceName)
    => namespaceName switch
    {
        null => throw new ArgumentException("AggregateRoot has no namespace"),
        _ when namespaceName.StartsWith("SyRF.ProjectManagement") => "pm",
        _ when namespaceName.StartsWith("SyRF.FileListings") => "as",
        _ when namespaceName.StartsWith("SyRF.LiteratureSearch") => "ls",
        _ => ""
    };

Collection naming formula: {prefix}{EntityClassName}
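
To make the formula concrete, here is an illustrative JavaScript port of the prefix lookup. The real logic is the C# method shown above; collectionName is a hypothetical helper, and the namespace SyRF.ProjectManagement.Core.Model is assumed from the source path rather than confirmed.

```javascript
// Illustrative port of GetBoundedContextCode; the real logic lives in MongoContext.cs.
function boundedContextCode(namespaceName) {
  if (namespaceName == null) {
    throw new Error("AggregateRoot has no namespace");
  }
  if (namespaceName.startsWith("SyRF.ProjectManagement")) return "pm";
  if (namespaceName.startsWith("SyRF.FileListings")) return "as";
  if (namespaceName.startsWith("SyRF.LiteratureSearch")) return "ls";
  return ""; // any other namespace gets no prefix
}

// Hypothetical helper applying {prefix}{EntityClassName}.
function collectionName(namespaceName, entityClassName) {
  return boundedContextCode(namespaceName) + entityClassName;
}

// The namespace below is an assumption for illustration.
console.log(collectionName("SyRF.ProjectManagement.Core.Model", "Study")); // pmStudy
```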

Project Management Collections (pm prefix)

The project management domain contains the core business entities:

Collection           Entity Class       Description
pmProject            Project            Systematic review projects with stages, memberships, questions
pmStudy              Study              Individual studies with screening, extraction, annotations
pmInvestigator       Investigator       User accounts and profile information
pmSystematicSearch   SystematicSearch   Literature search definitions linked to projects
pmDataExportJob      DataExportJob      Background export job tracking
pmStudyCorrection    StudyCorrection    PDF metadata correction requests
pmInvestigatorUsage  InvestigatorUsage  User activity and usage statistics
pmRiskOfBiasAiJob    RiskOfBiasAiJob    AI-assisted risk of bias analysis jobs

Source code locations:

  • Domain models: src/libs/project-management/SyRF.ProjectManagement.Core/Model/
  • Repositories: src/libs/project-management/SyRF.ProjectManagement.Mongo.Data/Repositories/

Other Bounded Contexts

Prefix  Namespace              Description
as      SyRF.FileListings      File attachment storage
ls      SyRF.LiteratureSearch  Search configuration
(none)  Other                  Uses entity name directly

GUID Representation

CSUUID (C# Legacy) Format

All document IDs in SyRF use CSUUID (C# Legacy GUID) format, which stores GUIDs as BinData subtype 3.

Configuration (from MongoUtils.cs):

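public static void EnsureLegacyGuidSerializer()
{
    try
    {
        // Register the C# legacy representation as the global Guid serializer.
        BsonSerializer.RegisterSerializer(new GuidSerializer(GuidRepresentation.CSharpLegacy));
    }
    catch (BsonSerializationException e)
    {
        // Registration throws if a Guid serializer is already registered.
        // That is acceptable only when the existing one is also CSharpLegacy.
        if (BsonSerializer.LookupSerializer<Guid>() is not GuidSerializer
            {
                GuidRepresentation: GuidRepresentation.CSharpLegacy
            })
        {
            throw new InvalidOperationException(
                $"The Guid serializer is not the expected representation...", e);
        }
    }
}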

Why This Matters

Format           BinData Subtype  Byte Order                    Use in SyRF
CSUUID (Legacy)  3                Little-endian first 3 groups  Yes - all IDs
Standard UUID    4                Big-endian throughout         No

The byte order difference means the same GUID string produces different binary representations:

GUID string: 550e8400-e29b-41d4-a716-446655440000

CSUUID binary:  00 84 0e 55  9b e2  d4 41  a7 16 44 66 55 44 00 00
                [reversed]   [rev]  [rev]  [preserved as-is]

UUID binary:    55 0e 84 00  e2 9b  41 d4  a7 16 44 66 55 44 00 00
                [as-is throughout]
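
The reordering shown above can be reproduced in a few lines. This is a minimal sketch for Node.js (it relies on the global Buffer); csharpLegacyBytes is a hypothetical helper name, not a driver or shell function.

```javascript
// Convert a GUID string to the 16 bytes used by the C# legacy representation.
function csharpLegacyBytes(guid) {
  const hex = guid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error(`Not a GUID: ${guid}`);
  }
  const bytes = Buffer.from(hex, "hex"); // standard (big-endian) UUID order
  // Reverse the first three groups (4, 2 and 2 bytes) in place;
  // the last 8 bytes stay as-is.
  bytes.subarray(0, 4).reverse();
  bytes.subarray(4, 6).reverse();
  bytes.subarray(6, 8).reverse();
  return bytes;
}

const legacy = csharpLegacyBytes("550e8400-e29b-41d4-a716-446655440000");
console.log(legacy.toString("hex"));    // 00840e559be2d441a716446655440000
console.log(legacy.toString("base64")); // AIQOVZvi1EGnFkRmVUQAAA==
```

The base64 output is the payload to pair with BinData subtype 3 when a tool has no CSUUID helper.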

Querying with CSUUID

When using MCP MongoDB tools, mongosh, or Compass:

// ❌ WRONG - Standard UUID won't match existing documents
db.pmStudy.find({ _id: UUID("550e8400-e29b-41d4-a716-446655440000") })

// ✅ CORRECT - Use CSUUID function
db.pmStudy.find({ _id: CSUUID("550e8400-e29b-41d4-a716-446655440000") })

// ✅ CORRECT - Direct BinData with subtype 3
db.pmStudy.find({ _id: BinData(3, "AIQOVZvi1EGnFkRmVUQAAA==") })
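
Query results often show IDs as raw subtype 3 payloads, so the reverse conversion is also handy. The following is a sketch for Node.js; guidFromCsharpLegacy is a hypothetical helper name.

```javascript
// Decode the base64 payload of a subtype 3 (C# legacy) BinData value
// back into the familiar GUID string.
function guidFromCsharpLegacy(base64) {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length !== 16) {
    throw new Error(`Expected 16 bytes, got ${bytes.length}`);
  }
  // Undo the little-endian ordering of the first three groups.
  bytes.subarray(0, 4).reverse();
  bytes.subarray(4, 6).reverse();
  bytes.subarray(6, 8).reverse();
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

console.log(guidFromCsharpLegacy("AIQOVZvi1EGnFkRmVUQAAA=="));
// 550e8400-e29b-41d4-a716-446655440000
```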

Working with GUIDs in Code

Always let the MongoDB C# driver handle serialization:

// ✅ CORRECT - Let the driver serialize
var studyId = Guid.Parse("550e8400-e29b-41d4-a716-446655440000");
var study = await collection.Find(s => s.Id == studyId).FirstOrDefaultAsync();

// ❌ WRONG - Manual conversion bypasses the registered serializer; byte order and subtype are easy to get wrong
var bytes = studyId.ToByteArray();
var bsonBinary = new BsonBinaryData(bytes, BsonBinarySubType.UuidLegacy);

Driver Connection Behaviour

directConnection=true for single-server local / TestContainers setups

What the code does

When ConnectionStrings:MongoConnection:ClusterAddresses resolves to a single server, both the connection-string builder in MongoConnectionSettings.ConnectionString and the MongoClientSettings builder in MongoContext.BuildSettingsFromClusterAddresses set directConnection=true. Multi-server (real replica set) and SRV (ClusterAddress) paths are untouched.
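
A minimal sketch of the single-server rule, written in JavaScript rather than the C# of the actual builders. buildMongoUri, its parameters, and the syrf_local database name are hypothetical; credentials and other options are omitted.

```javascript
// Hypothetical sketch of the rule: one address means directConnection=true,
// several addresses leave normal replica-set discovery enabled.
function buildMongoUri(clusterAddresses, database) {
  if (clusterAddresses.length === 0) {
    throw new Error("At least one cluster address is required");
  }
  const hosts = clusterAddresses.join(",");
  const options = clusterAddresses.length === 1 ? "?directConnection=true" : "";
  return `mongodb://${hosts}/${database}${options}`;
}

console.log(buildMongoUri(["localhost:29585"], "syrf_local"));
// mongodb://localhost:29585/syrf_local?directConnection=true
console.log(buildMongoUri(["a:27017", "b:27017"], "syrf_local"));
// mongodb://a:27017,b:27017/syrf_local
```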

Why

The local-dev docker-compose.dev.yml runs mongo:7 with --replSet rs0 and initialises the replica set using members:[{_id:0, host:'localhost:27017'}]. That localhost:27017 is the mongo process's internal address inside the container. When the container is published on a different host port (as in side-by-side worktrees, where each worktree gets its own port, e.g. PORT_MONGODB=29585), connecting from the host to localhost:29585 succeeds. The MongoDB driver then performs SDAM (Server Discovery and Monitoring) topology discovery: it asks the server for replica-set membership, the server reports localhost:27017, and the driver replaces the configured endpoint with that self-advertised address. The host has nothing listening on 27017, so every subsequent operation times out with "Connection refused" against Unspecified/localhost:27017.

directConnection=true pins the driver to the configured host:port, skipping SDAM replacement. Change streams still work because the server is genuinely part of a replica set — the flag only affects client-side topology discovery, not server capabilities.
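
The difference can be pictured with a toy model. This is a deliberate simplification and not driver code; effectiveEndpoints and its parameters are hypothetical names.

```javascript
// Toy model of which endpoints the client ends up using.
// configured: the host:port from settings.
// advertised: the member hosts the server reports in its replica-set config.
function effectiveEndpoints(configured, advertised, directConnection) {
  if (directConnection) {
    return [configured]; // pinned: the advertised member list is ignored
  }
  return [...advertised]; // discovery: advertised members replace the seed
}

const configured = "localhost:29585";
const advertised = ["localhost:27017"];

console.log(effectiveEndpoints(configured, advertised, false)); // [ 'localhost:27017' ]
console.log(effectiveEndpoints(configured, advertised, true));  // [ 'localhost:29585' ]
```

Without the flag, the model ends up on port 27017, where nothing on the host is listening.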

Symptoms if this regresses

Service startup logs timeouts of this shape, even though appsettings.local.json clearly points at the right port:

TimeoutException: ... selecting a server ... Client view of cluster state is
{ ClusterId : "N", Type : "ReplicaSet", State : "Disconnected",
  Servers : [{ EndPoint: "Unspecified/localhost:27017", ... }] }

The give-away is Type : "ReplicaSet" combined with localhost:27017 when you configured a different port — that's the self-advertised address winning over your settings.
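
As a rough aid for spotting this in logs, the check can be sketched as a small function. looksLikeSdamReplacement is a hypothetical name, and the sample string is an abbreviated form of the message above, not a complete log line.

```javascript
// Returns true when a timeout message shows a ReplicaSet view whose endpoints
// do not include the configured port.
function looksLikeSdamReplacement(message, configuredPort) {
  const isReplicaSetView = /Type\s*:\s*"ReplicaSet"/.test(message);
  const endpoints = [...message.matchAll(/EndPoint\s*:\s*"[^"]*:(\d+)"/g)];
  const ports = endpoints.map((match) => Number(match[1]));
  return isReplicaSetView && ports.length > 0 && !ports.includes(configuredPort);
}

const sample =
  'Client view of cluster state is { ClusterId : "N", Type : "ReplicaSet", ' +
  'State : "Disconnected", Servers : [{ EndPoint: "Unspecified/localhost:27017" }] }';

console.log(looksLikeSdamReplacement(sample, 29585)); // true
console.log(looksLikeSdamReplacement(sample, 27017)); // false
```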

Alternative fixes considered

Changing mongo's internal port to match the host-mapped port (as e2e/docker-compose.yml does, with --port 27018 + 27018:27018) also works, but it requires a Docker volume reset whenever the port changes and complicates side-by-side worktrees, where each worktree picks a different port. The directConnection flag avoids all of that.

Scope

Tests: MongoContextTests.BuildMongoClientSettings_WithClusterAddresses_* and MongoConnectionSettingsTests.ConnectionString_* pin the behaviour: single server → DirectConnection=true / ?directConnection=true; multi-server → neither.