MongoDB Database Reference
This document provides detailed reference information about SyRF's MongoDB database architecture.
Database Instances
Atlas Clusters
SyRF uses two MongoDB Atlas clusters:
| Cluster | Environments | MCP Server (for querying) |
|---|---|---|
| Prod cluster | Production | mcp__mongodb-syrf-prod__* |
| Preview cluster | Staging + PR previews | mcp__mongodb-syrf-preview__* |
A local standalone container (mcp__mongodb-syrf-local__*) holds a backup snapshot of the prod cluster taken 2026-02-21. It is useful for safe exploration without touching live data.
Environment Connections (Isolated)
| Environment | Database | Cluster |
|---|---|---|
| Production | syrftest ⚠️ | Prod |
| Staging | syrf_staging | Preview |
| PR Preview {n} | syrf_pr_{n} | Preview |
| Local development | Configurable | Local or local container |
Warning: The production database is named syrftest, even though that name suggests a test environment. Treat it with care at all times.
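
As a quick illustration of the mapping in the table above, the following sketch resolves an environment label to its database and cluster. The function name `resolveDatabase` and the environment labels are hypothetical, chosen only for this example; local development is left out because its database is configurable.

```javascript
// Hypothetical helper: maps an environment label to the database and
// cluster listed in the environment table. Not part of the SyRF codebase.
function resolveDatabase(environment, prNumber) {
  switch (environment) {
    case "production":
      // The production database really is called "syrftest".
      return { database: "syrftest", cluster: "Prod" };
    case "staging":
      return { database: "syrf_staging", cluster: "Preview" };
    case "pr-preview":
      if (!Number.isInteger(prNumber) || prNumber <= 0) {
        throw new Error("PR previews need a positive PR number");
      }
      return { database: `syrf_pr_${prNumber}`, cluster: "Preview" };
    default:
      throw new Error(`Unknown environment: ${environment}`);
  }
}

console.log(resolveDatabase("pr-preview", 123).database); // prints "syrf_pr_123"
```

Keeping the lookup in one place makes it harder to mistake syrftest for a disposable database.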
Legacy Databases (Prod Cluster)
| Database Name | Purpose |
|---|---|
| syrftest | PRODUCTION database (all live data) |
| syrfdev | Old development snapshot, mostly unused |
Collection Architecture
Bounded Context Prefixes
Collections are named using a bounded context prefix derived from the entity's namespace:
// From MongoContext.cs
public static string GetBoundedContextCode(string? namespaceName)
    => namespaceName switch
    {
        null => throw new ArgumentException("AggregateRoot has no namespace"),
        _ when namespaceName.StartsWith("SyRF.ProjectManagement") => "pm",
        _ when namespaceName.StartsWith("SyRF.FileListings") => "as",
        _ when namespaceName.StartsWith("SyRF.LiteratureSearch") => "ls",
        _ => ""
    };
Collection naming formula: {prefix}{EntityClassName}
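
To make the formula concrete, here is a minimal JavaScript sketch that mirrors the prefix rules quoted above, for example when working out which collection to query from mongosh. The helper names `boundedContextCode` and `collectionName` are hypothetical, and the namespace `SyRF.ProjectManagement.Core.Model` is an assumption inferred from the source path of the domain models.

```javascript
// Sketch only: mirrors the prefix rules quoted from MongoContext.cs.
// boundedContextCode and collectionName are hypothetical helper names.
function boundedContextCode(namespaceName) {
  if (namespaceName == null) {
    throw new Error("AggregateRoot has no namespace");
  }
  if (namespaceName.startsWith("SyRF.ProjectManagement")) return "pm";
  if (namespaceName.startsWith("SyRF.FileListings")) return "as";
  if (namespaceName.startsWith("SyRF.LiteratureSearch")) return "ls";
  return ""; // no prefix: the collection uses the entity name directly
}

// Collection naming formula: {prefix}{EntityClassName}
function collectionName(namespaceName, entityClassName) {
  return boundedContextCode(namespaceName) + entityClassName;
}

// The namespace below is an assumed example value.
console.log(collectionName("SyRF.ProjectManagement.Core.Model", "Study")); // prints "pmStudy"
```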
Project Management Collections (pm prefix)
The project management domain contains the core business entities:
| Collection | Entity Class | Description |
|---|---|---|
| pmProject | Project | Systematic review projects with stages, memberships, questions |
| pmStudy | Study | Individual studies with screening, extraction, annotations |
| pmInvestigator | Investigator | User accounts and profile information |
| pmSystematicSearch | SystematicSearch | Literature search definitions linked to projects |
| pmDataExportJob | DataExportJob | Background export job tracking |
| pmStudyCorrection | StudyCorrection | PDF metadata correction requests |
| pmInvestigatorUsage | InvestigatorUsage | User activity and usage statistics |
| pmRiskOfBiasAiJob | RiskOfBiasAiJob | AI-assisted risk of bias analysis jobs |
Source code locations:
- Domain models: src/libs/project-management/SyRF.ProjectManagement.Core/Model/
- Repositories: src/libs/project-management/SyRF.ProjectManagement.Mongo.Data/Repositories/
Other Bounded Contexts
| Prefix | Namespace | Example Collections |
|---|---|---|
| as | SyRF.FileListings | File attachment storage |
| ls | SyRF.LiteratureSearch | Search configuration |
| (none) | Other | Uses entity name directly |
GUID Representation
CSUUID (C# Legacy) Format
All document IDs in SyRF use CSUUID (C# Legacy GUID) format, which stores GUIDs as BinData subtype 3.
Configuration (from MongoUtils.cs):
public static void EnsureLegacyGuidSerializer()
{
    try
    {
        BsonSerializer.RegisterSerializer(new GuidSerializer(GuidRepresentation.CSharpLegacy));
    }
    catch (BsonSerializationException e)
    {
        // A Guid serializer is already registered; accept it only if it is the C# legacy one.
        if (BsonSerializer.LookupSerializer<Guid>() is not GuidSerializer
            {
                GuidRepresentation: GuidRepresentation.CSharpLegacy
            })
        {
            throw new InvalidOperationException(
                $"The Guid serializer is not the expected representation...", e);
        }
    }
}
Why This Matters
| Format | BinData Subtype | Byte Order | Use in SyRF |
|---|---|---|---|
| CSUUID (Legacy) | 3 | Little-endian first 3 groups | Yes - all IDs |
| Standard UUID | 4 | Big-endian throughout | No |
The byte order difference means the same GUID string produces different binary representations:
GUID string: 550e8400-e29b-41d4-a716-446655440000
CSUUID binary: 00 84 0e 55 9b e2 d4 41 a7 16 44 66 55 44 00 00
[reversed] [rev] [rev] [preserved as-is]
UUID binary: 55 0e 84 00 e2 9b 41 d4 a7 16 44 66 55 44 00 00
[as-is throughout]
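
The byte reordering above can be reproduced with a short script, which is handy when a tool only accepts raw BinData. This is a minimal sketch that assumes a Node.js-compatible runtime with `Buffer` available; `guidToCsuuidBytes` is a hypothetical helper name, not part of any driver.

```javascript
// Sketch: convert a GUID string to the 16 bytes of its C# legacy (subtype 3) form.
// guidToCsuuidBytes is a hypothetical helper, not a driver API.
function guidToCsuuidBytes(guid) {
  const hex = guid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error(`Not a GUID: ${guid}`);
  }
  const bytes = Buffer.from(hex, "hex"); // big-endian, as written in the string
  return Buffer.concat([
    Buffer.from(bytes.subarray(0, 4)).reverse(), // first group: reversed
    Buffer.from(bytes.subarray(4, 6)).reverse(), // second group: reversed
    Buffer.from(bytes.subarray(6, 8)).reverse(), // third group: reversed
    bytes.subarray(8, 16),                       // last 8 bytes: preserved as-is
  ]);
}

const csuuid = guidToCsuuidBytes("550e8400-e29b-41d4-a716-446655440000");
console.log(csuuid.toString("hex"));    // prints "00840e559be2d441a716446655440000"
console.log(csuuid.toString("base64")); // prints "AIQOVZvi1EGnFkRmVUQAAA=="
```

The base64 output is the payload that a `BinData(3, ...)` literal expects.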
Querying with CSUUID
When using MCP MongoDB tools, mongosh, or Compass:
// ❌ WRONG - Standard UUID won't match existing documents
db.pmStudy.find({ _id: UUID("550e8400-e29b-41d4-a716-446655440000") })
// ✅ CORRECT - Use CSUUID function
db.pmStudy.find({ _id: CSUUID("550e8400-e29b-41d4-a716-446655440000") })
// ✅ CORRECT - Direct BinData with subtype 3
db.pmStudy.find({ _id: BinData(3, "AIQOVZvi1EGnFkRmVUQAAA==") })
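
Query results often show IDs as subtype 3 BinData in base64, which then has to be turned back into a GUID string before it can be compared with application logs. The sketch below does that conversion under the same assumption of a Node.js-compatible `Buffer`; `csuuidBase64ToGuid` is a hypothetical helper name.

```javascript
// Sketch: decode the base64 payload of a subtype 3 (C# legacy) BinData value
// back into the GUID string that the application sees.
// csuuidBase64ToGuid is a hypothetical helper, not a driver API.
function csuuidBase64ToGuid(base64) {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length !== 16) {
    throw new Error(`Expected 16 bytes, got ${bytes.length}`);
  }
  const hex = Buffer.concat([
    Buffer.from(bytes.subarray(0, 4)).reverse(), // undo little-endian group 1
    Buffer.from(bytes.subarray(4, 6)).reverse(), // undo little-endian group 2
    Buffer.from(bytes.subarray(6, 8)).reverse(), // undo little-endian group 3
    bytes.subarray(8, 16),                       // remaining bytes are unchanged
  ]).toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

console.log(csuuidBase64ToGuid("AIQOVZvi1EGnFkRmVUQAAA=="));
// prints "550e8400-e29b-41d4-a716-446655440000"
```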
Working with GUIDs in Code
Always let the MongoDB C# driver handle serialization:
// ✅ CORRECT - Let the driver serialize
var studyId = Guid.Parse("550e8400-e29b-41d4-a716-446655440000");
var study = await collection.Find(s => s.Id == studyId).FirstOrDefaultAsync();
// ❌ WRONG - Manual binary conversion will likely use wrong byte order
var bytes = studyId.ToByteArray();
var bsonBinary = new BsonBinaryData(bytes, BsonBinarySubType.UuidLegacy);
Driver Connection Behaviour
directConnection=true for single-server local / TestContainers setups
What the code does
When ConnectionStrings:MongoConnection:ClusterAddresses resolves to a single server, both the connection-string builder in MongoConnectionSettings.ConnectionString and the MongoClientSettings builder in MongoContext.BuildSettingsFromClusterAddresses set directConnection=true. Multi-server (real replica set) and SRV (ClusterAddress) paths are untouched.
Why
The local-dev docker-compose.dev.yml runs mongo:7 with --replSet rs0 and initialises the replica set using members:[{_id:0, host:'localhost:27017'}]. That localhost:27017 is the mongo process's internal address inside the container.
When the container is published on a different host port (as in side-by-side worktrees, where each worktree gets its own port, e.g. PORT_MONGODB=29585), connecting from the host to localhost:29585 succeeds at first. The MongoDB driver then performs SDAM topology discovery: it asks the server for replica-set membership, the server reports localhost:27017, and the driver replaces the configured endpoint with that self-advertised address. The host has nothing listening on 27017, so every subsequent operation times out with "Connection refused" against Unspecified/localhost:27017.
directConnection=true pins the driver to the configured host:port, skipping SDAM replacement. Change streams still work because the server is genuinely part of a replica set — the flag only affects client-side topology discovery, not server capabilities.
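
The single-server rule can be illustrated with a small sketch. This is not the SyRF implementation (which lives in C# in MongoConnectionSettings and MongoContext); `buildMongoUri` and its parameters are hypothetical, and only the `directConnection` URI option itself is a standard MongoDB connection-string option.

```javascript
// Hypothetical sketch of the rule described above: add directConnection=true
// only when exactly one server address is configured.
function buildMongoUri(clusterAddresses, database) {
  if (!Array.isArray(clusterAddresses) || clusterAddresses.length === 0) {
    throw new Error("At least one cluster address is required");
  }
  const base = `mongodb://${clusterAddresses.join(",")}/${database}`;
  // Single server: pin the driver to the configured host:port.
  // Multiple servers: leave topology discovery enabled.
  return clusterAddresses.length === 1
    ? `${base}?directConnection=true`
    : base;
}

// The port and database names below are example values only.
console.log(buildMongoUri(["localhost:29585"], "syrf_local"));
// prints "mongodb://localhost:29585/syrf_local?directConnection=true"
console.log(buildMongoUri(["db1:27017", "db2:27017"], "syrf_staging"));
// prints "mongodb://db1:27017,db2:27017/syrf_staging"
```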
Symptoms if this regresses
Service startup logs timeouts of this shape, even though appsettings.local.json clearly points at the right port:
TimeoutException: ... selecting a server ... Client view of cluster state is
{ ClusterId : "N", Type : "ReplicaSet", State : "Disconnected",
Servers : [{ EndPoint: "Unspecified/localhost:27017", ... }] }
The give-away is Type : "ReplicaSet" combined with localhost:27017 when you configured a different port — that's the self-advertised address winning over your settings.
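
When triaging logs, the check described above can be roughly automated. The following is a heuristic sketch only: `looksLikeSelfAdvertisedAddress` is a hypothetical helper, and the regular expressions assume the message layout shown in the excerpt above, which may vary between driver versions.

```javascript
// Heuristic sketch: flag a timeout message whose cluster view is a replica set
// and whose endpoints all use a port other than the configured one.
// Assumes the "EndPoint: \"Family/host:port\"" layout from the log excerpt.
function looksLikeSelfAdvertisedAddress(errorText, configuredPort) {
  const isReplicaSet = /Type\s*:\s*"ReplicaSet"/.test(errorText);
  const endpointPattern = /EndPoint\s*:\s*"[^"\/]*\/[^":]+:(\d+)"/g;
  const ports = [...errorText.matchAll(endpointPattern)].map((m) => Number(m[1]));
  return isReplicaSet && ports.length > 0 && ports.every((p) => p !== configuredPort);
}

const sampleError =
  'TimeoutException: ... Client view of cluster state is ' +
  '{ ClusterId : "N", Type : "ReplicaSet", State : "Disconnected", ' +
  'Servers : [{ EndPoint: "Unspecified/localhost:27017", ... }] }';

console.log(looksLikeSelfAdvertisedAddress(sampleError, 29585)); // prints true
console.log(looksLikeSelfAdvertisedAddress(sampleError, 27017)); // prints false
```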
Alternative fixes considered
Changing mongo's internal port to match the host-mapped port (as e2e/docker-compose.yml does, with --port 27018 + 27018:27018) also works but requires docker volume reset when the port changes and complicates side-by-side worktrees where each picks a different port. The directConnection flag avoids all of that.
Scope
Tests: MongoContextTests.BuildMongoClientSettings_WithClusterAddresses_* and MongoConnectionSettingsTests.ConnectionString_* pin the behaviour: single server → DirectConnection=true / ?directConnection=true; multi-server → neither.
Related Documentation
- MongoContext.cs - Collection naming implementation
- MongoUtils.cs - GUID serialization configuration
- MongoConnectionSettings.cs - Connection string construction