Atlas Database Issues

Incident Report for Person Centred Software

Postmortem

Atlas Sync Maintenance Process Incident on 13th January 2026

Summary

On 13th January at 08:00 AM (GMT), Atlas experienced widespread sync failures affecting all customers. The issue was caused by a background maintenance operation on the sync audit table that ran longer than expected, creating lock contention within the database.

The process was safely terminated and the affected table optimised, restoring service by 09:08 AM (GMT).

This incident was separate from, and technically unrelated to, the incident on 12th January.

Root Cause

A routine background reindexing operation on the sync audit table ran significantly longer than expected, holding a database lock for the duration. This blocked sync operations across all customers until the process was stopped and excess data was cleared from the table.
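
The report does not name the database engine, so the following is purely illustrative. On a PostgreSQL-style system, for example, a plain REINDEX holds an exclusive lock on the table for its entire run, so every read and write queues behind it; a concurrent rebuild avoids that at the cost of a longer run time. A minimal sketch, assuming PostgreSQL, the psycopg2 driver, and a hypothetical sync_audit table:

    import psycopg2

    # Hypothetical connection string; Atlas's actual engine and schema
    # are not disclosed in this report.
    conn = psycopg2.connect("dbname=atlas user=maint")
    conn.autocommit = True  # REINDEX CONCURRENTLY cannot run in a transaction

    with conn.cursor() as cur:
        # A plain REINDEX takes an ACCESS EXCLUSIVE lock for its whole
        # duration -- on a large audit table this produces exactly the
        # contention pattern described above:
        # cur.execute("REINDEX TABLE sync_audit;")

        # REINDEX ... CONCURRENTLY (PostgreSQL 12+) rebuilds the index
        # with only brief locks, trading run time for availability.
        cur.execute("REINDEX TABLE CONCURRENTLY sync_audit;")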

Contributing factors:

·         A high volume of audit data increased the maintenance duration

·         The maintenance task ran during a busy operational period, amplifying contention

·         The audit table had not been optimised, which further extended processing time

Timeline of Events

A full timeline is provided in the dated status updates at the end of this report.

Resolution

Service was restored through the following steps (the first two are sketched after this list):

·         Terminated the overrunning reindex process immediately

·         Cleared excess data from the audit table to reduce contention

·         Validated and monitored the system to confirm normal sync throughput and the absence of deadlocks
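
As a rough illustration of the first two steps, again assuming a PostgreSQL-style engine (not confirmed by the report): sessions blocking others can be found with pg_blocking_pids(), and the overrunning maintenance backend stopped with pg_terminate_backend(). All names here are hypothetical.

    import psycopg2

    conn = psycopg2.connect("dbname=atlas user=maint")  # hypothetical DSN
    conn.autocommit = True

    with conn.cursor() as cur:
        # List the sessions that other sessions are currently waiting on.
        cur.execute("""
            SELECT DISTINCT blocker.pid, blocker.query
            FROM pg_stat_activity AS waiter
            JOIN LATERAL unnest(pg_blocking_pids(waiter.pid)) AS b(pid) ON true
            JOIN pg_stat_activity AS blocker ON blocker.pid = b.pid
        """)
        for pid, query in cur.fetchall():
            # Terminate only the overrunning reindex, not ordinary traffic.
            if query and "REINDEX" in query.upper():
                cur.execute("SELECT pg_terminate_backend(%s);", (pid,))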

Customer Communication

During the incident:

·         Incident window: 08:00–09:08 AM (GMT)

·         All customers were affected because the contention occurred on the shared central database

Clinical Safety Reminder: During any sync interruption, operating from one device helps maintain data integrity and clinical safety in care settings.

Preventative Measures and Next Steps

Completed/Immediate

·         Optimised the audit table to shorten future maintenance operations

·         Adjusted maintenance processes so that tasks complete within expected timeframes (one way to enforce this is sketched below)
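
One common way to enforce such a bound, sketched here under the same unconfirmed PostgreSQL assumption, is to give the maintenance session explicit timeouts so it aborts quickly rather than stalling live traffic:

    import psycopg2

    conn = psycopg2.connect("dbname=atlas user=maint")  # hypothetical DSN
    conn.autocommit = True

    with conn.cursor() as cur:
        # Fail fast if the lock cannot be acquired, and cap the total
        # run time, rather than letting maintenance block live syncs.
        cur.execute("SET lock_timeout = '5s';")          # illustrative value
        cur.execute("SET statement_timeout = '10min';")  # illustrative value
        cur.execute("REINDEX TABLE CONCURRENTLY sync_audit;")  # hypothetical table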

Planned/Ongoing

·         Ongoing tuning of audit table maintenance and monitoring

·         Enhanced visibility of database operations to detect contention sooner (see the sketch after this list)

·         Longer-term architectural changes to separate audit operations from the main database
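
As an illustration of what such visibility could look like (again assuming PostgreSQL; the threshold and names below are ours, not Atlas's): poll for sessions that have been waiting on locks beyond a threshold and raise an alert.

    import time
    import psycopg2

    ALERT_AFTER_SECONDS = 30  # illustrative threshold

    conn = psycopg2.connect("dbname=atlas user=monitor")  # hypothetical DSN
    conn.autocommit = True

    while True:
        with conn.cursor() as cur:
            # Sessions currently stuck waiting on a lock, with duration.
            cur.execute("""
                SELECT pid, now() - query_start AS waited, left(query, 120)
                FROM pg_stat_activity
                WHERE wait_event_type = 'Lock'
                  AND now() - query_start > make_interval(secs => %s)
            """, (ALERT_AFTER_SECONDS,))
            for pid, waited, query in cur.fetchall():
                # In production this would page on-call, not print.
                print(f"ALERT: pid {pid} waiting {waited} on a lock: {query}")
        time.sleep(10)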

We apologise for the disruption caused during a critical operational window and appreciate your patience as we resolved the issue and implemented safeguards.

Posted Jan 28, 2026 - 16:17 GMT

Resolved

This incident has been resolved.
Posted Jan 13, 2026 - 10:40 GMT

Update

We are continuing to monitor for any further issues. If you are still experiencing issues syncing devices, please swap the Sync Pod and retry.
Posted Jan 13, 2026 - 08:41 GMT

Monitoring

A fix has been implemented and we are monitoring the results. If you are still experiencing issues syncing devices, please swap the Sync Pod and retry.
Posted Jan 13, 2026 - 08:37 GMT

Identified

The issue has been identified and a fix is being implemented.
Posted Jan 13, 2026 - 08:31 GMT

Update

We are continuing to investigate this issue.
Posted Jan 13, 2026 - 08:20 GMT

Update

We are continuing to investigate this issue.
Posted Jan 13, 2026 - 08:17 GMT

Investigating

We are currently investigating this issue.
Posted Jan 13, 2026 - 08:16 GMT
This incident affected: eMar (Atlas Central, CAPA, CAPA inbound prescription service, Atlas Sync, eMAR App, Titan Integration, Scorecard).