On 13th January at 08:00 AM, Atlas experienced widespread sync failures affecting all customers. The issue was caused by a background maintenance operation on the sync audit table that ran longer than expected, creating contention within the database.
The process was safely terminated and the affected table optimised, restoring service by 09:08 AM.
This incident was separate from, and technically unrelated to, the 12th January incident.
A routine background reindexing operation on the sync audit table exceeded its expected duration and held a temporary lock on the table, delaying all sync operations until the process was stopped and excess data was cleared from the table.
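For readers interested in the mechanics: this report does not name our database technology, so the following is a generic diagnostic sketch assuming a PostgreSQL-style system, the psycopg2 driver, and a hypothetical connection string. A long-running reindex holds locks that block writers, and the blockage is visible in the server's activity views.

    import psycopg2

    # Hypothetical connection string; actual database details are internal.
    conn = psycopg2.connect("dbname=atlas host=db.internal")

    with conn.cursor() as cur:
        # pg_blocking_pids() reports which sessions are blocking each waiting
        # session; a stuck reindex holding a table lock shows up here.
        cur.execute("""
            SELECT pid,
                   pg_blocking_pids(pid) AS blocked_by,
                   state,
                   wait_event_type,
                   query
            FROM pg_stat_activity
            WHERE cardinality(pg_blocking_pids(pid)) > 0
        """)
        for pid, blocked_by, state, wait_event_type, query in cur.fetchall():
            print(f"session {pid} blocked by {blocked_by}: {(query or '')[:80]}")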
Contributing factors:
· High volume of audit data increased maintenance duration
· The maintenance task created contention during a busy operational period
· The audit table had grown large without optimisation, which lengthened processing time
Service was restored through the following steps:
· Immediate termination of the overrunning reindex process (see the sketch after this list)
· Clearing of excess data from the audit table to reduce contention
· Validation and monitoring to confirm normal sync throughput and the absence of deadlocks
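As an illustration of the first step above, here is a generic sketch under the same PostgreSQL/psycopg2 assumptions as earlier; the 'REINDEX%' filter and the five-minute threshold are illustrative, not our actual criteria. Terminating the session that holds the lock releases it immediately, letting queued sync operations proceed.

    import psycopg2

    conn = psycopg2.connect("dbname=atlas host=db.internal")  # hypothetical DSN
    conn.autocommit = True

    with conn.cursor() as cur:
        # Find long-running maintenance statements.
        cur.execute("""
            SELECT pid, now() - query_start AS runtime, query
            FROM pg_stat_activity
            WHERE query ILIKE 'REINDEX%'
              AND now() - query_start > interval '5 minutes'
        """)
        for pid, runtime, query in cur.fetchall():
            print(f"terminating pid {pid} after {runtime}")
            # pg_terminate_backend() ends the session, releasing its locks.
            cur.execute("SELECT pg_terminate_backend(%s)", (pid,))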
During the incident:
· Incident window: 08:00–09:08 AM
· All customers were affected, as the contention occurred in the central database
Clinical Safety Reminder: During any sync interruption, operating from one device helps maintain data integrity and clinical safety in care settings.
Completed/Immediate
· Optimised the audit table to shorten future maintenance operations
· Adjusted processes to ensure maintenance tasks complete within expected timeframes (one common safeguard of this kind is sketched below)
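The sketch below shows the general technique, not necessarily our exact configuration: running maintenance under session-level timeouts so an overrunning task aborts instead of holding locks, and rebuilding indexes concurrently so writers are not blocked. It again assumes PostgreSQL; the timeout values and the table name sync_audit are hypothetical.

    import psycopg2

    conn = psycopg2.connect("dbname=atlas host=db.internal")  # hypothetical DSN
    conn.autocommit = True  # REINDEX ... CONCURRENTLY cannot run in a transaction

    with conn.cursor() as cur:
        # lock_timeout aborts the statement if a lock cannot be acquired promptly;
        # statement_timeout caps total runtime. Both values are illustrative.
        cur.execute("SET lock_timeout = '10s'")
        cur.execute("SET statement_timeout = '30min'")
        # CONCURRENTLY rebuilds indexes without blocking writes to the table.
        cur.execute("REINDEX TABLE CONCURRENTLY sync_audit")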
Planned/Ongoing
· Ongoing tuning of audit table maintenance and monitoring
· Enhanced visibility of database operations to detect contention sooner
· Longer-term architectural changes to separate audit operations from the main database (illustrated roughly below)
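As a rough illustration of that last item (the eventual design is not described in this report, and all names below are hypothetical): audit writes can be decoupled from the primary database by queuing events for a separate consumer that persists them to a dedicated store, so audit maintenance never contends with sync traffic. A minimal in-process sketch in Python:

    import queue
    import threading

    audit_events: "queue.Queue[dict]" = queue.Queue()

    def record_audit(event: dict) -> None:
        # The sync path only enqueues; it never touches the audit store
        # directly, so audit maintenance cannot block sync operations.
        audit_events.put(event)

    def audit_writer() -> None:
        # A separate consumer drains events into a dedicated audit store
        # (stubbed here as a print; in practice, a different database).
        while True:
            event = audit_events.get()
            print("audit:", event)
            audit_events.task_done()

    threading.Thread(target=audit_writer, daemon=True).start()
    record_audit({"op": "sync", "status": "ok"})
    audit_events.join()  # wait until the enqueued event has been handled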
We apologise for the disruption caused during a critical operational window and appreciate your patience as we resolved the issue and implemented safeguards.