Increased latency due to Redis Cluster failure
Incident Report for Minds
Resolved
This incident has been resolved.
Posted Aug 28, 2021 - 16:51 UTC
Update
Latency is back to normal and Redis is serving cached data correctly again. The Redis cluster failed because its data log became corrupted, and it was not auto-repaired because persistence was enabled. A patch has been applied to ensure persistence is disabled.
Posted Aug 28, 2021 - 16:50 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Aug 28, 2021 - 16:47 UTC
Update
Persistence should not be enabled on the cluster. We are investigating issues with the Helm chart, which is erroneously enforcing it.
Posted Aug 28, 2021 - 16:45 UTC
Identified
We are aware that our Redis cluster ran out of memory due to a sudden spike in traffic. The site may run slowly while we restart the cluster.
Posted Aug 28, 2021 - 16:38 UTC
Investigating
We're experiencing an elevated level of API errors and are currently looking into the issue.
Posted Aug 28, 2021 - 16:18 UTC
This incident affected: Web, API, and Mobile.