Processing actions fail (layout changes)

Incident Report for Applicaster

Postmortem

Background job processing leaked between the production cluster and a newly created private cluster.

Sidekiq (background job processing) is connected to the same Redis DB on all clusters.

The Zapp app pods on the new cluster were not running properly, and therefore jobs were not being processed.

Once we found the issue, we scaled the new private cluster down to 0, increased the node size in the AWS console, and restarted the pods on the prod-us1 cluster (the running production cluster).

To prevent this in the future, we need to make sure there are no running nodes on redundant clusters, only on the active one.
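As an illustration only, the check described above could be sketched as a small function that flags any non-active cluster that still reports running nodes. The function name, the input shape (a mapping of cluster name to node count), and the cluster name "private-new" are assumptions made for this sketch. How the node counts are collected is out of scope here.

```python
def redundant_clusters_with_nodes(node_counts, active_cluster):
    """Return the names of non-active clusters that still have running nodes.

    node_counts: dict mapping cluster name -> number of running nodes
                 (hypothetical input; gather it however your tooling allows).
    active_cluster: name of the only cluster that is expected to run nodes.
    """
    return sorted(
        name
        for name, count in node_counts.items()
        if name != active_cluster and count > 0
    )


# Hypothetical snapshot; "private-new" is a made-up name for the new cluster.
counts = {"prod-us1": 6, "private-new": 2}
offenders = redundant_clusters_with_nodes(counts, active_cluster="prod-us1")
if offenders:
    print("Redundant clusters with running nodes:", ", ".join(offenders))
```

A check like this could run on a schedule and raise an alert whenever the returned list is non-empty.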

In addition, we should consider separating the Redis DB between clusters, although this could cause some of the running processes to be lost.
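One possible shape for that separation, sketched under assumptions: each cluster is assigned its own Redis logical database, and the connection URL is derived from the cluster name. The host name, the DB assignments, and the cluster name "private-new" are hypothetical and do not reflect the actual configuration. The sketch fails on unknown clusters instead of falling back to a shared DB.

```python
# Hypothetical assignment of one Redis logical DB per cluster.
CLUSTER_REDIS_DB = {
    "prod-us1": 0,
    "private-new": 1,  # made-up name for the new private cluster
}


def redis_url_for(cluster, host="redis.example.internal", port=6379):
    """Build a Redis URL that points the given cluster at its own logical DB.

    Raises ValueError for a cluster with no assigned DB, so that a new
    cluster cannot silently share another cluster's job queues.
    """
    try:
        db = CLUSTER_REDIS_DB[cluster]
    except KeyError:
        raise ValueError(f"no Redis DB assigned to cluster {cluster!r}") from None
    return f"redis://{host}:{port}/{db}"


print(redis_url_for("prod-us1"))
print(redis_url_for("private-new"))
```

The resulting URL could then be handed to each cluster's job workers through their usual Redis connection setting. Jobs already queued in the shared DB would not move on their own, which is the risk of losing running processes mentioned above.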

Posted Nov 17, 2023 - 15:12 UTC

Resolved

This incident has been resolved.

Posted Nov 15, 2023 - 16:16 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Nov 15, 2023 - 15:07 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Nov 15, 2023 - 14:55 UTC

Investigating

We are currently investigating this issue.

Posted Nov 15, 2023 - 14:23 UTC

This incident affected: Zapp (Studio).