A common question we hear on Support is “What happens if our Gigalixir app goes over its memory limit (replica size)?”
For example, what exactly happens when your app has a sudden, unexpected spike in memory usage that recedes below the limit within a few minutes?
The quick answer is that when any app exceeds the replica size / memory limit, your Kubernetes pod will get OOMKilled (error: OOMKilled—Container Limit Reached) and restarted.
OOMKilled: OOM stands for Out of Memory. When a system is in danger of running out of available memory, the OOM Killer starts killing processes to free up memory and prevent a crash.
The OOMKilled error and restart aren’t always instantaneous, but they are close enough to it that you should expect them any time your app exceeds its memory limit, even if the spike is brief.
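To make the spike scenario concrete, here is a minimal Python sketch of the rule described above. The sample readings, the 512 MB limit, and the function name `first_breach` are illustrative assumptions, not Gigalixir internals; the real enforcement is done by the kernel and Kubernetes rather than by polling samples like this.

```python
from typing import Optional, Sequence


def first_breach(samples_mb: Sequence[float], limit_mb: float) -> Optional[int]:
    """Return the index of the first sample above the limit, or None."""
    for i, used in enumerate(samples_mb):
        if used > limit_mb:
            return i
    return None


# Hypothetical per-minute memory readings for one replica, in MB.
samples = [300, 340, 530, 360, 310]
breach = first_breach(samples, limit_mb=512)
if breach is not None:
    print(f"limit exceeded at sample {breach}; expect a restart")
```

The later readings fall back under the limit, but that does not help: the first reading above the limit is the one that matters.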
Debugging OOMKilled on Gigalixir
If an OOMKilled error occurs for your app or pod, check the application logs to figure out why the pod used more memory than its current limit allows.
Typical causes include sudden, unexpected spikes in traffic or a long-running Kubernetes job. Either situation can push a pod past its memory limit.
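If your app writes its own memory readings to the logs, a small script can help narrow down when usage started climbing. The following is a rough sketch only: the `memory_mb=` log field, the timestamp-first line layout, and the 90% threshold are all hypothetical, so adjust them to whatever your app actually logs.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical log format: "<timestamp> ... memory_mb=<number>".
MEMORY_RE = re.compile(r"^(\S+).*\bmemory_mb=(\d+(?:\.\d+)?)")


def readings_near_limit(lines: Iterable[str], limit_mb: float,
                        threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Return (timestamp, MB) pairs at or above threshold * limit_mb."""
    hits = []
    for line in lines:
        match = MEMORY_RE.search(line)
        if match and float(match.group(2)) >= threshold * limit_mb:
            hits.append((match.group(1), float(match.group(2))))
    return hits


# Made-up sample lines standing in for saved application logs.
log_lines = [
    "2024-01-01T10:00:00Z info memory_mb=310",
    "2024-01-01T10:01:00Z info request path=/export",
    "2024-01-01T10:02:00Z info memory_mb=498.5",
]
for timestamp, used in readings_near_limit(log_lines, limit_mb=512):
    print(timestamp, used)
```

Comparing the flagged timestamps with the requests or jobs logged just before them is usually a reasonable starting point for finding the cause.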
Prevention & Mitigation
If a pod’s memory use is regularly close to the memory limit, we suggest scaling your replica size to cover the upper bound of any expected memory surge. This prevents your pod from being killed unexpectedly.
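One way to pick that upper bound is to take the highest memory reading you have observed, add some headroom, and round up. The sketch below does that arithmetic in Python. The 25% headroom, the 100 MB rounding step, and the function name `suggested_limit_mb` are assumptions chosen for illustration; the size increments your plan actually allows may differ.

```python
import math
from typing import Sequence


def suggested_limit_mb(samples_mb: Sequence[float], headroom: float = 0.25,
                       step_mb: int = 100) -> int:
    """Peak usage plus headroom, rounded up to the next multiple of step_mb."""
    if not samples_mb:
        raise ValueError("need at least one memory sample")
    target = max(samples_mb) * (1 + headroom)
    return math.ceil(target / step_mb) * step_mb


# Hypothetical peak readings in MB: 455 * 1.25 = 568.75, rounded up to 600.
print(suggested_limit_mb([310, 420, 455]))  # 600
```

A larger headroom value is worth considering if your traffic is bursty or your samples cover only a short period.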
If you are still having trouble debugging the issue, you can always contact us at Support.