Monitor your available memory (memAvailable)

ProTop includes an alertable metric named memAvailable, which reports the memory still available on the server. If you face memory issues (e.g., out-of-memory (OOM) conditions), create alert definitions like those below to warn you of unexpected memory consumption.

First, add the "OSInfo" data collector name to the ptInitDC variable in your main etc/pt3agent.*.cfg file. The OSInfo collector contains the logic that gathers the memAvailable server metric.

Next, edit the corresponding etc/alert.*.cfg file and add lines similar to these:

memAvailable num < 5 "2:2" "hourly" "&1 &2 &3" alert # ~30%
memAvailable num < 4 "2:2" "hourly" "&1 &2 &3" alarm # 25%
memAvailable num < 3 "2:2" "hourly" "&1 &2 &3" page  # ~20%

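These thresholds assume a server with 16 GB of memory: you will get an alert when memAvailable drops below 5 GB, an alarm when it drops below 4 GB, and a page when it drops below 3 GB (roughly 30%, 25%, and 20% of total memory). Scale the numbers to match the memory installed on your own server.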
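To adapt the thresholds to a server with a different amount of memory, the arithmetic can be sketched in Python as follows. This is only an illustration: the function name alert_lines is hypothetical, the line layout is copied from the sample lines above, and rounding to whole gigabytes is an assumption based on that sample.

```python
def alert_lines(total_gb, levels):
    """Build memAvailable alert lines for a server with total_gb of memory.

    levels is a list of (severity, percent) pairs, e.g. ("alert", 30).
    The line layout mirrors the sample definitions; rounding each
    threshold to a whole number of GB is an assumption.
    """
    lines = []
    for severity, pct in levels:
        gb = round(total_gb * pct / 100)  # percent of total memory -> GB
        lines.append(
            f'memAvailable num < {gb} "2:2" "hourly" "&1 &2 &3" {severity} # {pct}%'
        )
    return lines


if __name__ == "__main__":
    # 16 GB server with the same three severity levels as the sample
    for line in alert_lines(16, [("alert", 30), ("alarm", 25), ("page", 20)]):
        print(line)
```

For a 16 GB server this yields thresholds of 5, 4, and 3, matching the sample definitions. Review the generated lines before pasting them into your alert configuration.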

When you get any memAvailable alert, check your dirtymem.sh reports to see which processes consume the most memory and determine whether that consumption is expected. If it is not, mitigate it in the short term by restarting the offending process (assuming it's not the database).
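If a dirtymem.sh report is not at hand, a rough cross-check on Linux is to rank processes by resident set size. The sketch below is not part of ProTop and does not reproduce what dirtymem.sh measures. It assumes a Linux /proc filesystem, and the helper names rss_kb, name_of, and top_processes are hypothetical.

```python
import os


def rss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def name_of(status_text):
    """Return the process name from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            parts = line.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else "?"
    return "?"


def top_processes(limit=10, proc_dir="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    if not os.path.isdir(proc_dir):
        return []  # not Linux, or /proc is not mounted
    rows = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # only numeric entries are process IDs
        try:
            with open(os.path.join(proc_dir, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        rows.append((rss_kb(text), int(entry), name_of(text)))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for kb, pid, name in top_processes():
        print(f"{kb / 1024:10.1f} MB  {pid:>8}  {name}")
```

Resident set size counts shared pages in every process that maps them, so treat the ranking as a hint about where to look rather than an exact accounting.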

It is common to recycle specific processes on a regular schedule (hourly or daily, for example) while your dev team (or vendor) works on resolving the memory issue in their code.