The investigation of a site crash
When I woke up this morning I noticed my site had crashed with the following error:
Error establishing a database connection.
Awesome. What a helpful error message. WordPress is basically saying:
Thanks, WordPress. I SSH’d into the server to see if I could figure out what was wrong myself. Since the error was complaining about a database connection, my first step was to check whether MySQL was still running:
$ sudo netstat -tap | grep mysql
Nothing. Ruh Roh! Time to attempt a restart…
$ sudo /etc/init.d/mysql restart
… and the site came back up. Phew! That was easy.
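The check-then-restart step above can be sketched as a small helper. This is a rough sketch, not a hardened watchdog: `has_listener` is a made-up name, and it only assumes that the netstat output contains the word LISTEN on lines for listening sockets.

```shell
#!/bin/sh
# has_listener NAME (hypothetical helper): reads netstat-style output on
# stdin and succeeds only if some LISTEN line mentions NAME.
has_listener() {
    awk -v name="$1" '
        $0 ~ /LISTEN/ && index($0, name) { found = 1 }
        END { exit !found }
    '
}
```

It could be wired up as `sudo netstat -tlp | has_listener mysql || sudo /etc/init.d/mysql restart`, which restarts MySQL only when nothing is listening under that name.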
But why did it crash in the first place?
Since MySQL was the culprit, it made sense to check the MySQL logs first. I opened those up and found nothing. On Google’s recommendation I then searched the syslog, specifically for “memory”:
$ sudo grep memory /var/log/syslog
Aug 19 10:56:12 localhost kernel: [10664646.817182] [<ffffffff811429d4>] out_of_memory+0x414/0x450
Aug 19 10:56:12 localhost kernel: [10664646.819979] Out of memory: Kill process 4803 (mysqld) score 104 or sacrifice child
Aug 19 10:56:12 localhost kernel: [10664646.831686] [<ffffffff811429d4>] out_of_memory+0x414/0x450
Aug 19 10:56:12 localhost kernel: [10664646.833365] Out of memory: Kill process 4826 (mysqld) score 104 or sacrifice child
Aha! It looks like the server ran out of memory and the kernel’s OOM killer picked MySQL as the process to kill. Okay, now on to the next question… Why?
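On a noisier log it helps to boil the kernel messages down to who got killed and how often. Here is a minimal sketch, assuming the kill lines look like the ones quoted in this post ("Out of memory: Kill process PID (name)"); `oom_victims` is a hypothetical name, and newer kernels may word the message differently.

```shell
#!/bin/sh
# oom_victims (hypothetical helper): reads kernel log lines on stdin and
# prints "<count> <process name>" for each OOM-killer victim.
# Assumes the "Out of memory: Kill process PID (name)" wording shown above.
oom_victims() {
    sed -n 's/.*Out of memory: Kill process [0-9]* (\([^)]*\)).*/\1/p' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}
```

For example, `sudo cat /var/log/syslog | oom_victims` would list each killed process name next to the number of times it was killed.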
Well… it ran out of memory, that’s why. (Duh.) One way to alleviate this is to create a swap file, which it turns out I forgot to do when I originally configured this server. Without a swap file the kernel had nowhere to page memory out to once RAM filled up, so it killed MySQL instead. I created and enabled a swap file:
$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=256k
$ sudo mkswap /swapfile
Setting up swapspace version 1, size = 262140 KiB
no label, UUID=XXXXX
$ sudo swapon /swapfile
After creating the file everything has been running great (so far).
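One caveat: `swapon` only lasts until the next reboot. The usual way to make the swap file permanent is a line in /etc/fstab. Below is a small sketch of adding that line without duplicating it; `add_swap_to_fstab` is a hypothetical helper name, and the second argument exists only so it can be tried on a scratch file first.

```shell
#!/bin/sh
# add_swap_to_fstab SWAP_PATH [FSTAB] (hypothetical helper): appends a swap
# entry for SWAP_PATH to FSTAB (default /etc/fstab) unless a line for that
# path is already present.
add_swap_to_fstab() {
    swap_path=$1
    fstab=${2:-/etc/fstab}
    # Already listed? Then there is nothing to do.
    if grep -q "^${swap_path}[[:space:]]" "$fstab" 2>/dev/null; then
        return 0
    fi
    printf '%s none swap sw 0 0\n' "$swap_path" >> "$fstab"
}
```

Writing to the real /etc/fstab needs root, so the function has to run from a root shell or a script started with sudo. It is also worth running `sudo chmod 600 /swapfile` so only root can read the swap file, and `sudo swapon -s` or `free -m` to confirm the swap is actually in use.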
Next Steps
Had I not been a narcissist and checked my own website, I probably wouldn’t have noticed it was down for hours, perhaps encroaching on days. My next step is to look into monitoring software: something that alerts me when there’s a problem, or even warns me about a potential problem before it happens. One I have found that does just that is Nagios, or its fork, Icinga.
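Until proper monitoring is in place, even a tiny script run from cron would have caught this outage. The following is a stopgap sketch, not a replacement for Nagios or Icinga: `page_looks_healthy` and `check_site` are hypothetical names, curl is assumed to be installed, and the only thing checked is the exact WordPress error string quoted at the top of this post.

```shell
#!/bin/sh
# page_looks_healthy (hypothetical helper): reads a page body on stdin and
# fails if it contains the WordPress database error message.
page_looks_healthy() {
    if grep -q 'Error establishing a database connection'; then
        return 1
    fi
    return 0
}

# check_site URL (hypothetical helper): fetches URL with curl and prints
# OK or DOWN; returns non-zero when the fetch fails or the error page shows.
check_site() {
    url=$1
    if body=$(curl -sS --max-time 10 "$url") \
        && printf '%s\n' "$body" | page_looks_healthy; then
        echo "OK $url"
    else
        echo "DOWN $url"
        return 1
    fi
}
```

Calling `check_site` with the site’s URL from a cron job, and mailing yourself whenever it prints DOWN, covers the "site is showing the database error" case. It says nothing about memory pressure building up beforehand, which is where a real monitoring system earns its keep.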