The investigation of a site crash
When I woke up this morning I noticed my site had crashed with the following error:
Error establishing a database connection.
Awesome. What a helpful error message. WordPress is basically saying:
Thanks, WordPress. I ssh’d into the server to see if I could figure out what was wrong myself. Since the error was complaining about a database connection, my first step was to check whether MySQL was running and listening:
$ sudo netstat -tap | grep mysql
Nothing. Ruh Roh! Time to attempt a restart…
$ sudo /etc/init.d/mysql restart
… and the site came back up. Phew! That was easy.
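The check-then-restart routine above can be sketched as a small helper. This is only a minimal sketch: `has_mysql_listener` is a made-up name, and it assumes `netstat -tap` prints a `LISTEN` line mentioning `mysql` when the server is up.

```shell
# Reads `netstat -tap` style output on stdin.
# Exits 0 if any LISTEN line mentions mysql, 1 otherwise.
# has_mysql_listener is a hypothetical helper name, not part of any tool.
has_mysql_listener() {
    awk '/LISTEN/ && /mysql/ { found = 1 } END { exit !found }'
}

# Possible usage (needs root, so it is left commented out here):
#   sudo netstat -tap | has_mysql_listener || sudo /etc/init.d/mysql restart
```

Taking the netstat output on stdin keeps the helper easy to try out against saved output without touching the live server.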
But why did it crash in the first place?
Since MySQL was the cause, it made sense to me to check the MySQL logs first. I opened those up and found nothing. On the recommendation of Google I then searched the syslog, specifically for the word “memory”:
$ sudo grep memory /var/log/syslog
Aug 19 10:56:12 localhost kernel: [10664646.817182] [<ffffffff811429d4>] out_of_memory+0x414/0x450
Aug 19 10:56:12 localhost kernel: [10664646.819979] Out of memory: Kill process 4803 (mysqld) score 104 or sacrifice child
Aug 19 10:56:12 localhost kernel: [10664646.831686] [<ffffffff811429d4>] out_of_memory+0x414/0x450
Aug 19 10:56:12 localhost kernel: [10664646.833365] Out of memory: Kill process 4826 (mysqld) score 104 or sacrifice child
Aha! It looks like the server ran out of memory and the kernel killed MySQL (mysqld) to free some up. Okay, now on to the next question… Why?
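For next time, the “Kill process” lines can be boiled down to just the victims. A rough sketch follows: `oom_victims` is a made-up helper name, and the pattern only matches the exact wording in the log lines shown here, which other kernel versions may phrase differently.

```shell
# Reads syslog lines on stdin and prints "PID NAME" for every
# "Out of memory: Kill process PID (NAME)" line it finds.
# oom_victims is a hypothetical helper; the pattern assumes the
# message wording seen in this post's log output.
oom_victims() {
    sed -n 's/.*Out of memory: Kill process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Possible usage (left commented out, since it needs root):
#   sudo grep memory /var/log/syslog | oom_victims | sort | uniq -c
```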
Well… it ran out of memory, that’s why. (Duh.) One way to alleviate this is to create a swap file, which it turns out I forgot to do when I originally configured this server. Without swap, the kernel had nowhere to move memory pages once RAM filled up, so it killed mysqld instead. I created and enabled a swap file:
$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
$ sudo mkswap /swapfile
Setting up swapspace version 1, size = 262140 KiB
no label, UUID=XXXXX
$ sudo swapon /swapfile
After creating the file everything has been running great (so far).
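One thing `swapon` alone does not do is survive a reboot. For that the swap file usually needs an entry in /etc/fstab. The sketch below is one way to add it without duplicating the line. Both function names are made up for illustration, and the entry format (`none swap sw 0 0`) is the common convention rather than something taken from this server.

```shell
# Print an fstab entry for the swap file whose path is given as $1.
# (Hypothetical helper; the field values follow the usual convention.)
swap_fstab_entry() {
    printf '%s none swap sw 0 0\n' "$1"
}

# Append the entry to the fstab-style file $2, unless a line for the
# swap file $1 is already there. (Hypothetical helper name.)
ensure_swap_entry() {
    swapfile=$1
    fstab=$2
    if ! grep -q "^$swapfile[[:space:]]" "$fstab"; then
        swap_fstab_entry "$swapfile" >> "$fstab"
    fi
}

# Possible usage as root (left commented out):
#   ensure_swap_entry /swapfile /etc/fstab
```

It is also worth running `sudo chmod 600 /swapfile` so other users cannot read the file, and checking that the swap is active with `swapon -s` or `free -m`.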
Had I not been a narcissist and checked my own website, I probably wouldn’t have noticed it was down for hours, perhaps even days. My next step is to look into monitoring software: something that alerts me when there’s a problem, or even warns me about a potential problem before it happens. One tool I have found that does just that is Nagios, or its fork, Icinga.
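Until proper monitoring is in place, even a tiny cron-driven script would beat finding out by accident. This is only a stopgap sketch, not a substitute for Nagios or Icinga. `has_db_error` and `check_site` are made-up names, and the URL in the usage comment is a placeholder.

```shell
# Exits 0 if the page body on stdin contains WordPress's database error.
# (Hypothetical helper name.)
has_db_error() {
    grep -q 'Error establishing a database connection'
}

# Fetch the URL in $1 and print a one-line status.
# Returns 1 if the site is unreachable or shows the database error.
# (Hypothetical helper name.)
check_site() {
    url=$1
    if ! body=$(curl -sS --max-time 10 "$url"); then
        echo "DOWN: no response from $url"
        return 1
    fi
    if printf '%s\n' "$body" | has_db_error; then
        echo "DOWN: database connection error at $url"
        return 1
    fi
    echo "OK: $url"
}

# Possible usage from cron (placeholder URL, left commented out):
#   check_site https://example.com/ || echo "send an alert here"
```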