Ever have Pingdom or some other monitoring service tell you your site is down? Then you ask your host about it, and they say the site has been online the whole time?

Your host is probably telling you the truth. If your monitoring relies on pings to decide whether the site is up, you may not be getting 100% accurate information. Answering a ping is not a priority for a web server. So if your server is busy, it is going to serve the web page before it responds to the ping.
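One way around this is to check the site the way a visitor would, by requesting a page over HTTP instead of pinging the box. Here is a rough sketch of that idea in Python. The function names, the retry count, and the "anything below 500 counts as up" rule are my own assumptions for illustration, not how Pingdom or any particular monitoring service works.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status (4xx or 5xx).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, broken response, etc.
        return None


def site_is_up(url, attempts=3, delay=2.0, fetch=http_status):
    """Report the site as down only if every attempt fails.

    Retrying a few times keeps one slow response on a busy server
    from being reported as an outage.
    """
    for attempt in range(attempts):
        status = fetch(url)
        # Assumption: any answer below 500 means the web server is alive.
        if status is not None and status < 500:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `site_is_up("https://example.com/")` returns `True` or `False`, with your own URL in place of the placeholder. Because the check asks for an actual page, a server that is too busy to answer pings but is still serving visitors will still show as up.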

If your load rises high enough, your pings will start being rejected or timing out. That doesn’t mean you are out of the woods. It may mean that your server is not tuned properly, or that you are pushing the limits of the server and need to upgrade. I would have your hosting company check how your server software is tuned. The settings can often be tweaked to handle more traffic, or to limit the traffic to what your server can handle.
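If you want a quick feel for whether load is the problem, you can compare the load average against the number of CPU cores. The sketch below does that in Python. The threshold of 1.0 per core is only a common rule of thumb that I am assuming here, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_core(load_1min=None, cores=None):
    """Return the 1-minute load average divided by the number of CPU cores."""
    if load_1min is None:
        load_1min = os.getloadavg()[0]  # Unix only
    if cores is None:
        cores = os.cpu_count() or 1
    return load_1min / cores


def is_overloaded(threshold=1.0, load_1min=None, cores=None):
    """True when load per core exceeds the (assumed) threshold."""
    return load_per_core(load_1min, cores) > threshold
```

A load of 8 on a 4-core server works out to 2.0 per core, which this sketch would flag. What counts as "too high" really depends on your workload, so treat the number as a starting point for a conversation with your host.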

Most servers have some type of built-in monitoring software that helps ensure your site is online. cPanel servers use tailwatchd and chkservd to check that all services are running. They will attempt to restart a service if they find it unresponsive. You may have seen the e-mails about clamd or exim failing. These are the services I most commonly see fail.
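To give a feel for what "unresponsive" means, here is a small Python sketch that checks whether a service accepts a TCP connection. This is not chkservd's actual code, just an illustration of the general idea. The service list is hypothetical, and port 25 for exim is my assumption of a typical mail setup.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


def unresponsive_services(services, host="127.0.0.1"):
    """Return the names of services whose port does not accept connections.

    services maps a name to a TCP port, e.g. {"exim": 25} (assumed port).
    """
    return [name for name, port in services.items()
            if not port_is_open(host, port)]
```

Running `unresponsive_services({"exim": 25})` on the server returns an empty list when the mail service is accepting connections. This sketch only reports the problem. It does not try to restart anything.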

On cPanel there is a plugin called Munin that you can have installed via WHM. This is a decent statistics program to watch over your server. It has a bunch of graphs that make things fairly easy to read and understand.

It is a good idea to monitor the health of your server, as downtime can be critical. However, if you are not using the right type of software, you may be chasing a lot of false positives rather than working on things that need to be handled.

What kind of monitoring software do you prefer for your server?