LogicMonitor: My Virtual SysAdmin

by on August 3, 2010 · 4 comments

I’d recently ordered a new round of servers and was positively dreading having to set up Nagios & Munin on them. This is where the fact that I’m a “born & raised” developer really shines through. The configuration of Nagios is simply beyond me. No matter how much documentation I read, I just can’t get all the pieces working together. Try to bolt Munin on top of this and I simply walk away in frustration. There had to be a better way…
[Read more…]

Tailoring Your Munin Installation

by on December 21, 2009 · 0 comments

After following Dan’s tutorial on installing Munin on your servers, you already get the benefits of Munin’s default plugins. You have graphs showing your CPU, RAM, I/O, as well as MySQL, Exim, and quite a few other stats. But most of the time you run some additional software which you also want to monitor.
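To give a feel for what monitoring additional software involves, here is a minimal sketch of a custom Munin plugin. It relies on the basic plugin contract: called with the argument "config", a plugin prints the graph description; called without arguments, it prints one "<field>.value <number>" line per field. The spool directory, the "myapp" category and the "queued" field name are hypothetical placeholders, not something from Dan’s tutorial.

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch: graphs the number of files in a spool directory."""
import os
import sys

# Hypothetical directory to watch; replace with whatever your software uses.
SPOOL_DIR = "/var/spool/myapp"


def config_lines():
    # Printed when munin-node runs the plugin with the "config" argument.
    return [
        "graph_title Files waiting in spool",
        "graph_vlabel files",
        "graph_category myapp",
        "queued.label queued files",
    ]


def value_lines(directory):
    # Printed on a normal run: one "<field>.value <number>" line per field.
    try:
        count = len(os.listdir(directory))
    except OSError:
        return ["queued.value U"]  # "U" tells Munin the value is unknown
    return [f"queued.value {count}"]


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        lines = config_lines()
    else:
        lines = value_lines(SPOOL_DIR)
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

To try it, the script would typically be made executable and linked into the directory your munin-node reads its plugins from; the exact path depends on your distribution.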
[Read more…]

Splunking for Spikes

by on May 10, 2009 · 2 comments

We had a short load spike on our application servers a couple weeks back. A load of 28 on a 4-core machine is more than uncomfortable – it’s downright dangerous. Luckily, it only lasted for 2 minutes and, just as suddenly as it came, vanished again without a trace. Well, that’s not quite true, because this is exactly why we keep those gigabytes of Apache logs lying around!

I knew where the information was, but how to crawl through that >1GB log and find it? A few years back I had done some similar research through JBoss application logs using Splunk. It was a fairly pleasant experience and I wondered if they were still around.

Turns out they are – Splunk is now offering version 3.4.9 for download and it was trivial to set up locally and start crunching data.
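For readers without Splunk at hand, the underlying question (which minute was the busiest?) can be roughed out in plain Python. This is not the Splunk approach used in this post, just a sketch that assumes the log is in Apache’s common or combined format, where each line carries a timestamp like [10/May/2009:14:03:27 +0200]. The function names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Captures the timestamp up to the minute,
# e.g. [10/May/2009:14:03:27 +0200] -> "10/May/2009:14:03"
MINUTE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")


def requests_per_minute(lines):
    """Count log lines per minute; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busiest_minutes(lines, top=5):
    """Return the `top` (minute, hits) pairs, busiest first."""
    return requests_per_minute(lines).most_common(top)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Iterating over the file object streams it line by line,
    # so a >1GB log never has to fit in memory.
    with open(sys.argv[1], errors="replace") as log:
        for minute, hits in busiest_minutes(log):
            print(minute, hits)
```

Run with the path to an access log as the only argument, it prints the five busiest minutes. That narrows down where to look, but it offers none of the interactive drilling-down that makes a tool like Splunk pleasant.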
[Read more…]

Monitoring tools essentials: Munin vs. Nagios

by on April 16, 2009 · 8 comments

mongrel memory monitoring with nagios-pnp

When you’re running any business-critical application, you need to know what’s going on with it. Is it up? Does it put sustained load on your servers? Does it have enough disk space left? How fast is the data on the disk growing?

To know all that, you need a tool which a) monitors and tracks all important performance data like CPU load, memory, disk space, slow queries per second, etc. and b) alerts you if any of the monitored values crosses a defined threshold.
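The alerting half, part b), boils down to comparing a measured value against thresholds. Here is a minimal sketch in the style of a Nagios check, using the conventional plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) and appending performance data after the "|" so that a graphing add-on can pick the number up. The function names and the default thresholds are assumptions chosen for illustration.

```python
import os

# Exit codes that Nagios plugins conventionally use.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(value, warn, crit):
    """Map a measured value onto a status by comparing it to the thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_status(load1, warn, crit):
    """Return (exit_code, message); text after '|' is performance data."""
    status = classify(load1, warn, crit)
    message = (
        f"LOAD {LABELS[status]} - load1={load1:.2f} "
        f"| load1={load1:.2f};{warn};{crit}"
    )
    return status, message


def check_load(warn=4.0, crit=8.0):
    """Check the 1-minute load average (Unix only); thresholds are examples."""
    load1, _, _ = os.getloadavg()
    return format_status(load1, warn, crit)
```

A real check would print the message and exit with the returned status code, which is how the monitoring server learns the result. Sensible thresholds depend on the machine, for example on how many cores it has.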

Both Munin and Nagios offer these features. Munin started as a pure monitoring tool for “remembering” data. But it soon learned about alerting, too. Nagios is a very powerful alerting tool, but there are plenty of extensions to make it graph as well. The one I use (and discuss here) is nagios-pnp.
[Read more…]

Securely tunneling Munin traffic

by on March 10, 2009 · 1 comment

This is a guest post by Thomas Eisenbarth. Thomas studied computer science at the University of Augsburg, currently works at BINconsult GmbH, Berlin and co-founded makandra GmbH in Augsburg. He and his teams develop and operate web applications.

As Dan already discussed in Getting a Grip on your Operations with Munin and 10 Seconds A Day With Munin, Munin is quite a nifty tool for monitoring the inner life of your machines.

We’ve also been using Munin for some time now to monitor our staging and production stuff. There is only one little problem. Most of you guys out there have machines at multiple locations and nameservers which may not be collocated. Most setups that I’ve seen so far do their testing and development at the office while the production sites hopefully run out of a datacenter with reasonable bandwidth, security, etc.
[Read more…]

Give Your Servers Some TLC: 10 Seconds A Day With Munin

by on March 8, 2009 · 0 comments

For some damn reason, your website is slow again. It’s a drag having to pore over logs looking for problems, but if you don’t have any monitoring tools installed you don’t have much choice. Fortunately, there are plenty of tools out there. They are easy to install, free to use and often make the difference between your boss telling you about a problem and you having already fixed it. Traffic is the lifeblood of your site’s business and if you spend just a few seconds a day having a look at your server’s stats, you can avert critical incidents before they happen. After all, a tweak here today just might be the difference in your site surviving a profitable traffic spike tomorrow.

A couple months ago, I told you about how simple it was to get Munin up and running. Today, I want to show you some of the successes I’ve had in getting our operations under control and how Munin was key in doing so.
[Read more…]

Getting a Grip on your Operations with Munin

by on January 18, 2009 · 5 comments

Have you ever taken a midnight drive down a dirt road without any headlights on? While it’s certainly a thrilling (and stupid) thing to do, I wouldn’t recommend doing the same thing with your data center. Do you have any idea if the load your servers experienced this morning was unusually high? Could you tell me how many MySQL “slow queries” were executed on your database yesterday? Have customers and co-workers praised you for your website’s performance?
[Read more…]