When you’re running any business-critical application, you need to know what’s going on with it. Is it up? Does it put excessive load on your servers? Does it have enough disk space left, and how fast is the data on the disk growing?
To know all that, you need a tool that a) monitors and tracks all important performance data such as CPU load, memory, disk space and slow queries per second, and b) alerts you when any of the monitored values crosses a defined threshold.
Both Munin and Nagios offer these features. Munin started as a pure monitoring tool for “remembering” data. But it soon learned about alerting, too. Nagios is a very powerful alerting tool, but there are plenty of extensions to make it graph as well. The one I use (and discuss here) is nagios-pnp.
Munin Node and Munin Server
Munin runs a munin-node service on every monitored box, which collects the performance data by running its plugins. The Munin server connects to each munin-node via TCP port 4949 to retrieve the data, stores it using RRDtool, and raises an alert if anything goes out of bounds. Thomas has described how to securely tunnel Munin traffic over SSH. That’s definitely better than an unsecured remote connection.
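To make the conversation on port 4949 a bit more concrete, here is a minimal Python sketch of a client that asks a munin-node for one plugin’s values. It assumes the usual line-based protocol (a banner line on connect, then a `fetch <plugin>` command answered with `field.value N` lines and a single `.` as terminator). The function names `parse_fetch_response` and `fetch_plugin` are made up for this example, and plugins that prefix values with a timestamp are not handled.

```python
import socket


def parse_fetch_response(lines):
    """Turn 'field.value 1.23' lines into a dict, stopping at the '.' terminator."""
    values = {}
    for line in lines:
        line = line.strip()
        if line == ".":
            break
        if not line or line.startswith("#"):
            continue  # skip blank lines and banner/comment lines
        key, _, raw = line.partition(" ")
        field, _, attr = key.rpartition(".")
        if attr != "value":
            continue
        raw = raw.strip()
        # Munin reports 'U' when a plugin could not determine a value.
        values[field] = None if raw == "U" else float(raw)
    return values


def fetch_plugin(host, plugin, port=4949, timeout=5.0):
    """Ask a munin-node (assumed reachable at host:port) for one plugin's values."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="ascii", newline="\n")
        stream.readline()  # discard the banner line sent on connect
        stream.write(f"fetch {plugin}\n")
        stream.flush()
        lines = []
        for line in stream:
            lines.append(line)
            if line.strip() == ".":
                break
        stream.write("quit\n")
        stream.flush()
    return parse_fetch_response(lines)
```

Calling `fetch_plugin("localhost", "load")` against a node that allows your address should give you something like a dict with a single `load` entry. Over an SSH tunnel you would point it at the local end of the tunnel instead.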
Graphing Performance Data With Nagios-PNP or Nagiosgraph
Nagios does not necessarily need a service running on the monitored box. In our setup the Nagios server connects to the monitored box via SSH and executes the check commands there. Each check command returns the service status (OK, WARNING, CRITICAL or UNKNOWN) along with the performance data at check time. Nagios-pnp and Nagiosgraph use RRDtool on the Nagios server to store and graph the retrieved performance data. One very nice feature of nagios-pnp that I miss in Munin is the ability to zoom into any graph for a more detailed look at a certain event. Very cool!
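For illustration, this is roughly what such a check command looks like. A Nagios plugin signals its status through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and prints one line of text, with the performance data after a `|` in the form `label=value[unit];warn;crit;min;max`. The sketch below is a made-up disk check, not one of the stock plugins; the thresholds and the names `evaluate`, `format_output` and `main` are assumptions for this example.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn, crit):
    """Map a usage percentage to a Nagios status code."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def format_output(status, used_pct, warn, crit):
    """Build the single output line: human-readable text, '|', then perfdata."""
    text = f"DISK {STATUS_NAMES[status]} - {used_pct:.1f}% used"
    perfdata = f"used={used_pct:.1f}%;{warn};{crit};0;100"
    return f"{text} | {perfdata}"


def main(path="/", warn=80, crit=90):
    """Check one filesystem and return the exit code the plugin should use."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    status = evaluate(used_pct, warn, crit)
    print(format_output(status, used_pct, warn, crit))
    return status
```

In a real plugin script you would finish with `sys.exit(main())`, so that Nagios sees the status as the exit code. The part after the `|` is what nagios-pnp or Nagiosgraph pick up for graphing.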
Munin Plugins and Nagios Plugins
Munin provides more sophisticated monitoring plugins at MuninExchange: it measures every imaginable NFS parameter, for example, where Nagios can merely tell you that the share is there and has X GB free. Nagios, on the other hand, gives you much more flexibility in accessing the monitored hosts and in modeling your network structure. Writing new plugins is easy for both tools.
I have now switched from Munin to Nagios (with nagios-pnp, but you could use nagiosgraph, too) to enjoy the more fine-grained configuration. What I’m missing, though, is the level of detail provided by the better Munin plugins. Time to make Nagios plugins out of those Munin plugins 😉
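One way to start on that is a thin wrapper that takes the `field.value N` lines a Munin plugin prints and turns them into a Nagios status line with performance data. The following is only a sketch under my own assumptions: `munin_to_nagios` and the `thresholds` dict (field name mapped to a warn/crit pair, upper limits only) are invented for this example, and a real wrapper would also have to run the Munin plugin with the environment it expects.

```python
OK, WARNING, CRITICAL = 0, 1, 2
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def munin_to_nagios(service, munin_output, thresholds=None):
    """Convert Munin plugin output into (exit_code, nagios_output_line).

    thresholds is a hypothetical mapping: field name -> (warn, crit).
    Fields without an entry are graphed but never alert.
    """
    thresholds = thresholds or {}
    worst = OK
    perfdata = []
    for raw_line in munin_output.splitlines():
        key, _, raw = raw_line.strip().partition(" ")
        field, _, attr = key.rpartition(".")
        raw = raw.strip()
        if attr != "value" or raw in ("", "U"):
            continue  # skip non-value lines and unknown values
        value = float(raw)
        warn, crit = thresholds.get(field, (None, None))
        if crit is not None and value >= crit:
            worst = max(worst, CRITICAL)
        elif warn is not None and value >= warn:
            worst = max(worst, WARNING)
        warn_text = "" if warn is None else warn
        crit_text = "" if crit is None else crit
        perfdata.append(f"{field}={value:g};{warn_text};{crit_text}")
    summary = f"{service} {NAMES[worst]} | {' '.join(perfdata)}"
    return worst, summary
```

The wrapper keeps every field the Munin plugin reports, so the graphs stay as detailed as before, while only the fields you list in `thresholds` can raise an alert.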
Are you using Munin or Nagios with Nagios-PNP or Nagiosgraph for monitoring and alerting? What’s your take on Munin vs Nagios? Let us know in the comments!