I use Munin to monitor a few machines and bubble up alerts when issues show up. It’s pretty good, easy to set up, and has a large number of contributed plugins to monitor pretty much everything. If you are still out of luck, it’s easy enough to write your own.
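As a rough illustration of how little a custom plugin needs, here is a minimal sketch in Python. The plugin, its field name `load`, and its graph labels are made up for this example; the only contract assumed is the usual Munin one, where a plugin prints its graph description when called with `config`, and `<field>.value <number>` lines otherwise.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plugin reporting the 1-minute load average."""
import os
import sys


def config_lines():
    # Printed when the plugin is called with the "config" argument.
    return [
        "graph_title Load average (example)",
        "graph_vlabel load",
        "graph_category system",
        "load.label 1 min load",
    ]


def fetch_lines(load=None):
    # Printed on a normal run: one "<field>.value <number>" line per field.
    if load is None:
        load = os.getloadavg()[0]  # Unix-only; an assumption of this sketch
    return [f"load.value {load:.2f}"]


def main(argv):
    lines = config_lines() if argv[1:2] == ["config"] else fetch_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Dropped into the node's plugin directory and made executable, something along these lines should be picked up like any other plugin, though the exact location depends on the distribution.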
To ease the task of viewing the data, each machine runs munin-node, but only a couple of masters do the data collection with munin-update. This works reasonably well, except that machines monitored by more than one master need to do extra work to provide the same data to each of them.
Asynchronous monitoring avoids this duplicated work. Setting it up is relatively easy, and the benefits show quickly, in the form of a reduced collection time and fewer gaps in the data.
Surprisingly, it also substantially reduced the load on low-power machines. But beware of the --fork parameter, discussed in the last section.
Direct remote monitoring
The usual setup for direct monitoring (generally configured in additional files in /etc/munin/munin-conf.d) is that a master connects to a slave, either directly:
[example.net;munin-slave]
    address munin-slave.example.net
Or via an SSH tunnel, so as not to expose munin-node to the outside world:
[example.net;munin-slave]
    address ssh://email@example.com -W localhost:4949
For this second use case to work, a user, say munin, needs to be set up on the slave, allowing the master to log in with a private key. This is done in the user’s authorized_keys:
command="/bin/false",no-agent-forwarding,no-pty,no-user-rc,no-X11-forwarding,permitopen="localhost:4949" ssh-ed25519 AAAA... firstname.lastname@example.org
This works nicely (once keys have been distributed and accepted), and the munin user on the slave doesn’t even need a valid shell.
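Either way, what the master ends up talking to is munin-node's plain-text protocol on port 4949. When a new slave refuses to show up, it can help to check that the node answers at all before blaming the tunnel. The sketch below is one way to do that; `list_plugins` and `query_node` are names made up for this example, and it assumes the usual exchange where the node greets with a banner line and replies to `list` with a single line of plugin names.

```python
import socket


def list_plugins(reader, writer):
    """Read the node's banner, ask for the plugin list, then say goodbye."""
    banner = reader.readline().strip()
    writer.write("list\n")
    writer.flush()
    plugins = reader.readline().split()
    writer.write("quit\n")
    writer.flush()
    return banner, plugins


def query_node(host="localhost", port=4949, timeout=5.0):
    """Connect to a munin-node and return its banner and plugin names."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8") as reader, \
                sock.makefile("w", encoding="utf-8") as writer:
            return list_plugins(reader, writer)
```

Run on the slave itself, `query_node()` exercises the same localhost:4949 endpoint that the tunnel forwards to.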
Asynchronous remote monitoring
To switch to asynchronous monitoring, the first step is to install and/or enable the munin-asyncd service. The package and command to do so vary depending on distribution and OS, so this is left as an exercise to the reader.
Once done, though, changes are minimal. The slave needs to force munin-async --spoolfetch to be run when the master connects with its key:
command="/usr/share/munin/munin-async --spoolfetch",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,no-user-rc ssh-ed25519 AAAA... email@example.com
And the master no longer needs tunnelling.
[example.net;munin-slave]
    address ssh://firstname.lastname@example.org:22
One mildly annoying point is that the munin user on the slave can no longer have an invalid shell, such as /bin/false. Otherwise, the SSH connection terminates immediately. Using /bin/sh instead is sufficient to fix the issue. Provided the user’s authorized_keys remains simple and limits access, and no password access is allowed, this should nonetheless remain sufficiently secure.
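Since the security of this setup now rests on the authorized_keys entry, a quick sanity check does no harm. The following is a minimal sketch, not a full parser: `split_options` and `missing_restrictions` are hypothetical helpers, the entry is assumed to start with its options, escaped quotes are not handled, and the set of "required" options simply mirrors the entry shown above. It also knows nothing about OpenSSH's catch-all `restrict` option.

```python
# Options this sketch expects on the munin key; mirrors the example entry.
REQUIRED = {
    "no-port-forwarding",
    "no-X11-forwarding",
    "no-agent-forwarding",
    "no-pty",
}


def split_options(entry):
    """Return the comma-separated options that precede the key type."""
    options, current, in_quotes = [], [], False
    for char in entry:
        if char == '"':
            in_quotes = not in_quotes
            current.append(char)
        elif char == "," and not in_quotes:
            options.append("".join(current))
            current = []
        elif char == " " and not in_quotes:
            break  # first unquoted space ends the options field
        else:
            current.append(char)
    options.append("".join(current))
    return options


def missing_restrictions(entry):
    """Return the expected restrictions that the entry does not set."""
    options = split_options(entry)
    missing = {name for name in REQUIRED if name not in options}
    if not any(option.startswith("command=") for option in options):
        missing.add("command=")
    return missing
```

An empty result for the munin entry means the forced command and the forwarding restrictions are all in place.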
This is enough to reduce the processing time of munin-update on the master, as well as the load on the slaves, which now only need to run the plugins once per period, rather than once per period per master.
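To put rough numbers on that last point, here is a back-of-the-envelope sketch. The figures (40 plugins, 2 masters) are invented for illustration, and the period is assumed to be Munin's default of 5 minutes.

```python
def plugin_runs_per_day(plugins, masters, period_minutes=5, asynchronous=False):
    """Count plugin executions on one slave over 24 hours."""
    periods = 24 * 60 // period_minutes
    # Direct: every master triggers every plugin. Async: one run per period.
    runs_per_period = plugins if asynchronous else plugins * masters
    return periods * runs_per_period


direct = plugin_runs_per_day(plugins=40, masters=2)
spooled = plugin_runs_per_day(plugins=40, masters=2, asynchronous=True)
print(direct, spooled)  # 23040 11520
```

With two masters the slave does half the work, and adding a third master would no longer cost it anything extra in plugin runs.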
Forking updates may not be all the rage
In one confusing case (on OpenBSD), the load on the machine greatly increased when switching to munin-asyncd. Investigation revealed that this was due to all plugins running at the same time, in parallel, whenever the daemon woke up.
This is due to the startup script using the --fork option, which forks a new process for each plugin, rather than querying them in sequence. Removing this option (in /etc/rc.d/munin_asyncd) fixed the issue nicely.
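The difference between the two behaviours is easy to picture with a small sketch. This is not how munin-asyncd is implemented, just an illustration in Python with made-up function names: one function runs each command to completion before starting the next, the other starts every command first and only then collects the output, so all children compete for the CPU at once.

```python
import subprocess
import sys


def run_in_sequence(commands):
    """Run one child at a time: at most one plugin is active."""
    outputs = []
    for command in commands:
        done = subprocess.run(command, capture_output=True, text=True)
        outputs.append(done.stdout)
    return outputs


def run_forked(commands):
    """Start every child up front: all plugins are active together."""
    children = [
        subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
        for command in commands
    ]
    return [child.communicate()[0] for child in children]


if __name__ == "__main__":
    # Stand-ins for plugins: tiny Python one-liners printing a value.
    fake_plugins = [
        [sys.executable, "-c", f"print('field{i}.value {i}')"]
        for i in range(3)
    ]
    assert run_in_sequence(fake_plugins) == run_forked(fake_plugins)
```

Both approaches collect the same data; the forked one merely finishes sooner on a machine with cores to spare, and hurts on one without.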