Monday, December 7, 2009

EON monitoring

Situation: setting up an EON-NAS. The install is very stripped down, and as of writing does not offer any monitoring. Therefore we want to set up an automated process which will run on an external server as a cron job, check the status of the NAS, and email us if it is dead or degraded.

Want it to work out of the box, so not using NAPP-IT and wget. Instead let's use SSH to connect to EON NAS and run raw monitoring commands.

Broadly:

* create a locked-down account with limited access that can run monitoring commands
* set up ssh keys to access that account from monitoring server without password
* write a script to do the monitoring and email on state change
* run that script as cron job on monitoring server
** expose our NAS through firewall, set up a persistent hostname using a DHCP-startup script (which should run on NAS-box, right?)



Process:

on EON as root, set up monitor account with strong password

mkdir /monitor
useradd -d /monitor monitor
passwd monitor
chown monitor /monitor

get the ssh functionality set up:

* make a new account. on monitoring machine as root:
useradd fresh
passwd fresh [ENTER twice for empty password]
su - fresh
mkdir .ssh [you can skip this if .ssh dir already exists]
ssh-keygen -t rsa -f .ssh/eon_key
* set up auto-ssh
ssh monitor@10.0.1.250 mkdir -p .ssh
cat .ssh/eon_key.pub | ssh monitor@10.0.1.250 'cat >> .ssh/authorized_keys'


we should now be able to ssh to EON without password. test it:

ssh -i .ssh/eon_key monitor@10.0.1.250 ls /bin

works. next step: a command on localhost that can monitor zfs. problem: admin account doesn't have permissions to run zpool or zfs. how to set up an account that can check zpool status without having permission to write/delete pool or fs??

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status
pool: mediapool
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
mediapool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c0t0d0 ONLINE 0 0 0
c0t1d0 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c2t1d0 ONLINE 0 0 0
c2t2d0 ONLINE 0 0 0
c2t3d0 ONLINE 0 0 0
c2t4d0 ONLINE 0 0 0
c2t5d0 ONLINE 0 0 0

errors: No known data errors

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool destroy mediapool
cannot unshare '/mediapool/media': no permission: unshare(1M) failed
could not destroy 'mediapool': could not unmount datasets
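
The destroy attempt fails on permissions, but the key itself can still run any command as monitor. One possible way to tighten this is a forced command in authorized_keys, so the key can only ever run the health check. A minimal sketch, assuming the sshd on EON honours the OpenSSH-style command= and no-* key options (not verified on EON here); restrict_key is a made-up helper name:

```shell
# Hypothetical helper: print an authorized_keys line that pins a public key
# to a single command and switches off forwarding and pty allocation.
restrict_key() {
    pubkey_file=$1
    forced_cmd=$2
    printf 'command="%s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
        "$forced_cmd" "$(cat "$pubkey_file")"
}

# Example, run on the monitoring machine as fresh (this OVERWRITES the file):
# restrict_key .ssh/eon_key.pub '/usr/sbin/zpool status -x' |
#     ssh monitor@10.0.1.250 'cat > .ssh/authorized_keys'
```

With such a line in place, whatever command the client asks for is ignored and the forced one runs instead, so a second key (or a small wrapper script on the NAS) would be needed to also fetch zpool status -v. This limits the key, not the account: anyone who logs in as monitor with the password is unaffected.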

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -x | grep "all pools are healthy" || echo "NOT HEALTHY"
ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -x | grep "all pools are healthysfdf" || echo "NOT HEALTHY"
NOT HEALTHY

echo "TEST MAIL" | mail -s "nas problem" notify@gmail.com

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -x | grep "all pools are healthy" || ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -v | mail -s "nas problem" notify@gmail.com
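
The one-liner above mails on every unhealthy check. The plan was to email on state change, so here is a rough sketch of the same check wrapped in a script that remembers the last state. The state file location, the function names and the HEALTHY/DEGRADED/UNREACHABLE labels are my own placeholders; host, key and address are the ones used above.

```shell
#!/bin/sh
# Sketch only: values can be overridden from the environment.
NAS_HOST="${NAS_HOST:-monitor@10.0.1.250}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/eon_key}"
STATE_FILE="${STATE_FILE:-$HOME/.eon_state}"   # assumed location
NOTIFY="${NOTIFY:-notify@gmail.com}"

# Run a zpool subcommand on the NAS. BatchMode makes ssh fail instead of
# prompting, ConnectTimeout keeps a dead link from hanging the job.
nas_zpool() {
    ssh -i "$SSH_KEY" -o BatchMode=yes -o ConnectTimeout=20 \
        "$NAS_HOST" /usr/sbin/zpool "$@"
}

# Map the output of "zpool status -x" to a one-word state.
classify() {
    case "$1" in
        *"all pools are healthy"*) echo HEALTHY ;;
        "") echo UNREACHABLE ;;   # no output: ssh or the NAS is down
        *) echo DEGRADED ;;
    esac
}

# Compare with the previous state and mail only when it changed.
check_nas() {
    summary=$(nas_zpool status -x 2>/dev/null)
    new=$(classify "$summary")
    old=$(cat "$STATE_FILE" 2>/dev/null)
    if [ "$new" != "$old" ]; then
        echo "$new" > "$STATE_FILE"
        { echo "state: ${old:-unknown} -> $new"; nas_zpool status -v 2>&1; } |
            mail -s "nas state: $new" "$NOTIFY"
    fi
}

# Only contact the NAS when called with --run.
if [ "${1:-}" = "--run" ]; then
    check_nas
fi
```

Saved as, say, check_eon.sh and called with --run, the first run mails once ("unknown -> HEALTHY"), which doubles as a test that mail delivery works. After that it stays quiet until the state flips, including a mail when the pool recovers.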

OKAY, we have a command which will contact EON NAS, check the zfs status, and notify us if anything is wrong. I don't have another local server, so I'm going to monitor from an external server. My local net access is via cable modem, no persistent IP address, so I have to use a dynamic DNS solution.

* freedns.afraid.org, set up a subdomain like "eonstorage.uk.to"
* figure out how to update dyndns when IP address changes.. my router runs dd-wrt which has support for freedns.afraid.org so this is easy
* forward the appropriate port... for security pick a random unused port, eg 62426, and forward it to port 22 of local EON server
* test from 3rd party host: ssh -p 62426 monitor@eonstorage.uk.to
* set up cron job on external server
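
For the last step, a sketch of adding the cron entry from the shell without opening an editor. The script path and the 15 minute interval are placeholders, install_cron is a made-up name, and the */15 step syntax assumes a Vixie-style cron on the external server.

```shell
# Assumed script location and interval. Single quotes keep $HOME literal here;
# cron expands it when the job runs.
CRON_LINE='*/15 * * * * $HOME/bin/check_eon.sh --run'

# Append the entry to the current user's crontab unless it is already there.
install_cron() {
    current=$(crontab -l 2>/dev/null)
    case "$current" in
        *"$CRON_LINE"*) return 0 ;;   # already installed, nothing to do
    esac
    { [ -n "$current" ] && echo "$current"; echo "$CRON_LINE"; } | crontab -
}
```

Running install_cron once as the monitoring user adds the line, and running it again changes nothing. The command in CRON_LINE is whatever wraps the zpool check, with the forwarded port and the dynamic hostname in place of the local IP.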
