Saturday, December 12, 2009

EON NAS software setup

zpool create -f mediapool raidz2 c0t0d0 c0t1d0 c2t2d0 c2t3d0 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0
zfs create -o casesensitivity=mixed -o nbmand=on -o sharesmb=guestok=true -o sharenfs=ro mediapool/media
zfs set sharesmb=name=media mediapool/media
groupadd -g 600 media
useradd -u 502 -g 600 media
passwd media
chown -R media:media /mediapool/media
chmod 775 /mediapool/media
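
A device name typed twice is easy to miss in a list this long, so a quick check before running zpool create can help. This is only a sketch: find_duplicate_devs is a made-up helper name, and it looks at the names alone, not at the disks themselves.

```shell
#!/bin/sh
# Print any device name that appears more than once in the argument list.
# Prints nothing and returns 0 when every name is unique; returns 1 otherwise.
find_duplicate_devs() {
    dups=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf '%s\n' "$dups"
        return 1
    fi
    return 0
}

# Example: check the list before handing it to zpool create.
# find_duplicate_devs c0t0d0 c0t1d0 c2t2d0 c2t3d0 && echo "device list is unique"
```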


http://sites.google.com/site/eonstorage/faq -> How do I start NFS server services

cd /var/svc/manifest/network
svccfg -v import rpc/bind.xml
svccfg -v import nfs/status.xml
svccfg -v import nfs/nlockmgr.xml
svccfg -v import nfs/server.xml
svcadm enable rpc/bind
svcadm enable nfs/status
svcadm enable nfs/nlockmgr
svcadm enable nfs/server
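
The eight commands above follow one pattern (import each manifest, then enable each service), so they can be generated from a single list. A sketch only: nfs_service_commands is a made-up name, and it prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Print the import and enable commands for the NFS-related services,
# in the same order as the manual steps. Nothing is executed here.
nfs_service_commands() {
    services="rpc/bind nfs/status nfs/nlockmgr nfs/server"
    for svc in $services; do
        echo "svccfg -v import $svc.xml"
    done
    for svc in $services; do
        echo "svcadm enable $svc"
    done
}

# Review the output, then run it from the manifest directory, e.g.:
# cd /var/svc/manifest/network && nfs_service_commands | sh
```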

zfs set aclinherit=passthrough mediapool/media
zfs set aclmode=passthrough mediapool/media

chmod A=owner@:rwxpdDaARWcCos:fd-----:allow,group@:rwxpdDaARWcCos:fd-----:allow,everyone@:wpdDAWc:fd-----:deny,everyone@:rxaRCos:fd-----:allow /mediapool/media

ls -ldV /mediapool/media
drwxrwxr-x+ 2 media media 8 Dec 6 09:48 /mediapool/media
owner@:rwxpdDaARWcCos:fd-----:allow
group@:rwxpdDaARWcCos:fd-----:allow
everyone@:-w-pdD-A-Wc---:fd-----:deny
everyone@:r-x---a-R--Cos:fd-----:allow


updimg.sh /mnt/eon0/boot/x86.eon
reboot

zpool status
no pools available

edit /mnt/eon0/.exec and uncomment the "zpool import" line
updimg.sh /mnt/eon0/boot/x86.eon
reboot

zpool status: lists zpool
windows -> start -> run -> \\10.0.1.50\
works!
try to copy some files: fails
up one dir, right click on media, "map network drive", "connect as other user", media/mediapass
write allowed!
mac -> Finder -> apple-K -> nfs://10.0.1.50/mediapool/media
mounts! file read allowed! file write not allowed!

get "PCI CF to SSD SATA" device working under OpenSolaris

"PCI to 4x Compact Flash Card (CD to SSD SATA adapter)"
box says "Creative I/O" but retailed as Syba SY-PCI48001

shows up as "Silicon Image Sil 0680 Ultra-133 Medley ATA Raid Controller"

because it lists itself as device type RAID, the ata driver does not attach. online research suggests that devices based on 0680 which declare themselves as RAID can be made to work.

Boot OS snv_125 DVD with all controllers and disks attached. Double-click "Install OpenSolaris" icon on desktop and install to a spare HD. When install completes, reboot.

Log in, start terminal, su -

update_drv -v -a -i '"pci1095,3680"' ata
exit status = 0
reboot -- -reconfigure

log in, start terminal, su -
prtconf: device still has no driver attached
no sign of it in /var/adm/messages
/etc/driver_aliases lists it



reading online, it might not work unless i flash the card's BIOS to a non-raid version. i can find a non-raid bios for the 0680a on the silicon image website, but their windows bios update tool could see my 3124 card and not the 0680. http://club.myce.com/f61/new-silicon-image-sil-0680-firmware-drivers-192683/ includes a case where this chip could only be updated with the DOS tool, and that update was successful. thing is, it depends which flash chip is on the board... some might come with a write-once chip that cannot be flashed! the markings on the chip on my board are not readable, and i can't find out by research, so i need to try the DOS tool.

new sub-sub-project: make a bootable DOS system with the BIOS and tool included. http://genetikayos.livejournal.com/43998.html has instructions.

download the freeDOS floppy img from http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/unofficial/balder/ plus winimage and the usb format tool, as detailed in the above link.
put those on a USB key, boot from "hiren boot cd" mini winxp, install winimage, and use winimage to extract the balder img file to a new directory.
try to run the HP usb format tool, but it needs LZ32.dll which isn't in minixp... nor is it on my real winxp system??
instead use the dell diagnostics tool referenced in the comments at the above link.
after creating the dell diagnostic disk, unplug and reinsert it so it is mounted by windows.
add UPDFLASH.EXE and the bios .bin file from the silicon image site.
rename gui.exe and all .bat files to disable the dell diagnostic tools and ensure we get dumped to a DOS prompt on boot.

bios won't boot this flash drive formatted this way, unless i select it in the "hdd priority list".

boots to DOS prompt, ".\UPDFLASH.EXE b3400.bin"
... finds my controller, "Loading BIOS..." "Verifying..." "BIOS is loaded."
success!!

reboot and check BIOS POST data about 680 card: it now shows up as Class 0106, "Mass Storage Controller" yay!

boot from EON disc: format lists it!

"install.sh" [select the option for c1d0 which was the CF card]
success!
"reboot"
log in with root/eonsolaris
/usr/bin/setup [configure hostname and IP addresses for each network interface]
updimg.sh /mnt/eon0/boot/x86.eon

plug in all drives, make sure BIOS is set to boot CF card 1st priority, and continue with final software setup

NAS build new attempt

all cards including Addonics SATA and 0680 IDE->CF installed.

old CD-ROM drive and scratch disk drive attached. boot from EON install disk. immediately install to scratch HD with no config:

log in with root/eonsolaris
"format" then ^C to list disks, note disk id of scratch HDD
"/usr/bin/install.sh" and select c0d0 per format output

reboot, make sure BIOS is set to boot from scratch HDD

update_drv -v -a -i '"pci1095,3680"' ata
exit status = 0

okay, that looks solid. it is listed in driver_aliases, but not format. i also see it listed in /etc/path_to_inst however it seems path_to_inst should not be backed up or manually manipulated?

on a normal system i would now do a "reconfigure boot", but as EON state only persists after running updimg, this seems pointless. try updimg, which will now preserve driver_aliases, and see if the driver is attached after boot.

updimg.sh /mnt/eon0/boot/x86.eon
reboot

rebooting, i get the grub menu with its list of boot options, but booting from the first option now hangs... looks like updimg.sh broke something. reboot and select OEM from the boot menu. without spending any time on config, do a test updimg and pay close attention to the output for any error messages. actually, i am worried that updimg.sh will build on the x86.eon which already failed to boot, so first cp x86.eon.oem x86.eon and remove any .0 backup. now "updimg.sh /mnt/eon0/boot/x86.eon" completes with OK.

reboot

reboot works.

looking at /mnt/eon0/boot/.backup confirms that driver_aliases *is* being backed up (i thought this would have been lost during the x86.eon rollback? guess not, because it lives in on-disk storage outside of x86.eon, as grub does). looking at driver_aliases, it does not include the 680.

update_drv -v -a -i '"pci1095,3680"' ata
exit status = 0

updimg.sh completes with OK
reboot

this time it gets past the logo screen and reboots OK... guess the previous failure was a one-off. /etc/driver_aliases includes the line for the 680 and so does /etc/path_to_inst. however it is not listed by "format", and prtconf still shows it as "driver not attached".


NO GO: before asking andre for help, let's work on getting the CF card visible to standard OS snv_125

Thursday, December 10, 2009

Final NAS hardware setup, EON install

Have 2 PCI cards: sil3124 addonics 4xSATA card, sil0680 syba 4xCF->SATA card. intend to use CF card as boot drive, to keep all drive bays free for mass storage (and USB boot does not work with Solaris formatting on my motherboard's BIOS)

Problem: with only sil3124 plugged in PCI2, BIOS pops up 3124 drive detection, and those drives show up in BIOS drive listing. Plugging sil0680 into PCI1, BIOS does an 0680 drive detection, which takes about 30 seconds (!) but does not detect 3124 drives. nevertheless, booting off of EON CD, once booted the 3124 drives show up to 'format' so this does not appear to be a blocking issue.

Problem: OpenSolaris supports sil0680, but this driver is not on the EON install CD, therefore my CF card does not show up to the "format" or "install.sh" commands. ouch. So I have to either add the driver to the running system, or build my own install CD. The former would obviously be the quicker option, if it is possible.

relevant links:
http://eonstorage.blogspot.com/2009/02/adding-your-own-drivers-to-eon.html
http://eonstorage.blogspot.com/2009/02/another-way-to-add-drivers-to-eon.html
http://eonstorage.blogspot.com/2009/05/eon-zfs-nas-meets-ips-packages.html

let's try "another way" method from 2nd link above... should allow us to add driver to running system, which will then detect the CF card, then we can install with CF card support. failing that, we can install to a legacy IDE drive, patch up the drivers there, reboot with sil0680 support, and install to CF.

from http://genunix.org/ find the link to the OS release which matches the EON release, and download it: http://www.genunix.org/distributions/indiana/osol-1002-125-x86.iso
opening the iso image is no problem, but all the drivers seem to be bound up in a .zlib file

...

after some searching, it seems there is no separate 0680 driver? don't find it at http://pkg.opensolaris.org/ or in files on OS full install CD, even after booting it.

on booted OS snv_125 liveCD system, look for any sign of my device:

% prtconf
...
pci8068,244e, instance #0
pci1095,3680 (driver not attached)
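
On a machine with many devices the prtconf listing is long, so it can help to filter it down to the nodes that have no driver. A sketch, assuming node names contain no spaces (true of the output above); unattached_devices is a made-up helper name.

```shell
#!/bin/sh
# Read prtconf-style output on stdin and print the node name (first field)
# of every line marked "(driver not attached)".
unattached_devices() {
    awk '/\(driver not attached\)/ { print $1 }'
}

# Example on a live system:
# prtconf | unattached_devices
```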

there it is! numbers match what i saw in BIOS POST. it is device class RAID, but so is the addonics card... generic ata driver should be enough. actually, the addonics card is using the si3124 driver, which supports RAID features... but generic ATA should still be enough?

"prtconf -D" shows sd disks hanging from devices using the "ahci", "ata" and "si3124" drivers... "ata" sounds like a good possibility. how to attach the driver to our device?

become root with "su -" passwd "opensolaris"

/usr/X11R6/bin/scanpci also shows it:

pci bus 0x0006 cardnum 0x00 function 0x00: vendor 0x1095 device 0x0680
Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller

from http://www.timelordz.com/wiki/index.php/OpenSolaris_Indiana_2008.11_Acer_Aspire_One_Install#Attaching_the_Driver we should attach the driver by removing it and adding it with correct args... however the driver is already in use for the onboard ata. how to handle this? one option would be to change BIOS settings for onboard to run in AHCI mode. another option is to find the syntax for attaching one driver to multiple devices.

% add_drv -i "pci1095,680" ata
("ata") already in use as a driver or alias

so i do need to remove it first... and therefore need a list of every device it applies to?

here is a supposed method to attach device to installed driver:

update_drv -a -i 'pci1095,680' ata

executes with no output... "format" shows the same list of disks. nothing in dmesg.
oops, command used wrong pci spec:

update_drv -a -i 'pci1095,3680' ata

still no output, no new disk under "format", prtconf shows 'driver not attached'

'man update_drv' says it will take effect after 'reconfig boot or hotplug of the device'

update_drv -v -a -i '"pci1095,3680"' ata

no dice. however /etc/driver_aliases does list the additions... i guess reboot is required.
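
Since a mistyped PCI id is easy to make here, checking /etc/driver_aliases with a helper is more reliable than reading it by eye. A sketch, assuming each line has the form: driver name, whitespace, alias in optional double quotes. Both function names are made up.

```shell
#!/bin/sh
# Print every alias that FILE binds to DRIVER, one per line, quotes removed.
# Usage: driver_aliases_for FILE DRIVER
driver_aliases_for() {
    awk -v d="$2" '$1 == d { gsub(/"/, "", $2); print $2 }' "$1"
}

# Return 0 if FILE binds exactly ALIAS to DRIVER, 1 otherwise.
# Usage: has_driver_alias FILE DRIVER ALIAS
has_driver_alias() {
    driver_aliases_for "$1" "$2" | grep -qxF "$3"
}

# Example:
# has_driver_alias /etc/driver_aliases ata pci1095,3680 && echo "alias present"
```

driver_aliases_for also answers the earlier question of which devices a driver such as ata is already bound to.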

lots of similar troubleshooting:
http://forums.sun.com/thread.jspa?threadID=5088663
http://bugs.opensolaris.org/view_bug.do?bug_id=6595150
http://hub.opensolaris.org/bin/view/Community+Group+advocacy/intro-solaris-drivers
same chip:
http://mail.opensolaris.org/pipermail/driver-discuss/2006-June/000340.html

proof this chip can work:
http://defect.opensolaris.org/bz/show_bug.cgi?id=9349

if reboot is required, then i need to first install onto one of my mass storage disks, then update aliases, reboot/reconfigure, and only then can i install onto CF card. so, reboot with EON CD.

Monday, December 7, 2009

EON monitoring

Situation: setting up an EON-NAS. The install is very stripped down, and as of writing does not offer any monitoring. Therefore we want to set up an automated process which will run on an external server as a cron job, check the status of the NAS, and email us if it is dead or degraded.

Want it to work out of the box, so not using NAPP-IT and wget. Instead let's use SSH to connect to EON NAS and run raw monitoring commands.

Broadly:

* create a locked-down account with limited access that can run monitoring commands
* set up ssh keys to access that account from monitoring server without password
* write a script to do the monitoring and email on state change
* run that script as cron job on monitoring server
** expose our NAS through firewall, set up a persistent hostname using a DHCP-startup script (which should run on NAS-box, right?)



Process:

on EON as root, set up monitor account with strong password

mkdir /monitor
useradd -d /monitor monitor
passwd monitor
chown monitor /monitor

get the ssh functionality set up:

* make a new account. on monitoring machine as root:
useradd fresh
passwd fresh [ENTER twice for empty password]
su - fresh
mkdir .ssh [you can skip this if .ssh dir already exists]
ssh-keygen -t rsa -f .ssh/eon_key
* set up auto-ssh
ssh monitor@10.0.1.250 mkdir -p .ssh
cat .ssh/eon_key.pub | ssh monitor@10.0.1.250 'cat >> .ssh/authorized_keys'


we should now be able to ssh to EON without password. test it:

ssh -i .ssh/eon_key monitor@10.0.1.250 ls /bin

works. next step: a command on localhost that can monitor zfs. problem: admin account doesn't have permissions to run zpool or zfs. how to set up an account that can check zpool status without having permission to write/delete pool or fs??

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status
pool: mediapool
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
mediapool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c0t0d0 ONLINE 0 0 0
c0t1d0 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c2t1d0 ONLINE 0 0 0
c2t2d0 ONLINE 0 0 0
c2t3d0 ONLINE 0 0 0
c2t4d0 ONLINE 0 0 0
c2t5d0 ONLINE 0 0 0

errors: No known data errors

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool destroy mediapool
cannot unshare '/mediapool/media': no permission: unshare(1M) failed
could not destroy 'mediapool': could not unmount datasets

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -x | grep "all pools are healthy" || echo "NOT HEALTHY"
ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -x | grep "all pools are healthysfdf" || echo "NOT HEALTHY"
NOT HEALTHY

echo "TEST MAIL" | mail -s "nas problem" notify@gmail.com

ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -x | grep "all pools are healthy" || ssh -i .ssh/eon_key monitor@10.0.1.250 /usr/sbin/zpool status -v | mail -s "nas problem" notify@gmail.com
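
The one-liner mails on every run while the pool is unhealthy. The plan was to email on state change, which needs the last state remembered between runs. A minimal sketch of that idea: STATE_FILE, the --run switch and the function names are made up here, and the host, key and address are the ones used above.

```shell
#!/bin/sh
# Sketch of a cron-driven pool check that mails only when the state changes.
# NAS, KEY, MAILTO and STATE_FILE are placeholders to adjust.
NAS=${NAS:-monitor@10.0.1.250}
KEY=${KEY:-$HOME/.ssh/eon_key}
MAILTO=${MAILTO:-notify@gmail.com}
STATE_FILE=${STATE_FILE:-$HOME/.nas_state}

# Classify "zpool status -x" output read from stdin.
classify_status() {
    if grep -q "all pools are healthy"; then
        echo healthy
    else
        echo unhealthy
    fi
}

# Fetch the status, compare with the last recorded state, mail on change.
# An ssh failure (NAS down) produces no "healthy" line, so it counts as unhealthy.
check_nas() {
    status=$(ssh -i "$KEY" "$NAS" /usr/sbin/zpool status -x 2>&1)
    new=$(printf '%s\n' "$status" | classify_status)
    old=$(cat "$STATE_FILE" 2>/dev/null)
    if [ "$new" != "$old" ]; then
        printf '%s\n' "$status" | mail -s "nas is now $new" "$MAILTO"
        echo "$new" > "$STATE_FILE"
    fi
}

# Run the check only when asked, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    check_nas
fi
```

The first run always sends one mail, because no state has been recorded yet. Saved as, say, check_nas.sh (a hypothetical name), it could be called from cron with the --run argument at whatever interval suits.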

OKAY, we have a command which will contact EON NAS, check the zfs status, and notify us if anything is wrong. I don't have another local server, so I'm going to monitor from an external server. My local net access is via cable modem, no persistent IP address, so i have to use a dynamic DNS solution.

* freedns.afraid.org, set up a subdomain like "eonstorage.uk.to"
* figure out how to update dyndns when IP address changes.. my router runs dd-wrt which has support for freedns.afraid.org so this is easy
* forward the appropriate port... for security pick a random unused port, eg 62426, and forward it to port 22 of local EON server
* test from 3rd party host: ssh -p 62426 monitor@eonstorage.uk.to
* set up cron job on external server

Thursday, December 3, 2009

Troubleshooting OpenSolaris USB Boot

My BIOS will hang if a bootable OS USB drive is present during POST - before mem test if present at boot, or at whatever moment it is inserted.

GParted output for unknown OS config:

/dev/sda
unallocated 2MB
/dev/sda1 992MB unknownFS BOOT
diskLabelType: msdos
Heads: 255
Sectors/Track: 63
Cylinders: 126

fdisk output for same:
Disk ID: 0x000000000
/dev/sda1 BOOT Id=bf System=Solaris
Partition 1 has different physical/logical beginnings (non-Linux?):
phys=(1023, 254, 63) logical=(0, 65, 2)
Partition 1 has different physical/logical endings:
phys=(1023, 254, 63) logical=(126, 182, 56)
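
The "logical" values fdisk prints can be turned back into sector numbers to cross-check the GParted figures. A sketch of the standard CHS-to-LBA arithmetic, assuming 512-byte sectors and the 255-head, 63-sector geometry reported above; chs_to_lba is a made-up helper name.

```shell
#!/bin/sh
# Convert a cylinder/head/sector address to a zero-based LBA sector number.
# Usage: chs_to_lba CYL HEAD SECTOR HEADS_PER_CYL SECTORS_PER_TRACK
chs_to_lba() {
    echo $(( ($1 * $4 + $2) * $5 + ($3 - 1) ))
}

start=$(chs_to_lba 0 65 2 255 63)      # 4096
end=$(chs_to_lba 126 182 56 255 63)    # 2035711

# Sizes in MiB, assuming 512-byte sectors.
echo "start offset: $(( start * 512 / 1048576 )) MiB"
echo "partition size: $(( (end - start + 1) * 512 / 1048576 )) MiB"
```

This prints a start offset of 2 MiB and a partition size of 992 MiB, which agrees with the 2MB unallocated gap and the 992MB partition in the GParted listing.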



Now write USB key with FreeNAS Embedded using m0n0wall procedure (as root):

gunzip -c /home/geoff/Desktop/FreeNAS-amd64-embedded-0.7.4919.img| dd of=/dev/sda bs=16k
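
Booting from the key is one way to verify the write. Another is to read the key back and compare checksums with the image. A sketch, assuming head -c is available (it is on Linux and the BSDs); verify_image is a made-up helper name.

```shell
#!/bin/sh
# Compare a gzip-compressed image with the start of a target device or file.
# Returns 0 when the first <uncompressed image size> bytes of TARGET match.
# Usage: verify_image IMAGE.gz TARGET
verify_image() {
    img=$1
    target=$2
    size=$(gunzip -c "$img" | wc -c | tr -d ' ')
    want=$(gunzip -c "$img" | cksum)
    got=$(head -c "$size" "$target" | cksum)
    [ "$want" = "$got" ]
}

# Example (as root, with the image path and device from the dd step):
# verify_image /path/to/image.img.gz /dev/sda && echo "write verified"
```

Only the first part of the device is read because the key is larger than the image.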


boot from the USB key to verify that it was correct: YES, FreeBSD starts to load


GParted output:

/dev/sda
unallocated 988MB
diskLabelType: unrecognized
Heads: 255
Sectors/Track: 63
Cylinders: 126


fdisk output:

This disk has both DOS and BSD magic
Give the 'b' command to go to BSD mode.

disk id: 0x90909090
/dev/sda4 BOOT id=a5 System=FreeBSD
Partition 4 has different physical/logical endings:
phys=(1023, 254, 63) logical=(3, 28, 41)

[give 'b' command]

Partition /dev/sda4 has invalid starting sector 0.