Friday, November 20, 2009

FreeNAS, ZFS

Need an upgraded storage solution. Old ReadyNAS is still alive, but it sometimes requires a reboot-by-unplug, and I can't handle the emotional trauma. Drobo is working, but previously the first two drives had episodes of spontaneously dropping and reappearing, causing disk thrashing as data was redistributed each time. WinXP is limited to a 2TB max partition size, so all my data is ghettoized and I spend a lot of time copying things between partitions. Not going to invest in their overpriced 8-bay unit.

Next up: FreeNAS with ZFS support. ZFS is self-checking, self-healing, and its copy-on-write architecture means that data won't be lost if power is lost mid-write (every write either succeeds or fails in its entirety). It is pretty much the last word in filesystems data-wise, although it is not yet flexible enough to meet the demands of the home user (it cannot freely scale up/down or make full use of mismatched drives without jumping through hoops).
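
The self-checking part can also be kicked off by hand; a rough sketch of what that looks like from the command line (pool name "tank" is just a placeholder):

zpool scrub tank        # walk every block in the pool and verify checksums against the redundant copies
zpool status -v tank    # shows scrub progress and lists any files with unrecoverable errors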

Any hardware should be compatible with OpenSolaris (ZFS's native home) and FreeBSD, with a bonus for explicit FreeNAS support.
OpenSolaris
FreeBSD
FreeNAS

Research:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
http://harryd71.blogspot.com/2008/10/tuning-freenas-zfs.html
http://wiki.freebsd.org/ZFSTuningGuide
http://wiki.freebsd.org/ZFS
http://techpad.co.uk/content.php?sid=60 (is it true that healing happens only in *mirrored* zfs?)
http://forums.smallnetbuilder.com/showthread.php?t=1953
http://pegolon.wordpress.com/2009/01/13/build-your-own-drobo-replacement-based-on-zfs/
http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z
http://www.mouldy.org/what-i-learned-from-setting-up-zfs-on-my-fileserver
http://www.learnfreenas.com/blog/2009/04/12/ramblings-on-freenas-zfs-expandability-and-raid-5/
http://nowhereman999.wordpress.com/2009/04/19/zfs-freenas-a-poor-and-very-geeky-man%E2%80%99s-drobo-setup/
http://rskjetlein.blogspot.com/2009/08/expanding-zfs-pool.html
http://wiki.mattrude.com/index.php?title=Freenas/ZFS_and_FreeNAS_expansion
http://ask.metafilter.com/125509/FreeNAS-Hardware-Specs

Parts:

Need: 8 bays minimum for hard drives
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=2010090007+1054808291+1309321151&QksAutoSuggestion=&ShowDeactivatedMark=False&Configurator=&Subcategory=7&description=&Ntk=&CFG=&SpeTabStoreType=&srchInDesc=

possibly convert 5.25" bays into 3.5" bays
Thermaltake A2309 iCage 3-in-3 with fan $20
Silverstone 4-in-3 passive $30
Cooler Master 4-in-3
Chenbro 5-in-3 hotswap
http://www.enhance-tech.com/products/multidrive/New_StorPack.htm

4 new 2TB drives to get it started
http://www.newegg.com/Product/Product.aspx?Item=N82E16822145276

motherboard: don't need a lot of power; the lowest-end 64-bit system available would be fine. 2GB RAM, as many SATA ports as possible. Need 1 or 2 old PCI slots to handle the SATA card below.
http://www.newegg.com/Product/Product.aspx?Item=N82E16813130240R

SATA controller card
SUPERMICRO AOC-SAT2-MV8 64-bit PCI-X 133MHz SATA controller card; compatible with OpenSolaris according to http://ask.metafilter.com/125509/FreeNAS-Hardware-Specs and https://opensolaris.org/jive/thread.jspa?messageID=435458, and also with FreeNAS






figure out where to run the OS and where to keep the zlog (and research complaints of lost data when the zlog device is lost)


DATA

Current:
what             raw     used
Drobo            8TB     5TB
ReadyNAS         4TB     3TB
PC               2TB     2TB
Spare            2TB     0TB
USB              1TB     0TB
total            17TB    10TB

Planned:
what             raw     used
FreeNAS Primary  16TB    8/12TB
PC               2TB     2TB
Spare            6TB     0TB
USB              1TB     0TB

ZFS important info:

ZFS's top-level unit is the pool. Pools are built from vdevs; a vdev is one or more drives, partitions, or files.
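
A rough command-line sketch of that vocabulary (device names are placeholders):

zpool create tank mirror ad4 ad6     # new pool "tank", one vdev = a 2-disk mirror
zpool add tank mirror ad8 ad10       # grow the pool by adding a second mirror vdev
zpool status tank                    # shows the pool -> vdev -> disk hierarchy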

You can grow a pool by adding a new vdev or by increasing the size of an existing vdev. The size of a redundant (mirrored or raidz'ed) vdev can be grown by swapping out one disk at a time, giving ZFS a chance to recalculate the parity on each new drive (known as "resilvering") and re-establish redundancy before the next drive is swapped out. Unless the vdev is double- or triple-parity (known as raidz2 and raidz3 (pending)), your data is at risk during the resilver process should one of the other drives die (TODO: could the removed drive be swapped back in should that happen?)
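
The one-disk-at-a-time upgrade would look roughly like this (hypothetical device names), repeated for every member of the vdev:

zpool replace tank ad4 ad12   # swap smaller ad4 for bigger ad12; the resilver starts automatically
zpool status tank             # wait until the resilver reports completed before touching the next disk
(on older ZFS versions the extra capacity may only show up after all disks are replaced and the pool is exported/imported)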

ZFS redundancy against hardware failures is accomplished at the vdev level, by making each vdev redundant via mirroring or varying degrees of parity. ZFS will warn you if you attempt to mix different types of vdevs in a pool, because it rarely makes sense to span data across different levels of redundancy. Because a pool stripes data across the vdevs it comprises, if one vdev fails the entire pool's data is lost; a pool made of vdevs of varying types is therefore only as reliable as its least reliable vdev. Adding a nonredundant vdev, e.g. a single drive, to a pool makes the entire pool subject to data loss should that drive die.
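
zpool actually enforces this; adding a bare disk to a mirrored pool gets refused unless forced (sketch, placeholder names):

zpool add tank ad12      # refused with a mismatched-replication-level complaint
zpool add -f tank ad12   # forces it -- and now the whole pool dies if ad12 dies
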
If you have 8 drives of size P, there are various strategies for organizing the filesystem (the matching zpool create lines are sketched after the list):
4 vdevs, each a 2-drive mirror, all 4 vdevs in a single pool. This gives 4*P of space. The filesystem can survive the failure of any one drive, and of up to 4 drives if one drive of each mirrored pair dies. However, if both drives of the same pair die, the entire pool (4*P of data) could be lost. Odds of that are 1/7. Space efficiency is 50%.

If it is acceptable for the storage to be divided into chunks, 4 vdevs, each a 2-drive mirror and each its own pool, result in 4 P-sized pools for a total of 4*P of available space. The maximum damage from a 2-drive failure is the loss of the data in the one pool whose mirror had both drives die; data in the other 3 pools stays safe and protected. Note that it is the administrator's job to make sure the data in any one pool never exceeds P, and moving data between pools is slow. Also note that performance is slightly lower, since reads are striped across only 2 disks instead of all 8; if the bottleneck is elsewhere (network), this is irrelevant.

2 vdevs, each of 4 drives in raidz, with the pool made of both vdevs. Can survive the loss of any one drive. If 2 drives in the same vdev die, all pool data is lost; odds are 3/7. Space efficiency is 6/8, reliability is 1.43. Performance is lower than mirrored mode because parity must be calculated (XXX if your processor is fast enough, this might not be a problem?)

1 pool made of one vdev of 8 drives in raidz. Can survive the loss of only one drive; if any second drive is lost, all data is lost. Space efficiency is 7/8, reliability is 1.0. Read speeds could be 8x, but writes require parity calculation.

1 pool made of one vdev of 8 drives in raidz2. Can survive the loss of any two drives; loss of a third drive means all data is lost. Space efficiency is 6/8, reliability is 2.0.
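
For reference, the zpool create lines for each layout would look roughly like this (ad0-ad7 are placeholder device names; raidz2 needs a new enough ZFS version):

zpool create tank mirror ad0 ad1 mirror ad2 ad3 mirror ad4 ad5 mirror ad6 ad7   # 4 mirror vdevs, one pool
zpool create tank1 mirror ad0 ad1                                               # 4 separate single-mirror pools
zpool create tank2 mirror ad2 ad3
zpool create tank3 mirror ad4 ad5
zpool create tank4 mirror ad6 ad7
zpool create tank raidz ad0 ad1 ad2 ad3 raidz ad4 ad5 ad6 ad7                   # 2 raidz vdevs, one pool
zpool create tank raidz ad0 ad1 ad2 ad3 ad4 ad5 ad6 ad7                         # single 8-drive raidz
zpool create tank raidz2 ad0 ad1 ad2 ad3 ad4 ad5 ad6 ad7                        # single 8-drive raidz2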



ZFS TESTS:

single drive pool, pull plug while writing, examine state

steps: via webgui, add disk, format as zpool device, make a vdev of the single disk, make a pool of that vdev, share via CIFS, copy files 1-8 and yank power during file 2; repeat with files 2-8, etc.

results: never loses files which have been completely written. However, sometimes a new file (one which was in the process of being written at the time of power loss) ends up listed in ZFS with the correct file size but a different checksum. This seems inconsistent with the copy-on-write design, and means that the user has to checksum the most recent files after a crash to determine whether they are perfect copies or not.
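
One way to automate that post-crash check (sketch; assumes checksums were recorded before copying, md5(1) exists on both ends, and the paths are made up):

cd /mnt/source && md5 file1 file2 file3 > /tmp/before.md5        # before copying to the ZFS share
cd /mnt/tank/share && md5 file1 file2 file3 > /tmp/after.md5     # after the crash, on the share
diff /tmp/before.md5 /tmp/after.md5                              # any difference = not a perfect copy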

pool with single disk, separate zlog device, remove zlog device during reboot

steps: zpool create ZLogTest ad8 log da1
   -> cannot use '/dev/log': must be a GEOM provider

results: zpool in the FreeBSD underlying FreeNAS doesn't seem to support log devices!!! The command formats shown in zpool's own help output back this up. According to http://forums.freebsd.org/archive/index.php/t-4641.html it is supported in ZFSv13 in FreeBSD 7-STABLE or 8-CURRENT; FreeNAS is using 7-RELEASE. Slashdot says "ZFS13 would break 7.2 ABI, so wait for 8"
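
For whenever FreeNAS moves to a FreeBSD with ZFSv13+, the intended test would presumably look like this (placeholder devices; removing a log device needs an even newer pool version, so that step may still fail):

zpool create ZLogTest ad8 log da1   # pool with a separate intent-log device
zpool status ZLogTest               # the log should show up as its own vdev
zpool remove ZLogTest da1           # detach the log device again (newer versions only)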

pool with separate zlog device, remove zlog device while running

pool with separate zlog device, pull zlog device during write

multi-vdev pool, pull plug while writing, examine state
TODO: the webgui doesn't seem to support using partitions to build a pool; need more disks to test this

multi-drive vdev, change drive order and power on, without export/import

multi-drive vdev, change drive order and power on, with export/import
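
The export/import half of that test would be something like this (pool name is a placeholder):

zpool export tank    # cleanly detach the pool before shuffling cables
(power down, reorder the drives, power back up)
zpool import         # scan all devices and list importable pools
zpool import tank    # import by name; ZFS finds members by their on-disk labels, not device paths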
