Saturday, March 5, 2011

EON NAS ZFS troubleshooting log

Sometimes it will not boot: it hangs at "verifying DMI pool". This is a sign that the boot device is not found or not working. Check the BIOS; sometimes the TRANSCEND (CF card) loses its place in the boot priority list. This may require reseating the CF card.


Noted Mar 5 2011: one drive (c0t1d0?) was offline. After replugging the cables it came back online. If this keeps happening, though, the drive might be dodgy.

Aug 2 2011: drive/pool access was hanging, even after a reboot. Opened the case and wiggled the SATA cables to make sure they were tight. The pool was back online after bootup; c3t4d0 needed resilvering.
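Both entries above come down to deciding whether a drive or its cabling is going bad. One way to keep an eye on that is to watch the per-device error counters that Solaris-derived systems such as EON report via "iostat -En". This is only a sketch under assumptions: the function name flag_dodgy_drives is hypothetical, and it assumes each device's summary line looks like "c0t1d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0". Check that against your own output before trusting it.

```shell
#!/bin/sh
# Sketch: flag drives whose hard or transport error counters are non-zero
# in Solaris "iostat -En" output read from stdin.
# ASSUMPTION: each device block starts with a summary line such as
#   c0t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
# flag_dodgy_drives is a hypothetical helper name, not an EON tool.
flag_dodgy_drives() {
    awk '
    # Only the per-device summary lines contain "Transport Errors:".
    /Transport Errors:/ {
        dev = $1
        hard = 0; transport = 0
        # Walk the fields and pick out the number after each label.
        for (i = 1; i <= NF; i++) {
            if ($i == "Hard" && $(i + 1) == "Errors:") hard = $(i + 2)
            if ($i == "Transport" && $(i + 1) == "Errors:") transport = $(i + 2)
        }
        # Print only devices with something to worry about.
        if (hard + 0 > 0 || transport + 0 > 0)
            printf "%s hard=%d transport=%d\n", dev, hard, transport
    }'
}
```

Usage: iostat -En | flag_dodgy_drives. The value comes from running it now and then and comparing the numbers: a transport-error count that keeps climbing on one device often points at the cable or port rather than the disk itself, which would fit the cable fixes above.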


Dec 10 2011: the volume dropped (2 drives lost?) while I was copying data off of it.

Boot log:

Mem test
Detecting IDE drives ...
Detecting IDE drives ...



Serial ATA AHCI BIOS etc
Please wait...
Controller Bus#00, Device #1F (??)
Port-00: Hitachi
Port-01: ST32etc...
Port-02: Hitachi
Port-03: ST32
Port-04: ST32
Port-05: No device detected
AHCI BIOS installed


GIGABYTE Technology Corp. PCIE-to-SATAII/IDE RAID Controller
HD00: Hitachi
HD01: ST32

Sil 0600 ATA/133 Controller BIOS
Drive number: B TRANSCEND



Boot completes. zpool status shows:

 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
config:
        mediapool       DEGRADED
          raidz2-0      DEGRADED
            c0t0d0      ONLINE
            c0t1d0      ONLINE
            081420      FAULTED  was /dev/dsk/c2t2d0s0
            203842      FAULTED  was /dev/dsk/c2t3d0s0
            c2t0d0      ONLINE
            c2t1d0      ONLINE
            c2t2d0      ONLINE
            c2t3d0      ONLINE
            c2t4d0      ONLINE

errors: 1 data errors, use -v for list
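To pick the problem devices out of a long status listing like the one above, a small filter can help. This is a sketch, assuming the layout shown here: a "config:" line, then one row per pool, vdev and device whose first two fields are the name and state, ending at the "errors:" line. The function name list_unhealthy is hypothetical.

```shell
#!/bin/sh
# Sketch: list pool members that are not ONLINE, from "zpool status" text
# on stdin. ASSUMPTION: rows between "config:" and "errors:" start with
# NAME and STATE; a FAULTED device may carry a trailing
# "was /dev/dsk/..." note naming its old path, which is printed too.
# list_unhealthy is a hypothetical helper name.
list_unhealthy() {
    awk '
    /^[ \t]*config:/ { in_config = 1; next }
    /^[ \t]*errors:/ { in_config = 0 }
    # Skip blank lines, the column header, and healthy rows.
    in_config && NF >= 2 && $1 != "NAME" && $2 != "ONLINE" {
        line = $1 " " $2
        # Append the previous device path if zpool reported one.
        for (i = 3; i < NF; i++)
            if ($i == "was") line = line " (was " $(i + 1) ")"
        print line
    }'
}
```

Usage: zpool status mediapool | list_unhealthy. On the listing above it would print the two DEGRADED container rows (mediapool and raidz2-0) followed by the two FAULTED devices with their old c2t2d0s0 and c2t3d0s0 paths.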



zpool status -v:

/mediapool/media/etc/etc/Dragonzakura - ep09 (704x396) [RAW].avi

The good news is that the pool seems to be (barely!) safe, and the one damaged file is clearly identified. (Maybe I was deleting offloaded files when the disks dropped offline??)

Shut down and opened the case. All the built-in ports are clearly working, judging by both the port numbers and the drive types in the boot log. Not surprisingly, the missing drives are on the Addonics third-party PCI card (which has two Hitachis attached). In the past I fixed this by reseating the card, but that shouldn't really be necessary.

Poking around in the BIOS, I see no way to list PCI devices. The hard drive boot order includes the TRANSCEND CF card device (on a PCI expansion card) but lists only the 7 SATA drives above; there is no sign of the two missing drives on the Addonics expansion card.

Reseated the PCI card and booted: all drives came back online and resilvered, with no known data errors. (Presumably that means the one video file was checked against the restored parity and found to be correct?)
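On the question in parentheses: rather than presuming, a scrub (zpool scrub mediapool) reads every block in the pool and verifies it against its checksum, repairing from redundancy where it can, so a clean scrub afterwards is stronger evidence that the video file is intact. The sketch below checks the result from zpool status text. It assumes the pool state appears on a "state:" line and that a clean pool reports "No known data errors"; the function name pool_is_healthy is hypothetical.

```shell
#!/bin/sh
# Sketch: decide from "zpool status <pool>" text on stdin whether the pool
# is fully back. Returns 0 only if the state line says ONLINE and zpool
# reports "No known data errors"; returns 1 otherwise.
# pool_is_healthy is a hypothetical helper name.
pool_is_healthy() {
    awk '
    $1 == "state:" { state = $2 }
    /errors: No known data errors/ { clean = 1 }
    END { exit !(state == "ONLINE" && clean) }'
}
```

Usage, after the scrub has finished: zpool status mediapool | pool_is_healthy && echo "mediapool looks clean". Checking while the scrub is still running only tells you about the blocks read so far.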

Resolved for now. Longer term: replace the card, or better yet find a drop-in replacement motherboard with more SATA ports.


Problem: the Windows mount of the EON system has permission to write new files, but certain existing folders can't be written to, and certain folders can't be moved. The Windows file properties dialog shows "Read-only" checked but greyed out. I can uncheck it, but the next time I open the dialog the box is checked again. I had the same problem on the old system, which had complex ACLs, and solved it by re-applying the complex ACL from my notes from the original install. Now, even with the new, simpler ACLs, I am still seeing it.

Workaround: re-apply the original ACLs:

chmod -R A+owner@:full_set:fd:allow,everyone@:read_set/execute:fd:allow /mediapool/media
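Since the same command keeps coming back, it may be worth keeping it in a script rather than in old notes. A minimal sketch, assuming Solaris/illumos chmod (GNU chmod on Linux has no A+ ACL syntax); the function name reapply_media_acl and the DRY_RUN switch are hypothetical, added so the command can be checked before it touches the share.

```shell
#!/bin/sh
# Sketch: re-apply the owner@/everyone@ ACL entries recursively to a share.
# ASSUMPTION: Solaris/illumos chmod with NFSv4 ACL support (as on EON).
# reapply_media_acl and DRY_RUN are hypothetical names for illustration.
MEDIA_ACL='owner@:full_set:fd:allow,everyone@:read_set/execute:fd:allow'

reapply_media_acl() {
    target=${1:-/mediapool/media}   # default path from the notes above
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Dry run: show the exact command without changing anything.
        echo chmod -R "A+$MEDIA_ACL" "$target"
    else
        chmod -R "A+$MEDIA_ACL" "$target"
    fi
}
```

Try DRY_RUN=1 reapply_media_acl first, then run it for real and inspect the result with ls -V /mediapool/media. One caveat worth verifying on your EON build: A+ adds entries at the front of the existing ACL, so running it repeatedly may stack duplicate entries, whereas chmod A= replaces the whole ACL.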