When I created the ZFS pool in my storage pod a few weeks ago, I didn't know about zpool's ashift=12 option, which changes the default 512-byte sector alignment to 4096 bytes to match newer 4K-sector drives.
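For anyone checking their own drives before making the same mistake: on Linux, lsblk can report the physical and logical sector sizes (this assumes a reasonably recent util-linux). A 4096 in the PHY-SEC column means the drive wants ashift=12.

# lsblk -o NAME,PHY-SEC,LOG-SEC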
And I really should have picked a raidz parity level suited to the maximum number of drives I'll eventually have in the array. I can grow the pool by adding drives, but once a raidz level is set it cannot be changed. The recommendations below are conservative, not mandatory.
Max # drives   Setting   Equivalent   Array capacity
3-5            raidz     RAID5        3.6T - 14.4T
6-10           raidz2    RAID6        14.4T - 28.8T
11-15          raidz3    RAID7        28.8T - 43.2T

I have 2.3T of data on my ZFS array, which is still comfortably below the 3.6T ext4-formatted capacity of a single 4T drive. I rsync'd everything from the array (2,230,424 files) to that drive. Even with gigabit Ethernet that took a week -- I *really* need 10GE. When everything was safely backed up, I blew away the old array and created a new one:
# zfs umount pod
# zpool destroy -f pod
# zpool create -f -o ashift=12 pod raidz3 \
    /dev/disk/by-id/scsi-SATA_ST4000NC000-1CD_Z3001HA2 \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W30063RJ \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300JAJ0 \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300NLT6 \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300J9LA \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300JW66 \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300H3PX \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300H59J \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300F5NR \
    /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_W300KKZ3

(The backslashes are not actually typed; they're shown for clarity only. The zpool create is really one long command line.)
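To double-check that the new pool really got the 4K alignment, I believe zdb can show the ashift the pool was created with (this is the ZFS-on-Linux zdb; the value reported should be 12):

# zdb -C pod | grep ashift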
# zfs create -p -o sharenfs=on pod
# zfs mount pod

This whole process takes only as long as it takes to type; there's no mkfs filesystem-creation step with ZFS. But it took another week to rsync everything back. At one point the load factor hit 8+ during the transfer of a 40GB virtual machine image.
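The restore itself was just one long rsync back onto the pool, something like the line below, with /path/to/backup/ standing in for wherever the 4T drive is mounted; -H keeps any hard links intact:

# rsync -aHv /path/to/backup/ /pod/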
The new raidz3 array is not yet optimal. Its size precludes the use of ZFS's native deduplication: there's no way I could cram the required 400GB of RAM onto the pod's ATX motherboard.
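If you're curious what dedup would actually buy on a pool like this, zdb can simulate building the dedup table without turning dedup on; as far as I know it only reads the pool, though it can take a long time and a good chunk of memory on this much data:

# zdb -S pod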
To dedupe the array I need to run Steve's linkdups utility:

# cd /pod
# linkdups -r -v

While this minimizes the space occupied by files, a side effect messes with the array parity checksums:
# zpool status
pool: pod
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
config:
NAME                                    STATE   READ WRITE CKSUM
pod                                     ONLINE     0     0     0
  raidz3-0                              ONLINE     0     0     0
    scsi-SATA_ST4000NC000-1CD_Z3001HA2  ONLINE     0     0    26
    scsi-SATA_ST4000DM000-1F2_W30063RJ  ONLINE     0     0    25
    scsi-SATA_ST4000DM000-1F2_Z300JAJ0  ONLINE     0     0    21
    scsi-SATA_ST4000DM000-1F2_Z300NLT6  ONLINE     0     0    14
    scsi-SATA_ST4000DM000-1F2_Z300J9LA  ONLINE     0     0    30
    scsi-SATA_ST4000DM000-1F2_W300JW66  ONLINE     0     0    13
    scsi-SATA_ST4000DM000-1F2_W300H3PX  ONLINE     0     0     4
    scsi-SATA_ST4000DM000-1F2_W300H59J  ONLINE     0     0    10
    scsi-SATA_ST4000DM000-1F2_W300F5NR  ONLINE     0     0    12
    scsi-SATA_ST4000DM000-1F2_W300KKZ3  ONLINE     0     0    23
errors: No known data errors

These are not faults in the individual drives. They can be corrected by re-computing the checksums by running:

# zpool scrub pod
This process runs invisibly in the background. To check its progress, run zpool status again:

# zpool status
pool: pod
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub in progress since Sat Apr 19 06:59:26 2014
48.0G scanned out of 3.25T at 11.5M/s, 81h20m to go
0 repaired, 1.44% done
config:
NAME                                    STATE   READ WRITE CKSUM
pod                                     ONLINE     0     0     0
  raidz3-0                              ONLINE     0     0     0
    scsi-SATA_ST4000NC000-1CD_Z3001HA2  ONLINE     0     0    26
    scsi-SATA_ST4000DM000-1F2_W30063RJ  ONLINE     0     0    25
    scsi-SATA_ST4000DM000-1F2_Z300JAJ0  ONLINE     0     0    21
    scsi-SATA_ST4000DM000-1F2_Z300NLT6  ONLINE     0     0    14
    scsi-SATA_ST4000DM000-1F2_Z300J9LA  ONLINE     0     0    30
    scsi-SATA_ST4000DM000-1F2_W300JW66  ONLINE     0     0    13
    scsi-SATA_ST4000DM000-1F2_W300H3PX  ONLINE     0     0     4
    scsi-SATA_ST4000DM000-1F2_W300H59J  ONLINE     0     0    10
    scsi-SATA_ST4000DM000-1F2_W300F5NR  ONLINE     0     0    12
    scsi-SATA_ST4000DM000-1F2_W300KKZ3  ONLINE     0     0    23
errors: No known data errors
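
Rather than re-running zpool status by hand, I can let watch poll it every minute or so (assuming watch is installed, as it is on most Linux systems):

# watch -n 60 zpool status pod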
When the scan/scrub is finished (it won't really take 81 hours), I'll re-zero the CKSUM column with:

# zpool clear pod

When all that's done I can take that extra 4T drive and append it to the array:
# zfs umount pod
# zpool add -f pod /dev/disk/by-id/scsi-SATA_ST4000DM000-1F2_Z300J9G8
# zfs mount pod

The result will be a larger pool, though not quite as if I'd created it with eleven drives from the beginning: zpool add hangs the new drive on the pool as its own vdev (which is why it needs -f), so it doesn't get the raidz3 parity protection the other ten drives share.
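To see exactly how the pool is laid out after the add, zpool list with the verbose flag (or another zpool status) should show the new drive sitting as its own vdev alongside raidz3-0:

# zpool list -v pod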
--Doc