ZFS zhack to the rescue


a.k.a. why single-disk pools are a bad idea

So, I bought a new hard disk for my server recently. The intention was to expand the size of one of the zpools that is used for local backup replicas. It should have been an easy job, but easy jobs always have a way of going wrong…

I followed the same process I'd used several times before: put the new drive in a spare bay, run the usual tools to zero it out, write random data across it, zero it out again, and so on. Everything was fine, with no issues showing up and no performance anomalies to investigate. So I pushed the button to start the ZFS replacement of the old drive with the newer, larger one. Everything was still fine: the data all copied across, no downtime, no issues, pool expanded, green lights across the board.
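For anyone curious about the mechanics, the prep and replacement boil down to something like the sketch below. The pool name and device paths are placeholders, and the burn-in tooling is just one reasonable choice rather than the exact commands I ran:

```sh
# Burn-in the new disk before trusting it: badblocks -w destructively writes
# and verifies test patterns across the whole device.
badblocks -wsv /dev/sdX

# Zero the drive out afterwards.
dd if=/dev/zero of=/dev/sdX bs=1M status=progress

# Swap the old vdev for the new, larger disk; ZFS resilvers onto it
# while the pool stays online.
zpool replace backuppool /dev/disk/by-id/old-disk /dev/disk/by-id/new-disk

# Watch the resilver progress and confirm everything comes back healthy.
zpool status backuppool

# With autoexpand on, the extra capacity shows up once the resilver finishes.
zpool set autoexpand=on backuppool
```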

Those lights didn't stay green though: a few days later, the dreaded click of death hit the new drive. I've got several disks of the same model which are all performing flawlessly, so this was something of a disappointment (yes, I have triple-checked since). No harm, I thought: the original drive is still there, the backup tasks to it can catch up on the week or so that's gone by without any issue, I'll just oust the new drive and it'll be fine… right?

Nope. It turns out that the replacement process had somehow made the original drive invisible to ZFS. I knew all the data was still there, but I was finding it impossible to get ZFS to acknowledge the same. Thankfully, after a lot of searching online, I discovered this issue on the OpenZFS GitHub that described the zhack command.

A quick zhack label repair later and ZFS was suddenly able to bring the pool back into existence. Now I just have some returns paperwork to fill out for a dead drive, but it was a good reminder that a copy is not a backup.
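In case it helps anyone else, the recovery looked roughly like this. The pool name and device path are placeholders for my setup, and the exact behaviour of zhack label repair varies between OpenZFS releases, so check the man page for your version first:

```sh
# Before the repair: the pool won't show up as importable from the old disk,
# even though the data is all still on it.
zpool import

# zhack label repair rewrites the on-disk label checksums so ZFS will
# accept the device again.
zhack label repair /dev/disk/by-id/original-disk

# After the repair the pool can be imported from that single device.
zpool import backuppool
zpool status backuppool
```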
