Recovering Ubuntu 12.04 ZFS on Linux root pool mirror boot failure


Recently I ran into a problem booting my root ZFS mirror. It wouldn’t load, and all I saw was a blinking cursor or a blank screen. GRUB could see the filesystem, though! I suspect I had a corrupted or out-of-sync zpool.cache file. Don’t ask me how that can happen, because the whole reason I run ZFS is so things like this don’t happen, yet on Linux the zpool.cache can get messed up somehow! What I *think* happened has something to do with the zpool changes I had made recently: I moved a couple of pools of drives out of this system and kept only the root pool mirror for my development system. So I don’t have an answer for why the cache gets messed up, but it was probably something I did!

So I booted from a thumb drive that had 12.04 server installed. I suppose you could just as easily use a live DVD or USB stick, but you’ll need to apt-get install the ZFS packages.

After I booted the thumb drive server, I updated the zfs-linux packages! I couldn’t get to the drives until they were updated, because of the newer ZFS pool format. I think the pool format version was 5000, and the older ZFS utils didn’t work with it. Having installed a handful of FreeBSD and Ubuntu servers with ZFS now, I am thinking it’s better to create root ZFS pools using version=28 for better compatibility. I think that’s the advice of the ZFS on Linux people too.
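
For anyone wondering what “using version=28” looks like in practice, here is a rough sketch. It is not what I ran on this system. The create_v28_mirror function, the run helper, and the DISK_A/DISK_B device names are all made up for illustration. It only prints the commands unless you set DRY_RUN=0, because zpool create wipes whatever is on the disks you hand it.

```shell
#!/bin/sh
# run: print a command, and execute it only when DRY_RUN=0 is set.
# (Hypothetical helper so the sketch is safe to try by default.)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# create_v28_mirror POOL DISK_A DISK_B  (illustrative name, not from any guide)
create_v28_mirror() {
    pool="$1"; disk_a="$2"; disk_b="$3"
    # List the pool versions and features the installed ZFS tools understand.
    run zpool upgrade -v
    # Pin the on-disk pool format to version 28 at creation time.
    # WARNING: with DRY_RUN=0 this destroys existing data on both disks.
    run zpool create -o version=28 "$pool" mirror "$disk_a" "$disk_b"
}
```

A dry run would be: create_v28_mirror rpool /dev/disk/by-id/DISK_A /dev/disk/by-id/DISK_B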

Anyway, once the server had working ZFS, I exported everything and re-imported it like below:

zpool export rpool

zpool import -d /dev/disk/by-id -f -N rpool
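
If you want to double-check the import before going further, something like the sketch below might help. The two commands above are the ones I actually ran; the zpool status and zfs list checks, the reimport_pool function, and the run helper are additions for illustration. It prints the commands by default and only executes them with DRY_RUN=0 (as root).

```shell
#!/bin/sh
# run: print a command, and execute it only when DRY_RUN=0 is set.
# (Hypothetical helper so the sketch is safe to try by default.)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# reimport_pool POOL  (illustrative name)
reimport_pool() {
    pool="$1"
    # The export may fail if the pool is not currently imported; carry on anyway.
    run zpool export "$pool"
    # -d: search /dev/disk/by-id for devices, -f: force, -N: do not mount datasets
    run zpool import -d /dev/disk/by-id -f -N "$pool" || return 1
    # Extra checks (assumption, not in the original steps): pool health,
    # and confirmation that nothing got mounted.
    run zpool status "$pool"
    run zfs list -r -o name,mountpoint,mounted "$pool"
}
```

A dry run would be: reimport_pool rpool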

Now that I had my rpool imported but NOT mounted, I mounted it so I could replace the cache. (NOTE: I am using /mnt/rootfs as my mount point, but instructions online typically have you mounting to /root; just substitute whatever works best for your system.)

mount -t zfs -o zfsutil rpool/ROOT/ubuntu-1 /mnt/rootfs

The guides online say DO NOT MOUNT any other file systems! (OK, got it!) Just copy over the zpool.cache. I made a backup of the old one first, just in case.

cp /etc/zfs/zpool.cache /mnt/rootfs/etc/zfs/zpool.cache
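
I didn’t write down the backup command, so here is a small sketch of how the backup and the copy could be done in one go. The replace_cache name and the .bak suffix are just my picks for illustration, and it assumes the root dataset is already mounted at the path you pass in.

```shell
#!/bin/sh
# replace_cache SRC_CACHE ROOT_MOUNT  (illustrative name)
# Keeps the old cache on the mounted root as zpool.cache.bak, then copies
# the rescue system's cache over it.
replace_cache() {
    src="$1"                          # e.g. /etc/zfs/zpool.cache on the rescue system
    root="$2"                         # e.g. /mnt/rootfs
    dst="$root/etc/zfs/zpool.cache"

    [ -f "$src" ] || { echo "missing source cache: $src" >&2; return 1; }
    # Refuse to continue if the root dataset does not look mounted.
    [ -d "$root/etc/zfs" ] || { echo "no etc/zfs under $root" >&2; return 1; }

    # Back up the old (suspect) cache if there is one.
    if [ -f "$dst" ]; then
        cp -p "$dst" "$dst.bak" || return 1
    fi
    cp "$src" "$dst"
}
```

With my paths that would be: replace_cache /etc/zfs/zpool.cache /mnt/rootfs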

Then…

Did a chroot into my mirror and updated /etc/default/grub to show a text-only boot console.
Did an update-initramfs -c -k all
Ran grub-install on /dev/sda and /dev/sdc (which are my mirror drives)
And lastly, update-grub.

Exited the chroot environment and unmounted /mnt/rootfs.
Reboot! Success!
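
Roughly, the chroot part could be scripted like the sketch below. Treat it as an outline, not a transcript of what I typed. The bind mounts of /dev, /proc and /sys are an assumption on my part (the install guides use them so the tools inside the chroot can see the disks), the repair_boot function and run helper are made-up names, and the /etc/default/grub edit is left as a manual step. It prints the commands by default and only executes them with DRY_RUN=0 (as root).

```shell
#!/bin/sh
# run: print a command, and execute it only when DRY_RUN=0 is set.
# (Hypothetical helper so the sketch is safe to try by default.)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# repair_boot ROOT_MOUNT DISK [DISK...]  (illustrative name)
repair_boot() {
    root="$1"; shift
    # Assumption: bind-mount the virtual filesystems for the chroot.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$root/$fs" || return 1
    done
    # Edit $root/etc/default/grub by hand before this point if you want
    # a text-only boot console.
    run chroot "$root" update-initramfs -c -k all || return 1
    # One grub-install per mirror member, so both disks carry a boot loader.
    for disk in "$@"; do
        run chroot "$root" grub-install "$disk" || return 1
    done
    run chroot "$root" update-grub || return 1
    # Undo the bind mounts in reverse order, then unmount the root dataset.
    for fs in sys proc dev; do
        run umount "$root/$fs"
    done
    run umount "$root"
}
```

With my setup a dry run would be: repair_boot /mnt/rootfs /dev/sda /dev/sdc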

  • mchris

    Hi – reading your post it seems that you have a mirrored ZFS and small GRUB partition like this guide suggests (https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubuntu-to-a-Native-ZFS-Root-Filesystem)
    Have you been able to boot from either disk? If so, how did you set up GRUB and the partitions?
    I am stuck at the same place as “thomas” in this post: https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/dx1O9QJHq-U
    Best regards, Mchris

    • Greg Fischer

      Hi mchris! Thanks for the comment!
      Unfortunately I can’t answer that specifically, because I haven’t tested directly booting the second drive on a ZFS root install. (Although I swear I had done that at one point, a while back.) However, the data should be there, which means you should be able to mount the mirror and transfer it to a new mirror for recovery. WHICH IS NOT AT ALL what we want from this, I know! We want to “just boot” that second drive. Like I said, I haven’t actually tested that specific scenario.

      It’s been several months since I’ve done any ZFS Linux testing, so I am a bit out of practice. I can say this though: I only have 1 ZFS on Linux server running on a root mirror, and that’s my own personal development server. I do have 5-6 ZFS servers in production running either Linux or FreeBSD, and only the FreeBSD ones are set up with root ZFS mirrors. Why? Because I have found root ZFS on Linux to be “not quite ready” for production. ZFS itself, set up on a data array, is great though, and I have it running in production. But for now, the root OS array on my Linux servers is still running the typical LVM-over-MD mirror. That, IMHO, is a much more reliable “root” environment for now. I do find ZFS to be very reliable and I LOVE IT on a secondary array, separate from the OS.

      FreeBSD – PCBSD seems to be much better in the root-boot ZFS area, and I have a few servers running that way. Works great. But most of my clients need Linux because of some software they run. I know none of this probably helps, but those are just my thoughts. Good luck!