Key | Value |
---|---|
Summary | Enabling ZFS on Ubuntu 20.04, creating a test pool and recovering from deliberate disk “corruption” with ZFS mirrored vdevs
Categories | server |
Difficulty | 2 |
Author | Aaron Whitehouse code@whitehouse.kiwi.nz |
Overview
Duration: 1:00
ZFS is a combined file system and logical volume manager. It includes protection against data corruption and built-in disk mirroring capabilities.
This guide will go through the process of installing ZFS on Ubuntu 20.04 LTS, setting up storage pools with fake (file-backed) disks in striped and mirrored vdev configurations, and then deliberately damaging the data on one of the disks to test ZFS’s self-healing capabilities.
What you’ll learn
- How to install ZFS
- How to create a striped pool using image files and how this reacts to (fake) disk corruption
- How to create a mirrored storage pool using image files
- How ZFS automatically recovers a mirror from disk corruption
- How to replace a failed disk (file) in a mirrored vdev
What you’ll need
- Ubuntu Server or Desktop 20.04 LTS
- 300MB free space
Disclaimer: while I work at Canonical, I do not have anything to do with ZFS in that capacity and I have authored this simply as an Ubuntu user interested in ZFS.
Installing ZFS
Duration: 1:00
The main components of ZFS are maintained as a standard Ubuntu package, so to install it, simply run:
sudo apt install zfsutils-linux
After that, we can check if ZFS was installed correctly by running:
whereis zfs
You should see output similar to the following:
zfs: /sbin/zfs /etc/zfs /usr/share/man/man8/zfs.8.gz
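If you want a little more detail than `whereis` gives, you can also check the installed OpenZFS version and whether the kernel module has been loaded yet (the `zfs version` subcommand exists in OpenZFS 0.8 and later, which is what Ubuntu 20.04 ships):
# Show the version of the ZFS userland tools and kernel module
zfs version
# Check whether the zfs kernel module is currently loaded
lsmod | grep zfs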
Now that we’re done installing the required packages, let’s create a storage pool!
Create and test a ZFS Pool with a Striped vdev
Duration: 10:00
Creating image files to use as fake disks
We are going to create image files to use as fake disks for ZFS, so that we can experiment without putting any real data at risk.
First, let’s create a folder to work in:
mkdir test_zfs_healing
cd test_zfs_healing
Now let’s create two image files to use as our fake disks:
for FAKE_DISK in disk1.img disk2.img
do
dd if=/dev/zero of=`pwd`/$FAKE_DISK bs=1M count=100
done
If you do an `ls`, you should now see two img files:
$ ls
disk1.img disk2.img
Let’s save our working directory to a variable to make it easier to come back here later:
ZFS_TEST_DIR=`pwd`
Creating a Pool
We are going to create a striped vdev, like RAID-0, in which data is striped dynamically across the two “disks”. This is performant and lets us use most of our disk space, but has no resilience.
To create a pool with a striped vdev, we run:
sudo zpool create test_pool_striped \
`pwd`/disk1.img \
`pwd`/disk2.img
If we run `zpool list`, we should see the new pool:
$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
test_pool_striped 160M 111K 160M - - 1% 0% 1.00x ONLINE -
Note that the size is 160M from our two 100MB raw disks (200MB total), so we have the use of most of the space.
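If you would like to see how that space maps onto the individual devices, `zpool list` also has a verbose mode (just standard `zpool` usage, not an extra step this guide depends on):
# Break the pool size down by vdev and device
zpool list -v test_pool_striped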
If we run `zpool status test_pool_striped`, we should see the details of our fake disks:
$ zpool status test_pool_striped
pool: test_pool_striped
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
test_pool_striped ONLINE 0 0 0
/home/user/test_zfs_healing/disk1.img ONLINE 0 0 0
/home/user/test_zfs_healing/disk2.img ONLINE 0 0 0
errors: No known data errors
Add text to the new pool
We can see where our pool has been mounted with:
zfs mount
and we should see something like:
$ zfs mount
test_pool_striped /test_pool_striped
First we’ll change the mountpoint to be owned by the current user:
sudo chown $USER /test_pool_striped
Then let’s change into that mountpoint:
cd /test_pool_striped
Then we will create a text file with some text in it:
echo "We are playing with ZFS. It is an impressive filesystem that can self-heal, but even it has limits." > text.txt
We can show the text in the file with:
cat text.txt
$ cat text.txt
We are playing with ZFS. It is an impressive filesystem that can self-heal, but even it has limits.
And we can look at the hash of the file with:
sha1sum text.txt
$ sha1sum text.txt
c1ca4def6dc5d82fa6de97d2f6d429045e4f4065 text.txt
Deliberately damage a disk
First we will go back to our directory with the disk images:
cd $ZFS_TEST_DIR
Now we are going to write zeros over one of the disks to simulate data corruption or a partial failure of one of the disks in our striped pool.
WARNING!
This is a dangerous operation, as you are overwriting
data. Make sure you are writing over the
correct file!
dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
and we should see output like:
$ dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.173905 s, 603 MB/s
Now change back to the mountpoint:
cd /test_pool_striped/
and read the file:
cat text.txt
$ cat text.txt
We are playing with ZFS. It is an impressive filesystem that can self-heal, but even it has limits.
Oh. Let’s check the hash:
sha1sum text.txt
$ sha1sum text.txt
c1ca4def6dc5d82fa6de97d2f6d429045e4f4065 text.txt
Everything seems fine…
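If you are curious how ZFS reacts while the pool is still imported, you can ask it to read and verify every block with a scrub (an optional aside; a striped vdev has no redundancy, so any errors the scrub finds cannot be repaired, and scrubbing a deliberately zeroed disk may well leave the pool faulted):
# Optional: verify every block in the still-imported pool
sudo zpool scrub test_pool_striped
# -v also lists any individual files with errors
zpool status -v test_pool_striped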
Export the pool
I believe that, while the pool is imported, ZFS masks the corruption by continuing to serve data it already has (for example from its in-memory cache). That is great, but it interferes with our testing! So we need to export the pool first:
cd $ZFS_TEST_DIR
sudo zpool export test_pool_striped
And if we run a `zpool list`, `test_pool_striped` should no longer appear.
Damage the disk again
So now let’s try damaging the “disk” again:
dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
which gives:
$ dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.173001 s, 606 MB/s
We now try to re-import the pool with:
sudo zpool import -d $ZFS_TEST_DIR/disk2.img
And we are told that the pool cannot be imported because it has damaged devices or data:
$ sudo zpool import -d $ZFS_TEST_DIR/disk2.img
pool: test_pool_striped
id: 3823113642612529477
state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
see: http://zfsonlinux.org/msg/ZFS-8000-72
config:
test_pool_striped FAULTED corrupted data
/home/user/test_zfs_healing/disk2.img ONLINE
If you are lucky, you may instead see something more like:
$ sudo zpool import -d $ZFS_TEST_DIR/disk2.img
pool: test_pool_striped
id: 706836292853756916
state: ONLINE
status: One or more devices contains corrupted data.
action: The pool can be imported using its name or numeric identifier.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
config:
test_pool_striped ONLINE
/home/user/test_zfs_healing/disk1.img UNAVAIL corrupted data
/home/user/test_zfs_healing/disk2.img ONLINE
But in fact trying to import this with:
sudo zpool import test_pool_striped -d $ZFS_TEST_DIR/disk2.img
still does not import the pool (on my system this ran for a very long time, never succeeded, and blocked other zpool commands while it was running).
Clean up
Let’s delete our disk files:
cd $ZFS_TEST_DIR
rm disk1.img disk2.img
Create and test a ZFS Pool with a Mirrored vdev
Duration: 10:00
Creating image files to use as fake disks
Let’s create two image files to use as our fake disks again:
for FAKE_DISK in disk1.img disk2.img
do
dd if=/dev/zero of=`pwd`/$FAKE_DISK bs=1M count=100
done
Again, if you do an `ls`, you should now see two img files:
$ ls
disk1.img disk2.img
Creating a Pool
This time, we are going to create a mirrored vdev, similar to RAID-1, in which a complete copy of all data is stored separately on each drive.
To create a mirrored pool, we run:
sudo zpool create test_pool_with_mirror mirror \
`pwd`/disk1.img \
`pwd`/disk2.img
Note the addition of the word `mirror` between the pool name and the disk names.
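As an aside (an illustration only, not something to run as part of this guide), the `mirror` keyword simply groups the devices that follow it into a single mirrored vdev, so the same syntax extends to more copies; a three-way mirror of hypothetical disks would look like:
# Example only: a three-way mirror (hypothetical paths, do not run here)
sudo zpool create example_pool mirror \
    /path/to/diskA.img \
    /path/to/diskB.img \
    /path/to/diskC.img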
If we run `zpool list`, we should see the new pool:
$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
test_pool_with_mirror 80M 111K 79.9M - - 3% 0% 1.00x ONLINE -
But note that this time the size is only 80M, half what it was before. This makes sense, as we are storing two copies of everything (one on each disk), so we have half as much space.
If we run `zpool status test_pool_with_mirror`, we should see that the disks have been put into a mirror vdev named `mirror-0`:
$ zpool status test_pool_with_mirror
pool: test_pool_with_mirror
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
test_pool_with_mirror ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/test_zfs_healing/disk1.img ONLINE 0 0 0
/home/user/test_zfs_healing/disk2.img ONLINE 0 0 0
errors: No known data errors
Add some data
We can see where our pool has been mounted:
$ zfs mount
test_pool_with_mirror /test_pool_with_mirror
First we’ll change the mountpoint to be owned by the current user:
sudo chown $USER /test_pool_with_mirror
Then let’s change into that mountpoint:
cd /test_pool_with_mirror
Again we will create a text file with some text in it:
echo "We are playing with ZFS. It is an impressive filesystem that can self-heal. Mirror, mirror, on the wall." > text.txt
We can show the text in the file with:
cat text.txt
$ cat text.txt
We are playing with ZFS. It is an impressive filesystem that can self-heal. Mirror, mirror, on the wall.
And we can look at the hash of the file with:
sha1sum text.txt
$ sha1sum text.txt
aad0d383cad5fc6146b717f2a9e6c465a8966a81 text.txt
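Optionally, you could also add a somewhat larger file so that the resilver we trigger later has more than a few hundred kilobytes to copy (the file name random.bin is just an example; if you do this, the resilvered size reported later will be larger than in the sample output below):
# Optional: write ~20MB of random data into the mirrored pool
dd if=/dev/urandom of=/test_pool_with_mirror/random.bin bs=1M count=20
sha1sum /test_pool_with_mirror/random.bin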
Export the pool
As we learnt earlier, we first need to export the pool.
cd $ZFS_TEST_DIR
sudo zpool export test_pool_with_mirror
And, again, if we run a `zpool list`, `test_pool_with_mirror` should no longer appear.
Deliberately damage a disk
First we will go back to our directory with the disk images:
cd $ZFS_TEST_DIR
Now again we are going to write zeros over a disk to simulate a disk failure or corruption:
dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
We see something like the following output:
$ dd if=/dev/zero of=$ZFS_TEST_DIR/disk1.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.172324 s, 608 MB/s
Re-import the pool
Now we are going to re-import our pool:
sudo zpool import -d $ZFS_TEST_DIR/disk2.img
And we see something like the following output:
$ sudo zpool import -d $ZFS_TEST_DIR/disk2.img
pool: test_pool_with_mirror
id: 5340127000101774671
state: ONLINE
status: One or more devices contains corrupted data.
action: The pool can be imported using its name or numeric identifier.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
config:
test_pool_with_mirror ONLINE
mirror-0 ONLINE
/home/user/test_zfs_healing/disk1.img UNAVAIL corrupted data
/home/user/test_zfs_healing/disk2.img ONLINE
As expected, `disk1.img` is showing as corrupted, since we wrote over it with zeros. But, in contrast to the pool with the striped vdev earlier, instead of failing to import as `FAULTED`, the pool is showing `ONLINE`, with `disk2.img` `ONLINE` and only the `disk1.img` that we overwrote marked `UNAVAIL` because of its corrupted data.
The output tells us that we can import the pool by using its name or ID, so let’s do that:
sudo zpool import test_pool_with_mirror -d $ZFS_TEST_DIR/disk2.img
Checking Pool Status
We can check the pool status with:
zpool status test_pool_with_mirror
And the output should look something like:
$ zpool status test_pool_with_mirror
pool: test_pool_with_mirror
state: ONLINE
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME STATE READ WRITE CKSUM
test_pool_with_mirror ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
4497234452516491230 UNAVAIL 0 0 0 was /home/user/test_zfs_healing/disk1.img
/home/user/test_zfs_healing/disk2.img ONLINE 0 0 0
errors: No known data errors
So the pool is online and working, albeit in a degraded state. We can look at the file we wrote earlier:
$ cat /test_pool_with_mirror/text.txt
We are playing with ZFS. It is an impressive filesystem that can self-heal. Mirror, mirror, on the wall.
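We can also check the hash again; it should still match the value we recorded before damaging the disk, since the surviving side of the mirror holds a complete copy of the data:
# The hash should match the one we recorded earlier
sha1sum /test_pool_with_mirror/text.txt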
Replacing the failed device
The status is telling us that we are missing a device and the pool is degraded, so let’s fix that.
Let’s create a new “disk” in our working directory:
cd $ZFS_TEST_DIR
dd if=/dev/zero of=`pwd`/disk3.img bs=1M count=100
Then, let’s follow the instructions from the `zpool status` output and replace the disk:
sudo zpool replace test_pool_with_mirror $ZFS_TEST_DIR/disk1.img $ZFS_TEST_DIR/disk3.img
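On a tiny pool like this the resilver finishes almost immediately, but on real disks it can take some time; if you would like to watch it progress, something like the following works (a convenience, not a required step):
# Refresh the pool status every five seconds while it resilvers
watch -n 5 zpool status test_pool_with_mirror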
Check the zpool status again
We can see how this disk replacement has affected things by checking `zpool status test_pool_with_mirror`:
$ zpool status test_pool_with_mirror
pool: test_pool_with_mirror
state: ONLINE
scan: resilvered 274K in 0 days 00:00:00 with 0 errors on Sat Nov 27 22:43:37 2021
config:
NAME STATE READ WRITE CKSUM
test_pool_with_mirror ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/home/user/test_zfs_healing/disk3.img ONLINE 0 0 0
/home/user/test_zfs_healing/disk2.img ONLINE 0 0 0
errors: No known data errors
`disk1.img` has been replaced by `disk3.img`, and the output tells us that ZFS has “resilvered” the data, copying it from the remaining mirror disk (`disk2.img`) to the new disk (`disk3.img`).
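As a further check, you could confirm that the file still hashes to the same value as when we created it, showing that no data was lost across the disk failure and replacement:
# The hash should be unchanged from when we first created the file
sha1sum /test_pool_with_mirror/text.txt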
Removing the pool and cleaning up
We can now remove the test pool:
sudo zpool destroy test_pool_with_mirror
and it should no longer show in a `zpool list`.
Then we can remove the fake “disks” we created:
cd $ZFS_TEST_DIR
rm disk1.img disk2.img disk3.img
cd ..
rmdir $ZFS_TEST_DIR
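If you installed ZFS only to follow this guide, you can optionally remove the packages again (skip this if you plan to keep using ZFS on this machine):
# Optional: remove the ZFS userland tools installed at the start
sudo apt remove zfsutils-linux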
That’s all!
Duration: 1:00
Congratulations! We have covered:
- How to install ZFS
- How to create a striped pool using image files and how this reacts to (fake) disk corruption
- How to create a mirrored storage pool using image files
- How ZFS automatically recovers a mirror from disk corruption
- How to replace a failed disk (file) in a mirrored vdev
Further reading
- For detailed operations check out the Oracle ZFS Administration Guide
- For an excellent background reference on ZFS, see FreeBSD Mastery: ZFS – while this is focused on BSD, nearly all of the content will be helpful for OpenZFS Linux users
- Similarly, the FreeBSD handbook has an excellent chapter on ZFS, much of which will be applicable to Linux.
- For a quick ‘cheat-sheet’ guide, try the Ubuntu Wiki ZFS reference
- There is also a lengthy ZFS 101 guide on Ars Technica