One of the many features OpenZFS brings to the table is ZFS native encryption. First introduced in OpenZFS 0.8, native encryption allows a system administrator to transparently encrypt data at rest within ZFS itself. This obviates the need for separate tools like LUKS, VeraCrypt, or BitLocker.
OpenZFS' encryption algorithm defaults to either aes-256-ccm (prior to 0.8.4) or aes-256-gcm (0.8.4 and later) when encryption=on is set, but it may also be specified directly. The currently supported algorithms are:
aes-128-ccm
aes-192-ccm
aes-256-ccm (default in OpenZFS < 0.8.4)
aes-128-gcm
aes-192-gcm
aes-256-gcm (default in OpenZFS >= 0.8.4)
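If you'd rather pin the algorithm than rely on the version-dependent default, name it explicitly at creation time—a minimal sketch, with pool and dataset names being illustrative:

zfs create -o encryption=aes-256-gcm -o keyformat=passphrase -o keylocation=prompt tank/secure

Like the rest of the encryption configuration, the algorithm must be chosen when the dataset is created; it can't be changed afterward.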
There's more to OpenZFS native encryption than the algorithms used, though—so we'll try to give you a brief but solid grounding in the sysadmin's-eye perspective on the "why" and "what" as well as the simple "how."
Why (or why not) OpenZFS native encryption?
A clever sysadmin who wants to provide at-rest encryption doesn't actually need OpenZFS native encryption, obviously. As mentioned in the introduction, LUKS, VeraCrypt, and many other schemes are available and can be layered either beneath or atop OpenZFS itself.
First, the “why not”
Putting something like Linux's LUKS underneath OpenZFS has an advantage—with the entire disk encrypted, an enterprising attacker can no longer see the names, sizes, or properties of ZFS datasets and zvols without access to the key. In fact, the attacker can't necessarily see that ZFS is in use at all!
But there are significant disadvantages to putting LUKS (or similar) beneath OpenZFS. One of the gnarliest is that each individual disk that will be part of the pool must be encrypted, with each volume loaded and decrypted prior to the ZFS pool import stage. This can be a noticeable challenge for ZFS systems with many disks—in some cases, many tens of disks. Another problem with encryption-beneath-ZFS is that the extra layer is an extra thing to go wrong—and it's in a position to undo all of ZFS' normal integrity guarantees.
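To make that concrete, here's a sketch of bringing up a LUKS-beneath-ZFS pool with just two member disks (device and pool names are illustrative):

# every member disk must be unlocked individually...
cryptsetup open /dev/sda crypt-sda
cryptsetup open /dev/sdb crypt-sdb
# ...before the pool itself can be imported
zpool import -d /dev/mapper tank

Multiply the cryptsetup step by every disk in the pool, and the appeal fades quickly on wide arrays.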
Putting LUKS or similar atop OpenZFS gets rid of the aforementioned problems—a LUKS-encrypted zvol only needs one key regardless of how many disks are involved, and the LUKS layer cannot undo OpenZFS' integrity guarantees from here. Unfortunately, encryption-atop-ZFS introduces a new problem—it effectively nerfs OpenZFS inline compression, since encrypted data is generally incompressible. This approach also requires the use of one zvol per encrypted filesystem, along with a guest filesystem (e.g., ext4) to format the LUKS volume itself with.
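For reference, the stack described above looks roughly like this sketch, assuming a pool named tank (all names illustrative):

# one zvol per encrypted filesystem...
zfs create -V 100G tank/lukszvol
cryptsetup luksFormat /dev/zvol/tank/lukszvol
cryptsetup open /dev/zvol/tank/lukszvol lukszvol
# ...plus a guest filesystem to format the LUKS volume with
mkfs.ext4 /dev/mapper/lukszvol

Every write then passes through ext4 and LUKS before it reaches the zvol—and arrives at ZFS' compression stage already encrypted.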
Now, the “why”
OpenZFS native encryption splits the difference: it operates atop the normal ZFS storage layers and therefore doesn't nerf ZFS' own integrity guarantees. But it also doesn't interfere with ZFS compression—data is compressed prior to being saved to an encrypted dataset or zvol.
Native encryption also plays nicely with ZFS replication, which is typically far faster than filesystem-agnostic tools like rsync—and raw send makes it possible not only to replicate encrypted datasets and zvols, but to do so without exposing the key to the remote system.
This means that you can use ZFS replication to back up your data to an untrusted location without concerns about your private data being read. With raw send, your data is replicated without ever being decrypted—and without the backup target ever being able to decrypt it at all. So you can replicate your offsite backups to a friend's house or to a commercial service like rsync.net or zfs.rent without compromising your privacy, even if the service (or friend) is itself compromised.
In the event that you need to recover your offsite backup, you can simply replicate it back to your own location—and then, and only then, load the decryption key to actually access the data. This works for either full replication (moving every single block across the wire) or asynchronous incremental replication (beginning from a commonly held snapshot and moving only the blocks that have changed since that snapshot).
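A sketch of what an untrusted-target raw backup can look like—host, pool, and snapshot names are all illustrative:

# full raw replication; the key is never loaded on the target
zfs snapshot banshee/encrypted@monday
zfs send -w banshee/encrypted@monday | ssh user@backup.example.com zfs receive tank/backups/encrypted
# later, incremental raw replication from the commonly held snapshot
zfs snapshot banshee/encrypted@tuesday
zfs send -w -i @monday banshee/encrypted@tuesday | ssh user@backup.example.com zfs receive tank/backups/encrypted

The -w (--raw) flag sends the blocks exactly as they exist on disk—still encrypted—rather than decrypting and re-encrypting them in flight.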
What’s encrypted—and what isn’t?
OpenZFS native encryption isn't a full-disk encryption scheme—it's enabled or disabled on a per-dataset/per-zvol basis, and it cannot be turned on for entire pools as a whole. The contents of encrypted datasets or zvols are protected from at-rest spying—but the metadata describing the datasets/zvols themselves is not.
Let's say we create an encrypted dataset named pool/encrypted, and beneath it we create several more child datasets. The encryption property for the children is inherited by default from the parent dataset, so we can see the following:
root@banshee:~# zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase banshee/encrypted
Enter passphrase:
Re-enter passphrase:
root@banshee:~# zfs create banshee/encrypted/child1
root@banshee:~# zfs create banshee/encrypted/child2
root@banshee:~# zfs create banshee/encrypted/child3
root@banshee:~# zfs list -r banshee/encrypted
NAME                       USED  AVAIL  REFER  MOUNTPOINT
banshee/encrypted         1.58M   848G   432K  /banshee/encrypted
banshee/encrypted/child1   320K   848G   320K  /banshee/encrypted/child1
banshee/encrypted/child2   320K   848G   320K  /banshee/encrypted/child2
banshee/encrypted/child3   320K   848G   320K  /banshee/encrypted/child3
root@banshee:~# zfs get encryption banshee/encrypted/child1
NAME PROPERTY VALUE SOURCE
banshee/encrypted/child1 encryption aes-256-gcm -
At the moment, our encrypted datasets are all mounted. But even if we unmount them and unload the encryption key—making them inaccessible—we can still see that they exist, along with their properties:
root@banshee:~# wget -qO /banshee/encrypted/child2/HuckFinn.txt http://textfiles.com/etext/AUTHORS/TWAIN/huck_finn
root@banshee:~# zfs unmount banshee/encrypted
root@banshee:~# zfs unload-key -r banshee/encrypted
1 / 1 key(s) successfully unloaded
root@banshee:~# zfs mount banshee/encrypted
cannot mount 'banshee/encrypted': encryption key not loaded
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
root@banshee:~# zfs list -r banshee/encrypted
NAME                       USED  AVAIL  REFER  MOUNTPOINT
banshee/encrypted         2.19M   848G   432K  /banshee/encrypted
banshee/encrypted/child1   320K   848G   320K  /banshee/encrypted/child1
banshee/encrypted/child2   944K   848G   720K  /banshee/encrypted/child2
banshee/encrypted/child3   320K   848G   320K  /banshee/encrypted/child3
As we can see above, after unloading the encryption key, we can no longer see our freshly downloaded copy of Huckleberry Finn in /banshee/encrypted/child2/. What we can still see is the existence—and structure—of our entire ZFS-encrypted tree. We can also see each encrypted dataset's properties, including but not limited to the USED, AVAIL, and REFER of each dataset.
It's worth noting that trying to ls an encrypted dataset that doesn't have its key loaded won't necessarily produce an error:
root@banshee:~# zfs get keystatus banshee/encrypted
NAME PROPERTY VALUE SOURCE
banshee/encrypted keystatus unavailable -
root@banshee:~# ls /banshee/encrypted
root@banshee:~#
This is because a naked directory exists on the host, even when the actual dataset is not mounted. Reloading the key doesn't automatically remount the dataset, either:
root@banshee:~# zfs load-key -r banshee/encrypted
Enter passphrase for 'banshee/encrypted':
1 / 1 key(s) successfully loaded
root@banshee:~# zfs mount | grep encr
root@banshee:~# ls /banshee/encrypted
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
In order to access our fresh copy of Huckleberry Finn, we'll also need to actually mount the freshly key-reloaded datasets:
root@banshee:~# zfs get keystatus banshee/encrypted/child2
NAME PROPERTY VALUE SOURCE
banshee/encrypted/child2 keystatus available -
root@banshee:~# ls -l /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
root@banshee:~# zfs mount -a
root@banshee:~# ls -lh /banshee/encrypted/child2
total 401K
-rw-r--r-- 1 root root 554K Jun 13 2002 HuckFinn.txt
Now that we've both loaded the necessary key and mounted the datasets, we can see our encrypted data again.
49 Reader Comments
I guess a user (I'm thinking more of a personal laptop running Linux/*BSD with ZFS) could have the base home dir be unencrypted, but all the sub-dirs be part of an encrypted dataset which then, on first UI login, prompts for the passphrase to load the key.
As per https://zfsonlinux.org/manpages/0.8.6/man8/zfs.8.html
Controls the encryption cipher suite (block cipher, key length, and mode) used for this dataset. Requires the encryption feature to be enabled on the pool. Requires a keyformat to be set at dataset creation time.
Selecting encryption = on when creating a dataset indicates that the default encryption suite will be selected, which is currently aes-256-gcm. In order to provide consistent data protection, encryption must be specified at dataset creation time and it cannot be changed afterwards.
For more details and caveats about encryption see the Encryption section.
EDIT: Updated to link to the newest manpage version, I linked to an older version that was out of date.
Last edited by Drizzt321 on Wed Jun 23, 2021 7:29 pm
Like so:
$ zfs get encryption YourPool/YourDataset
NAME PROPERTY VALUE SOURCE
YourPool/YourDataset encryption aes-256-gcm -
second edit: It's GCM, btw.
malor's question was: when the value is "on", what algorithm is it? As per the article, there are two different defaults, depending on the version of ZFS you're using. Basically anything from the last year or so is using GCM.
@malor, sorry, my link above is for an OLDER version of the man page. As per the article it's -GCM since >=0.8.4, with the current ZoL version being 0.8.6. I'll update my comment.
In my experience, you do not want to run ZFS encryption on a computer without AESNI. They've improved the software mode somewhat, but it's still extremely sluggish. With AESNI, it runs pretty much at disk speed. Without, it's a mess.
I initially tried encryption on an old i7-920, which is ancient, but still fairly performant. ZFS encryption was catastrophically bad on that hardware. The system was effectively unusable under a heavy write load. Swapping over to a 4790K instantly fixed it.
The software improvements have been added since my experimentation, but we had a recent poster in the Ars forum trying to use ARM binaries that didn't use their hardware acceleration, and it was a disaster for them too.
edit: if you don't have too many volumes, as the article says, you could use LUKS encryption, which is pretty fast on any hardware. But you'd probably have to script the unlock process, unless you want to type in one password per disk. I similarly script my ZFS key loading and mounting, so you're probably not really losing anything there.
It might potentially be a little less reliable, but AFAIK LUKS is a pretty skinny layer, so it will probably react about the same as putting the ZFS volume directly on the metal.
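A load-and-mount script of that sort can be as minimal as the following sketch, assuming passphrase-based keys and a pool named tank (names illustrative):

#!/bin/sh
# prompt for the passphrase and load keys for every dataset under the encryption root
zfs load-key -r tank/encrypted
# then mount everything that now has its key available
zfs mount -a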
The idea of putting *anything* between ZFS and its platters, no matter how lightweight, horrifies me.
Even in this age of silicon shortages, I feel like if you're going to go through the effort to configure and deploy encrypted ZFS, it's *really* easy to justify swapping in a CPU that supports AES-NI (and a new MB, if need be). If for no other reason than to have some confidence that it isn't going to die of old age as soon as you start using it.
As noted in the article, it'll be either aes-256-ccm or aes-256-gcm depending on the OpenZFS version with which the dataset was created.
Yes, that sucks; yes, it should show the literal value and not merely that it was originally set to some unspecified "default". We have the same problem with ashift values, which simply show as "0" if the ashift wasn't manually specified when the zpool was created. Sigh.
edit: zfs get encryption will show you the specific algorithm used to encrypt a dataset or zvol, even if you didn't specify it manually when you created it. But be sure to use zfs get, not zpool get...
Last edited by Jim Salter on Wed Jun 23, 2021 8:31 pm
NAME PROPERTY VALUE SOURCE
(volume) feature@encryption active local
... which wasn't what I wanted.
Then I did a ton of messing around trying to find the right string to query, assuming that it must be somewhere behind the feature@ wall. Didn't find anything. Did a bunch of web searching and didn't find anything explicit. So then I asked here, and:
NAME PROPERTY VALUE SOURCE
(volume)/(fileset) encryption aes-256-gcm -
Et voila, exactly what I was looking for. I was looking in the wrong place, and with the wrong syntax. feature@encryption is a thing for zpools, but apparently it's just straight "encryption" for filesets.
edit: note that I might well have set that algorithm directly instead of using 'on'. I spent a couple of days studying up on ZFS and pre-building my filesystem create lines.
Ahhhh, okay. Yeah, `zpool get` lets you know that your pool supports encryption, and also in this case that the feature is active on that pool... somewhere. But it's literally just checking to see if your pool supports that feature, not any details about how you've used the feature.
`zfs get` returns ZFS properties of the specific dataset or zvol itself, which is very much what you wanted here (applied to the dataset or zvol you were curious about in the first place).
Actually it looks like this isn't the ashift fiasco all over again after all; I definitely did not manually set the algo on my examples in the article, but I see my own `zfs get encryption` in my own article shows that you get the actual algo back, not just on.
Maybe next time I'll RTFA... 🙃
When I wrote my most famous line (comment 2) last year, I was only joking.
Not surprisingly, I didn't win editor's pick for that.
However, this can still be set up for older zfs releases! For my Ubuntu 20.04 NAS, I ended up extracting the debian package of openzfs 2.0 (but not installing it!) and copying the relevant files from /usr/share/initramfs-tools to /etc/initramfs-tools, adding my key to /etc/dropbear-initramfs/, and rebuilding initramfs. Now, I can unlock my NAS without having to resort to putting the keyfile on a USB stick or similar.
One side benefit is that I currently have two pools; an SSD boot pool for the OS and apps, and a rust pool for bulk storage. I put a keyfile on the also-encrypted root pool, so it was straightforward to add an additional call to import the pool and run `zfs load-key -a` immediately after unlocking the root pool.
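A sketch of that arrangement, assuming the bulk pool is named rust and the keyfile lives on the encrypted root pool (names and paths illustrative):

# the bulk pool's key is a file stored on the already-encrypted root pool
zfs set keylocation=file:///root/keys/rust.key rust
# at boot, once the root pool is unlocked:
zpool import rust
zfs load-key -a
zfs mount -a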
I'm glad I went this way instead of running a backport of 2.0 to get unlock functionality. I'm running Docker on the host, and https://github.com/openzfs/zfs/issues/11480 is certainly one of the most heart-stopping filesystem issues I've seen hit production systems in a long time.
Glad I could help!
Also, you can get a list of all properties & their settings with:
zfs get all (volume)/(fileset)
Even if there is a speed hit as long as it is still fast enough to stream BluRay backups using Plex that is fine for me.
If you're talking about a CPU with AES-NI, it'll be OK. If it doesn't have AES-NI... it may not be sufficiently powerful. You'll find out, obviously.
Well, goddammit. The answer was right there all along. I read the article carefully, but apparently not the images.
This does not appear correct, according to the man page (and indeed the latter part of this article that refers to change-key).
On Linux, a pam module could be written to handle this.
I have started a discussion with the devs and even tried to implement it myself, but I am not much of an expert on Linux internals/kernel modules and assembly, and I'm struggling.
If anyone here is interested, join the discussion: help would be welcome.
A tip: when encrypted child datasets are replicated with raw sends by sanoid/syncoid, the datasets lose their inherited encryption parameters at the replication target (e.g., keylocation turns into "prompt" instead of a path). This can be fixed after unlocking both child and parent on the target by using "zfs change-key -i ...".
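A sketch of the fix on the target side, assuming the backups landed under tank/backups (names illustrative):

# unlock parent and child, then re-link the child to inherit from its parent
zfs load-key tank/backups/encrypted
zfs load-key tank/backups/encrypted/child1
zfs change-key -i tank/backups/encrypted/child1

With -i, the child once again inherits its key from the parent encryption root instead of carrying standalone parameters.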
Also, benchmark the encryption performance.
I wrote a bash script based on Jim's fio article from a year or so back, which tried different algorithm and recordsize combinations, for use on a Raspberry Pi with a USB3 HDD. My conclusion was that 512 kB recordsizes performed better than 1 MB recordsizes, and CCM outperformed GCM (obviously, in my case, so YMMV).
(Edit: 512 kB recordsizes performed better than 1 MB recordsizes when fio used large block writes, despite "conventional wisdom" that 1 MB recordsizes should have been better in that case. I don't know if this is related to the USB HDD performance or the Raspberry itself, but since that combination was what I used as an off-site backup method, I didn't care why.)
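For anyone repeating the experiment, each run was shaped roughly like this sketch (mountpoint and job sizes illustrative), repeated across datasets created with each encryption/recordsize combination:

fio --name=seqwrite --directory=/tank/bench --rw=write --bs=1M --size=4G --end_fsync=1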
Last edited by NewCrow on Thu Jun 24, 2021 6:30 am
I realise running ZFS on LUKS would get around this (with some jiggery-pokery) but it seems odd that ZFS itself doesn't have a command line option to hide the metadata.
I'm stuck running GELI under ZFS because I upgraded from way back when that was the only option, but even that puts a header on the disk. I wish ZFS/FreeBSD would provide a way to migrate in place, but such is life. A pool upgrade is coming soon and I'll fix it then.
The solution to the legal conundrum you are describing is to know who has control of what data on your server so you can rat them out. Ideally you'd rally your local parliamentarian and convince them that such shit laws are shit but they seem to only obey their TLASOUP masters about such matters these days.
My poxy old Celery 1610 without AES worked just fine for that end. No bandwidth to transcode, but it could easily go way faster than even the highest bit rate bluray rips.
* Use encryption only on the data you're absolutely ready to lose.
* Remember you're mortal and the data you're encrypting might be lost forever if something terrible happens to you (death notwithstanding there are many other things which can make you forget).
* People do forget passwords.
I'm not dissuading anyone here from encrypting their data, I just believe many people use it needlessly.
Regarding this though:
I know your aim is probably to keep things simple, but I actually use prompt in automation (no human involved) rather than keyfiles, as I just find it so much more convenient. For example, on macOS you can then do:
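Something along these lines, assuming a keychain item named zfs-tank and a pool named tank (both illustrative):

# fetch the passphrase from the keychain and pipe it to the key prompt
security find-generic-password -s zfs-tank -w | zfs load-key tank/encrypted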
This lets me keep the passwords in a macOS keychain like I would for encrypted non-system APFS volumes, Core Storage, disk images etc. (for anyone wanting to do the same, remember to add /usr/bin/security to the list of applications that can access your scripted keychain entries, or it won't work)
While a keyfile would probably be about as secure if handled properly, prompt lets you pull the password from anywhere you like so it's very convenient for automating in that way.
I'm a huge fan of OpenZFS' encryption though, makes things so much simpler, and zfs send being able to send the encrypted blocks is just fantastic for backing up with.
The only thing missing is a free replacement for my old (but still stubbornly in perfect working order) Synology NAS that only supports BTRFS; fortunately it also does iSCSI, so I can still zfs send to it that way, but it's hardly an ideal way to do it.
Besides all the advantages of using native ZFS encryption, the ability to back up to a remote ZFS server keeping all the data encrypted in transit and then at rest (without any decryption/re-encryption) is extremely exciting for me.
Naturally the transport will also be encrypted, but keeping the data encrypted all the way gives a strong guarantee of security, and will incredibly simplify my procedures and my paperwork!
Yes, strong security should be documented and proved, and there's a lot of paperwork for some rightfully demanding clients.
Encrypt *everything* so *when* the hardware fails you can toss it with no worries.
Also back up regularly to an offline resource....
ZFS snapshots greatly simplify backup/recovery...
ZFS can't use any but the most basic of kernel facilities, because it's not GPL, and the kernel devs shut out almost everything they do from non-GPLed code.
In effect, you've got two code stacks in a Linux-ZFS system.... the Linux side and the ZFS side. This is probably part of why ZFS violates the normal Linux layering so badly; since it can't use what Linux offers, it reimplements almost everything from the metal upward. It finds and manages its own disks, manages its own compression, encryption, and error detection, does its own rebuilding, and provides its own mount namespace. User level programs don't see much of a difference, they're still just talking to files on disk, but when they're on a ZFS dataset, they're interacting with a whole separate codebase.
The only really notable problem that ensues is that ZFS, by default, grabs half the RAM in the system to use as cache. It can be sluggish about releasing this back to the system under memory pressure, which can cause significant issues. I'm pretty sure you can tune the RAM allocation down, although I haven't bothered looking up how, as the only system I'm presently running is a dedicated server, and would be using half its memory for cache anyway.
Because the other link had more Googlejuice, and I didn't investigate it more closely before wrapping the anchor tag (since I remembered that being the official site for Veracrypt from the last time I'd looked at it, not that long after TrueCrypt was abandoned).
Fixed now. Thanks for the tip!
Create a file /etc/modprobe.d/zfs.conf and set options zfs zfs_arc_max= in bytes.
Yes, bytes. No, it doesn't honor or understand unit suffixes. Yes, that makes for a bloody enormous number in a server with 1TiB of RAM...
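For example, to cap the ARC at 8 GiB—8 × 1024³ bytes (the size, of course, is illustrative):

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592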
*dreams* we do some pretty fucking big iron at work but 1T RAM is still 4x the largest toy I have access to currently.