What is ZFS
ZFS is a modern file system with many features. Some of the important features include:
Copy-on-write: ZFS never overwrites live data in place. When a file is modified, the new version of each block is written first, and only then do the metadata pointers switch over. This ensures data integrity.
Redirect-on-write: When writing new data, ZFS writes it to a fresh location instead of overwriting the existing blocks. This protects the integrity of the original data and improves write efficiency.
Deduplication: When multiple files have the same data blocks, ZFS only keeps one copy of the data block on the storage device, saving storage space.
Snapshots: ZFS can take snapshots of the file system at a specific point in time, recording the state of the file system. This allows for easy restoration to a previous state and can also be used for data backup and recovery.
Compression: ZFS supports on-the-fly compression of data, which can save storage space and improve performance by reducing the amount of data that needs to be read from or written to the storage device.
Caching: ZFS has a built-in cache mechanism called the Adaptive Replacement Cache (ARC) that can improve read performance by keeping frequently accessed data in memory. Additionally, ZFS also supports the use of separate cache devices such as solid-state drives (SSDs) to further improve performance.
RAID: ZFS supports multiple RAID levels, including RAID 0 (striping), RAID 1 (mirroring), RAID-Z1 (similar to RAID 5), RAID-Z2 (similar to RAID 6), and striped mirrors (similar to RAID 10). These can be used to improve performance and/or provide data redundancy.
Storage pools: ZFS uses storage pools to manage storage devices. A storage pool can consist of one or more storage devices, and the storage capacity of the pool is shared between all devices in the pool. This allows for easy expansion of storage capacity by adding more devices to the pool.
Data integrity: ZFS uses checksums to ensure data integrity. When reading data from a storage device, ZFS verifies the checksum of the data to ensure that it has not been corrupted. If the checksum does not match, ZFS will attempt to repair the data by reading it from another device in the pool.
Data scrubbing: ZFS can periodically scan the storage devices in a pool to detect and repair data corruption. This helps to ensure that data is not corrupted over time.
These features make ZFS a powerful and reliable file system suitable for large-scale storage and data management scenarios.
Basic concepts
Pool
In the world of ZFS, everything is built on storage pools. First, you create a storage pool from one or more hard drives; the pool manages the disks and provides the storage space. Capacity is not shared between storage pools.
Forget about hardware RAID! Even if you have a RAID controller, ZFS still expects you to expose every physical disk to the pool directly!
Dataset
On top of a storage pool, we can create multiple datasets. Datasets have no pre-allocated size, which means each dataset can use the entire remaining capacity of the pool. A dataset belongs to one and only one storage pool.
After creating a dataset, it is mounted as a directory. This allows you to store and organize files and subdirectories within the dataset. For example, if you create a dataset called "photos", you can mount it at a directory like /mnt/pool/photos and then create subdirectories within it to organize your photos. This makes data organization and management more flexible and convenient.
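As a minimal sketch of that "photos" example (assuming a pool named pool already exists):
# zfs create pool/photos
# zfs set mountpoint=/mnt/pool/photos pool/photos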
Commands
Installation
To install basic utilities, run:
sudo apt install zfsutils-linux
Use some basic tools like lsblk and fdisk to locate your disk.
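For example, lsblk can list size, model, and serial number to help you identify each drive (standard lsblk columns):
# lsblk -o NAME,SIZE,MODEL,SERIAL
# fdisk -l /dev/sda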
Getting context
List datasets:
# zfs list
List pools:
# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
my_pool 47.5G 387K 47.5G - - 0% 0% 1.00x ONLINE -
# zpool status
pool: my_pool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
my_pool ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
errors: No known data errors
Also, if you want to check the I/O status, use zpool iostat:
# zpool iostat -v
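You can also pass an interval in seconds to get a continuously refreshing view, e.g. every 5 seconds:
# zpool iostat -v my_pool 5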
Managing Pools
- Use one disk to create one pool
- Use multiple disks to create one pool
- Use one disk to create multiple pools
You can create a new pool from a single disk with:
# zpool create my_pool /dev/sda # <--- /dev/sda is your physical disk!
To create multiple pools with one disk (NOT suggested):
# zpool create my_pool /dev/sda1 # <--- /dev/sda1 is your partition. This is NOT suggested because capacity is not shared between pools!
Of course, you can use multiple devices to create one pool:
# zpool create my_pool raidz /dev/sdb /dev/sdc /dev/sdd # raidz: at least 2 disks
# zpool create my_pool raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde # raidz2: at least 3 disks
# zpool create my_pool raidz3 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf # raidz3: at least 4 disks
That also creates a ZFS-level RAID called RAID-Z: RAID-Z1, RAID-Z2, and RAID-Z3 use one, two, and three parity disks, and survive the loss of one, two, and three disks respectively.
You may need a RAID-Z capacity calculator when planning your ZFS layout.
If you bought new disks and want to add them to your existing pool, that's also possible:
# zpool add my_pool /dev/new
Note that zpool add attaches the device as a new top-level vdev, striped with the existing ones; it does not add redundancy to an existing raidz vdev. If you have set a quota on the pool's root dataset, you may also want to raise it:
# zfs set quota=new_size my_pool
Note that expanding a ZFS pool takes some time, depending on the size and number of devices you are adding. During the expansion you can still use the data in the pool, but you may experience a performance decrease.
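A related tip: if you later replace the disks in a vdev with larger ones, the autoexpand pool property lets the pool grow automatically once every disk has been swapped (assuming the pool is named my_pool):
# zpool set autoexpand=on my_pool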
Managing Datasets
To create and mount:
# zfs create pool/set
# zfs set mountpoint=/test pool/set
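Or, equivalently, in a single step:
# zfs create -o mountpoint=/test pool/set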
To delete a dataset (the -r flag recursively destroys its children and snapshots):
# zfs destroy -r pool/set
Features
Data dedup
You can use the deduplication (dedup) property to remove redundant data from your ZFS file systems. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored, and common components are shared between files.
Pool level:
# zfs get dedup pool
# zfs set dedup=on pool
Or at the dataset level:
# zfs get dedup pool/set
# zfs set dedup=on pool/set
Use zpool list to check the dedup ratio.
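For a quick view of just the relevant columns:
# zpool list -o name,size,allocated,dedupratio my_pool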
You may also notice that the total size reported by df changes:
root@lab:/my_pool# df -Th
Filesystem Type Size Used Avail Use% Mounted on
my_pool zfs 18G 3.0G 15G 18% /my_pool
Right here, '=' means equality, not assignment:
Available space left = ZFS usable storage capacity - Raw used space
Available space left = Logical usable storage capacity - Logical used space
Logical usable storage capacity = Available space left + Logical used space
Logical usable storage capacity = ZFS usable storage capacity - Raw used space + Logical used space
In the world of ZFS, if you enable data deduplication, you will find that the total size of the dataset shown by df changes dynamically. Using the pool above as an example:
- Total raw storage capacity: 48G
- Raid Level: RaidZ2
- ZFS usable storage capacity: 16G
- Logical used space: 3G
- Raw used space: 1G
- Dedup ratio: 3x
- Dedup ratio = Logical used space / Raw used space
- Logical usable storage capacity: dynamically calculated: 18G
- Logical usable storage capacity = ZFS usable storage capacity + (Logical used space - Raw used space)
- Available space left: 15G
- Available space left = ZFS usable storage capacity - Raw used space
- Available space left = Logical usable storage capacity - Logical used space
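You can observe these values directly via the used, logicalused, and available dataset properties (a sketch, assuming the pool above):
# zfs list -o name,used,logicalused,available my_pool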
Snapshot
Create snapshot:
# zfs snapshot pool/set@name
List snapshots:
# zfs list -t snapshot
Rollback to snapshot:
# zfs rollback pool/set@name
Rollback is dangerous! It discards every change made between the snapshot and the current state! If you only need to access the old data, use clone instead:
# zfs clone pool/set@name pool/set2
That will clone the snapshot as a dataset.
Delete snapshot:
# zfs destroy pool/set@name
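Tip: before rolling back or destroying anything, zfs diff shows what changed between a snapshot and the live dataset:
# zfs diff pool/set@name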
Compression
By default, compression is disabled:
# zfs get compression my_pool
To enable it:
# zfs set compression=lz4 my_pool
lz4 is a very fast compression algorithm with very little CPU overhead, so it's a good choice if you want to maximize performance. It's also the default compression algorithm used by ZFS on most systems.
You can also use gzip or zle. Compared with lz4, gzip is slower but has a higher compression ratio, while zle is faster but has a lower compression ratio.
# zfs set compression=gzip my_pool
# zfs set compression=zle my_pool
The performance impact of enabling lz4 needs to be analyzed case by case. Enabling lz4 makes some disk reads cheaper: less data is read from disk, and the CPU decompresses it on the fly. The catch is that large-scale decompression consumes CPU time, even though lz4's overhead is small. So lz4 can significantly improve performance when the CPU is strong and there is a lot of redundancy in the on-disk data, but it may not help when the CPU is weak or almost all of the data is already compressed.
To query current compress ratio:
# zfs get compressratio my_pool
Caching
ZFS has a built-in caching mechanism called Adaptive Replacement Cache (ARC) to improve read performance. ARC is a dynamic cache that automatically adjusts its size based on the system's memory usage. It stores frequently accessed data in memory so that the system can access it quickly.
ZFS also supports separate caching devices, such as solid-state drives (SSDs), to further improve performance: a separate log device for the ZFS Intent Log (ZIL), and the second-level cache (L2ARC). The ZIL is used to improve the performance of synchronous writes, while the L2ARC is used to improve the performance of read operations.
Using separate caching devices can significantly improve performance in situations where system memory is limited or the workload is particularly read-intensive. However, using caching devices may also increase the cost of the storage system, so the benefits and costs need to be considered before implementing them.
For example, to use two Samsung NVMe solid-state drives as ZFS cache devices, which can significantly improve read and write performance, you allocate one drive to the ZFS Intent Log (ZIL) and the other to the Level 2 Adaptive Replacement Cache (L2ARC).
To allocate an NVME solid-state drive to ZIL, use the following command:
# zpool add my_pool log nvme0n1
Here, my_pool is your ZFS pool name, and nvme0n1 is your NVMe drive's device name.
To allocate an NVME solid-state drive to L2ARC, use the following command:
# zpool add my_pool cache nvme1n1
Here, my_pool is your ZFS pool name, and nvme1n1 is your NVMe drive's device name.
After allocating the NVMe drives to ZIL and L2ARC, you can use the following command to view the status of the ZFS pool:
# zpool status my_pool
As a rough guideline for a small pool like the 16GB example above, a 4GB ZIL and a 4GB L2ARC are plenty; in general, the ZIL only needs to hold a few seconds' worth of synchronous writes.
To check details:
# cat /proc/spl/kstat/zfs/arcstats
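If your distribution ships it, the arcstat utility gives a friendlier rolling view of ARC hit rates (refreshing every 5 seconds here):
# arcstat 5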
RAID
ZFS supports multiple RAID levels, including RAID 0 (striping), RAID 1 (mirroring), RAID 5 (RAID-Z), RAID 6 (RAID-Z2), and RAID 10 (striped mirrors). These RAID levels can be used to improve performance and/or provide data redundancy.
To create a RAID 0 pool, use the following command:
# zpool create my_pool /dev/sda /dev/sdb
Here, my_pool is your ZFS pool name, and /dev/sda and /dev/sdb are your hard drive device names.
To create a RAID 1 pool, use the following command:
# zpool create my_pool mirror /dev/sda /dev/sdb
Here, my_pool is your ZFS pool name, and /dev/sda and /dev/sdb are your hard drive device names.
To create a RAID 5 pool, use the following command:
# zpool create my_pool raidz /dev/sda /dev/sdb /dev/sdc
Here, my_pool is your ZFS pool name, and /dev/sda, /dev/sdb, and /dev/sdc are your hard drive device names.
To create a RAID 6 pool, use the following command:
# zpool create my_pool raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
Here, my_pool is your ZFS pool name, and /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd are your hard drive device names.
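The feature list above also mentions RAID 10; in ZFS that is a stripe of mirrors. A sketch with four drives:
# zpool create my_pool mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd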
Scrubbing
ZFS can periodically scan the storage devices in a pool to detect and repair data corruption. This helps to ensure that data is not corrupted over time.
To start a scrub, use the following command:
# zpool scrub my_pool
Here, my_pool is your ZFS pool name.
To view the status of a scrub, use the following command:
# zpool status my_pool
To stop a scrub, use the following command:
# zpool scrub -s my_pool
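Scrubs are usually scheduled rather than run by hand. A hypothetical /etc/crontab entry that scrubs every Sunday at 3 AM (adjust the zpool path for your system):
0 3 * * 0 root /usr/sbin/zpool scrub my_pool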
Integrity
ZFS uses checksums to ensure data integrity. When reading data from a storage device, ZFS verifies the checksum of the data to ensure that it has not been corrupted. If the checksum does not match, ZFS will attempt to repair the data by reading it from another device in the pool.
Checksums are enabled by default; to set the property explicitly:
# zfs set checksum=on my_pool
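If checksum errors are ever detected, zpool status -v lists the affected files:
# zpool status -v my_pool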
My configuration example
Here is my configuration example:
# Create a pool with 12 disks (the unique serial-number suffixes of the by-id names are omitted here)
sudo zpool create -o ashift=12 pool raidz2 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120 \
/dev/disk/by-id/ata-TOSHIBA_HDWD120
# Set some properties (one per command, for compatibility with older zfs versions)
sudo zfs set compression=lz4 pool
sudo zfs set recordsize=1M pool
sudo zfs set xattr=sa pool
sudo zfs set dnodesize=auto pool
# Create a dataset
sudo zfs create -o mountpoint=/mnt/pool pool/data
# Add cache and log
sudo zpool add pool cache nvme0n1
sudo zpool add pool log nvme1n1
# Disable sync and atime for performance
# WARNING: sync=disabled can lose the last few seconds of writes on power failure
sudo zfs set sync=disabled pool
sudo zfs set atime=off pool
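Finally, to verify the layout and properties:
sudo zpool status pool
sudo zfs list -r pool
sudo zfs get compression,recordsize,sync,atime pool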
This blog post does a great job of explaining how to use the ZFS file system. The author clearly outlines the key features of ZFS, such as copy-on-write, redirect-on-write, deduplication, and snapshots, and explains why these features make ZFS a powerful and reliable choice for large-scale storage and data management scenarios.
The author's explanation of how storage pools and datasets work in ZFS is particularly useful. The use of concrete examples, such as creating a dataset called "photos" and mapping it to a directory, helps to clarify these concepts and makes the post accessible to readers who may not be familiar with file systems.
The blog post also provides detailed instructions on how to install and use ZFS, including how to create and manage storage pools and datasets, how to enable deduplication, and how to create, list, rollback, clone, and delete snapshots. These instructions are clear and easy to follow, and the inclusion of command line examples is very helpful.
However, the blog post could be improved by providing more context and explanation for some of the commands. For example, the author mentions that creating a pool with multiple devices creates a ZFS level Raid called Raid-Z, but does not explain what Raid-Z is or why a user might want to use it. Similarly, the author warns that rolling back to a snapshot is dangerous and suggests using clone instead, but does not explain why this is the case or what the potential risks are.
The author could also consider adding a section on troubleshooting common issues with ZFS. This would make the blog post even more useful for readers who are new to ZFS and may encounter problems when trying to use it for the first time.
Overall, this is a very informative and useful blog post that does an excellent job of explaining how to use ZFS. With a few minor improvements, it could be an invaluable resource for anyone interested in large-scale storage and data management.