This is the third post in this little series about bcachefs. The first post was all about the basics, while the second post introduced bcachefs over multiple devices. What we have not discussed so far is what bcachefs has to offer when it comes to mirroring. By default, bcachefs stripes your data across all the devices in the file system. As the devices do not need to be of the same size, the one(s) with the most free space will be favored, the goal being that all devices fill up at the same pace. This usually does not protect you from the failure of a device, unless you lose a device which does not contain any data.
To address this, bcachefs comes with a concept called “replication”. You can think of replication as something like RAID 1/10, that is, mirroring and striping. Given the list of available devices in the setup we’re currently using, we have enough devices to play with this:
tumbleweed:~ $ lsblk | grep -w "4G"
└─vda3 254:3 0 1.4G 0 part [SWAP]
vdb 254:16 0 4G 0 disk
vdc 254:32 0 4G 0 disk
vdd 254:48 0 4G 0 disk
vde 254:64 0 4G 0 disk
vdf 254:80 0 4G 0 disk
vdg 254:96 0 4G 0 disk
Let’s assume we want a 4GB file system, but we also want the data mirrored to another device, just in case we lose one. With bcachefs this can easily be done like this:
tumbleweed:~ $ bcachefs format --force --replicas=2 /dev/vdb /dev/vdc
tumbleweed:~ $ mount -t bcachefs /dev/vdb:/dev/vdc /mnt/dummy/
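If you want to verify that the file system was really created with two replicas, you can check the super block. The exact option names differ a bit between versions of bcachefs-tools, so take the grep pattern below as a sketch rather than an official interface; it should report the metadata and data replicas as 2:
tumbleweed:~ $ bcachefs show-super /dev/vdb | grep -i replicas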
As the data is now mirrored, this should result in a file system of around 4GB instead of 8GB:
tumbleweed:~ $ df -h | grep dummy
/dev/vdb:/dev/vdc 7.3G 4.0M 7.2G 1% /mnt/dummy
It does not, so what could be the reason for this? Looking at the usage of the file system we see this:
tumbleweed:~ $ bcachefs fs usage /mnt/dummy/
Filesystem: d8a3d289-bb0f-4df0-b15c-7bb4ada51073
Size: 7902739968
Used: 78118912
Online reserved: 0
Data type       Required/total  Durability  Devices
btree:          1/2             2           [vdb vdc]       4194304

(no label) (device 0):           vdb              rw
                        data       buckets    fragmented
  free:           4255907840         16235
  sb:                3149824            13        258048
  journal:          33554432           128
  btree:             2097152             8
  user:                    0             0
  cached:                  0             0
  parity:                  0             0
  stripe:                  0             0
  need_gc_gens:            0             0
  need_discard:            0             0
  capacity:       4294967296         16384

(no label) (device 1):           vdc              rw
                        data       buckets    fragmented
  free:           4255907840         16235
  sb:                3149824            13        258048
  journal:          33554432           128
  btree:             2097152             8
  user:                    0             0
  cached:                  0             0
  parity:                  0             0
  stripe:                  0             0
  need_gc_gens:            0             0
  need_discard:            0             0
  capacity:       4294967296         16384
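A side note on reading this output: the capacity line reports both bytes and buckets for each device, so dividing the one by the other already tells you the bucket size. A quick sanity check with plain shell arithmetic (nothing bcachefs specific):
tumbleweed:~ $ echo $((4294967296 / 16384))
262144
262144 bytes are 256 KiB, a number we will come back to at the end of this post.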
As we currently do not have any user data in this file system, let’s write a 100MB file into it and check again how this looks from a usage perspective:
tumbleweed:~ $ dd if=/dev/zero of=/mnt/dummy/dummy bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0294275 s, 3.6 GB/s
tumbleweed:~ $ ls -lha /mnt/dummy/
total 100M
drwxr-xr-x 3 root root 0 Apr 17 17:04 .
dr-xr-xr-x 1 root root 10 Apr 17 10:41 ..
-rw-r--r-- 1 root root 100M Apr 17 17:05 dummy
drwx------ 2 root root 0 Apr 17 16:54 lost+found
tumbleweed:~ $ df -h | grep dummy
/dev/vdb:/dev/vdc 7.3G 207M 7.0G 3% /mnt/dummy
So instead of using 100MB on disk, we’re actually using 200MB. This makes sense again, you just need to be aware of how the numbers are presented and where this disk usage comes from. Anyway, let’s have a look at the disk usage as the “bcachefs” utility reports it once more:
tumbleweed:~ $ bcachefs fs usage /mnt/dummy/
Filesystem: d8a3d289-bb0f-4df0-b15c-7bb4ada51073
Size: 7902739968
Used: 290979840
Online reserved: 0
Data type       Required/total  Durability  Devices
btree:          1/2             2           [vdb vdc]       7340032
user:           1/2             2           [vdb vdc]     209715200

(no label) (device 0):           vdb              rw
                        data       buckets    fragmented
  free:           4148953088         15827
  sb:                3149824            13        258048
  journal:          33554432           128
  btree:             3670016            14
  user:            104857600           400
  cached:                  0             0
  parity:                  0             0
  stripe:                  0             0
  need_gc_gens:            0             0
  need_discard:       524288             2
  capacity:       4294967296         16384

(no label) (device 1):           vdc              rw
                        data       buckets    fragmented
  free:           4148953088         15827
  sb:                3149824            13        258048
  journal:          33554432           128
  btree:             3670016            14
  user:            104857600           400
  cached:                  0             0
  parity:                  0             0
  stripe:                  0             0
  need_gc_gens:            0             0
  need_discard:       524288             2
  capacity:       4294967296         16384
This is telling us more or less the same: we have around 100MB of user data on each of the devices, and this 100MB of user data is spread across 400 buckets. A bucket is 256 KiB in this setup, which you can read out of the super block:
tumbleweed:~ $ bcachefs show-super /dev/vdb | grep "Bucket size"
Bucket size: 256 KiB
Bucket size: 256 KiB
If you do the math: 104857600/(256*1024) gives 400 buckets, which is exactly what is reported per device. As every bucket is replicated across both devices, the file system as a whole uses 800 buckets (200MB) to store the 100MB file. Same story here: you need to know where these 400 buckets come from to make any sense out of the numbers.
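You can verify this with plain shell arithmetic as well, again just using the numbers from the output above:
tumbleweed:~ $ echo $((104857600 / (256*1024)))
400
This matches the 400 buckets of user data reported for each of the two devices.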
In the next post we’ll look at device labels and targets.