This is the third post in this little series about bcachefs. The first post was all about the basics, while the second post introduced bcachefs over multiple devices. What we have not discussed so far is what bcachefs has to offer when it comes to mirroring. By default, bcachefs stripes your data across all the devices in the file system. As the devices do not need to be of the same size, the one(s) with the most free space will be favored, so that all devices fill up at the same pace. This does not protect you from the failure of a device, unless the device you lose happens to contain no data.
To address this, bcachefs comes with a concept called “replication”. You can think of replication as something like RAID 1/10, i.e. mirroring and striping. Given the list of available devices in our current setup, we have enough devices to play with this:
tumbleweed:~ $ lsblk | grep -w "4G"
└─vda3 254:3    0  1.4G  0 part [SWAP]
vdb    254:16   0    4G  0 disk
vdc    254:32   0    4G  0 disk
vdd    254:48   0    4G  0 disk
vde    254:64   0    4G  0 disk
vdf    254:80   0    4G  0 disk
vdg    254:96   0    4G  0 disk
Let’s assume we want a 4GB file system, but we also want the data mirrored to another device, just in case we lose one. With bcachefs this can easily be done like this:
tumbleweed:~ $ bcachefs format --force --replicas=2 /dev/vdb /dev/vdc
tumbleweed:~ $ mount -t bcachefs /dev/vdb:/dev/vdc /mnt/dummy/
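As a side note: --replicas=2 sets the replication level for both data and metadata at once. If you want to control them separately, the format utility also accepts dedicated options. The following is only a sketch based on the options listed in bcachefs-tools' format help, so verify the exact names with "bcachefs format --help" on your version before relying on them:

# Sketch: set data and metadata replication separately (assumed option names,
# check `bcachefs format --help` on your bcachefs-tools version)
tumbleweed:~ $ bcachefs format --force --data_replicas=2 --metadata_replicas=2 /dev/vdb /dev/vdc
# The chosen replication levels end up in the super block, so you can grep for them:
tumbleweed:~ $ bcachefs show-super /dev/vdb | grep -i replicas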
As the data is now mirrored, this should result in a file system of around 4GB instead of 8GB:
tumbleweed:~ $ df -h | grep dummy
/dev/vdb:/dev/vdc  7.3G  4.0M  7.2G   1% /mnt/dummy
It does not, so what could be the reason for this? Looking at the usage of the file system we see this:
tumbleweed:~ $ bcachefs fs usage /mnt/dummy/
Filesystem: d8a3d289-bb0f-4df0-b15c-7bb4ada51073
Size:                     7902739968
Used:                       78118912
Online reserved:                   0

Data type       Required/total  Durability  Devices
btree:          1/2             2           [vdb vdc]            4194304

(no label) (device 0):          vdb              rw
                                data        buckets    fragmented
  free:                   4255907840          16235
  sb:                        3149824             13        258048
  journal:                  33554432            128
  btree:                     2097152              8
  user:                            0              0
  cached:                          0              0
  parity:                          0              0
  stripe:                          0              0
  need_gc_gens:                    0              0
  need_discard:                    0              0
  capacity:               4294967296          16384

(no label) (device 1):          vdc              rw
                                data        buckets    fragmented
  free:                   4255907840          16235
  sb:                        3149824             13        258048
  journal:                  33554432            128
  btree:                     2097152              8
  user:                            0              0
  cached:                          0              0
  parity:                          0              0
  stripe:                          0              0
  need_gc_gens:                    0              0
  need_discard:                    0              0
  capacity:               4294967296          16384
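All the sizes above are reported in bytes. Depending on your version of bcachefs-tools, the usage command also accepts a human readable flag; this is an assumption on my side, so check the built-in help if it is not available on your system:

# Assumed flag, verify with `bcachefs fs usage --help` on your version
tumbleweed:~ $ bcachefs fs usage -h /mnt/dummy/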
As we currently do not have any user data in this file system, let's write a 100MB file into it and check again what this looks like from a usage perspective:
tumbleweed:~ $ dd if=/dev/zero of=/mnt/dummy/dummy bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0294275 s, 3.6 GB/s
tumbleweed:~ $ ls -lha /mnt/dummy/
total 100M
drwxr-xr-x 3 root root    0 Apr 17 17:04 .
dr-xr-xr-x 1 root root   10 Apr 17 10:41 ..
-rw-r--r-- 1 root root 100M Apr 17 17:05 dummy
drwx------ 2 root root    0 Apr 17 16:54 lost+found
tumbleweed:~ $ df -h | grep dummy
/dev/vdb:/dev/vdc  7.3G  207M  7.0G   3% /mnt/dummy
So instead of using 100MB on disk we are actually using around 200MB, which makes sense again: the data is written to both devices, you just need to be aware of how the numbers are presented to arrive at this disk usage. Anyway, let's have a look at the disk usage as the “bcachefs” utility reports it once more:
tumbleweed:~ $ bcachefs fs usage /mnt/dummy/
Filesystem: d8a3d289-bb0f-4df0-b15c-7bb4ada51073
Size:                     7902739968
Used:                      290979840
Online reserved:                   0

Data type       Required/total  Durability  Devices
btree:          1/2             2           [vdb vdc]            7340032
user:           1/2             2           [vdb vdc]          209715200

(no label) (device 0):          vdb              rw
                                data        buckets    fragmented
  free:                   4148953088          15827
  sb:                        3149824             13        258048
  journal:                  33554432            128
  btree:                     3670016             14
  user:                    104857600            400
  cached:                          0              0
  parity:                          0              0
  stripe:                          0              0
  need_gc_gens:                    0              0
  need_discard:               524288              2
  capacity:               4294967296          16384

(no label) (device 1):          vdc              rw
                                data        buckets    fragmented
  free:                   4148953088          15827
  sb:                        3149824             13        258048
  journal:                  33554432            128
  btree:                     3670016             14
  user:                    104857600            400
  cached:                          0              0
  parity:                          0              0
  stripe:                          0              0
  need_gc_gens:                    0              0
  need_discard:               524288              2
  capacity:               4294967296          16384
This is telling us more or less the same: we have around 100MB of user data on each of the devices, and on each device this user data is spread across 400 buckets. A bucket is 256 KiB in this setup, which you can read out of the super block:
tumbleweed:~ $ bcachefs show-super /dev/vdb | grep "Bucket size"
  Bucket size:            256 KiB
  Bucket size:            256 KiB
If you do the math: 104857600/(256*1024) gives 400 buckets for one copy of the data, and as every bucket is replicated, both vdb and vdc hold 400 buckets of user data. Same story here: you need to know where these 400 buckets come from to make any sense out of the numbers.
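If you want to double check the arithmetic, a quick shell calculation reproduces the numbers from the usage output above:

# 100 MiB of user data divided by the 256 KiB bucket size
tumbleweed:~ $ echo $((104857600 / (256 * 1024)))
400
# With two replicas, the same 400 buckets exist on vdb and on vdc.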
In the next post we’ll look at device labels and targets.