This is the third post in this little series about bcachefs. The first post was all about the basics, while the second post introduced bcachefs over multiple devices. What we have not discussed so far is what bcachefs has to offer when it comes to mirroring. By default bcachefs stripes your data across all the devices in the file system. As devices do not need to be of the same size, the one(s) with the most free space are favored, the goal being that all devices fill up at the same pace. This does not protect you from the failure of a device, unless the device you lose happens to contain no data.

To address this, bcachefs comes with a concept called “replication”. You can think of replication as RAID 1/10, that is, mirroring and striping. The setup we’re currently using has enough devices to play with this:

tumbleweed:~ $ lsblk | grep -w "4G"
└─vda3 254:3    0  1.4G  0 part [SWAP]
vdb    254:16   0    4G  0 disk 
vdc    254:32   0    4G  0 disk 
vdd    254:48   0    4G  0 disk 
vde    254:64   0    4G  0 disk 
vdf    254:80   0    4G  0 disk 
vdg    254:96   0    4G  0 disk 

Let’s assume we want a 4 GB file system, but we also want the data mirrored to a second device, just in case we lose one. With bcachefs this can easily be done like this:

tumbleweed:~ $ bcachefs format --force --replicas=2 /dev/vdb /dev/vdc
tumbleweed:~ $ mount -t bcachefs /dev/vdb:/dev/vdc /mnt/dummy/
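
A quick note on the options: “--replicas=2” asks for two copies of everything, metadata and data alike. If you ever want to treat the two differently, “bcachefs format” also accepts separate knobs. A minimal sketch, assuming the “--metadata_replicas” and “--data_replicas” option spellings of current bcachefs-tools (double-check with “bcachefs format --help” on your version):

tumbleweed:~ $ bcachefs format --force --metadata_replicas=2 --data_replicas=2 /dev/vdb /dev/vdc

The replica counts chosen at format time are stored in the super block, so something along the lines of “bcachefs show-super /dev/vdb | grep -i replicas” should let you verify them later on.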

As the data is now mirrored, this should result in a file system of around 4 GB instead of 8 GB:

tumbleweed:~ $ df -h | grep dummy
/dev/vdb:/dev/vdc  7.3G  4.0M  7.2G   1% /mnt/dummy

It does not, so what could be the reason for this? Looking at the usage of the file system we see this:

tumbleweed:~ $ bcachefs fs usage /mnt/dummy/
Filesystem: d8a3d289-bb0f-4df0-b15c-7bb4ada51073
Size:                     7902739968
Used:                       78118912
Online reserved:                   0

Data type       Required/total  Durability    Devices
btree:          1/2             2             [vdb vdc]            4194304

(no label) (device 0):           vdb              rw
                                data         buckets    fragmented
  free:                   4255907840           16235
  sb:                        3149824              13        258048
  journal:                  33554432             128
  btree:                     2097152               8
  user:                            0               0
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  capacity:               4294967296           16384

(no label) (device 1):           vdc              rw
                                data         buckets    fragmented
  free:                   4255907840           16235
  sb:                        3149824              13        258048
  journal:                  33554432             128
  btree:                     2097152               8
  user:                            0               0
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  capacity:               4294967296           16384
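
This also answers the df question: the reported size is simply the raw capacity of both devices minus what bcachefs sets aside for its own use (superblock, journal, and some internal reserve). The replication only kicks in once data is written, as every extent is then stored twice. In other words, the usable amount of user data is roughly half of what df reports, which is about the 4 GB we were aiming for. A quick sanity check, using the “Size” value from above (plain shell arithmetic, nothing bcachefs-specific):

tumbleweed:~ $ echo $(( 7902739968 / 2 / 1024 / 1024 ))
3768

So roughly 3.7 GiB of user data will fit, the rest being file system overhead.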

As we currently do not have any user data in this file system, let’s write a 100MB file into it and check again what this looks like from a usage perspective:

tumbleweed:~ $ dd if=/dev/zero of=/mnt/dummy/dummy bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0294275 s, 3.6 GB/s
tumbleweed:~ $ ls -lha /mnt/dummy/
total 100M
drwxr-xr-x 3 root root    0 Apr 17 17:04 .
dr-xr-xr-x 1 root root   10 Apr 17 10:41 ..
-rw-r--r-- 1 root root 100M Apr 17 17:05 dummy
drwx------ 2 root root    0 Apr 17 16:54 lost+found
tumbleweed:~ $ df -h | grep dummy
/dev/vdb:/dev/vdc  7.3G  207M  7.0G   3% /mnt/dummy
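
By the way, this mirroring is exactly what buys you the safety net we were after: should one of the two devices die, the file system can still be mounted from the surviving one. A minimal sketch, assuming the “degraded” mount option of bcachefs (do try this on a scratch setup before relying on it):

tumbleweed:~ $ umount /mnt/dummy
tumbleweed:~ $ mount -t bcachefs -o degraded /dev/vdb /mnt/dummy/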

Back to the numbers: instead of using 100MB on disk we’re actually using around 200MB, which makes sense again once you know that every extent is written twice. You just need to be aware of how the numbers are presented to understand where this disk usage comes from. Anyway, let’s have a look at the disk usage as the “bcachefs” utility reports it once more:

tumbleweed:~ $ bcachefs fs usage /mnt/dummy/
Filesystem: d8a3d289-bb0f-4df0-b15c-7bb4ada51073
Size:                     7902739968
Used:                      290979840
Online reserved:                   0

Data type       Required/total  Durability    Devices
btree:          1/2             2             [vdb vdc]            7340032
user:           1/2             2             [vdb vdc]          209715200

(no label) (device 0):           vdb              rw
                                data         buckets    fragmented
  free:                   4148953088           15827
  sb:                        3149824              13        258048
  journal:                  33554432             128
  btree:                     3670016              14
  user:                    104857600             400
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:               524288               2
  capacity:               4294967296           16384

(no label) (device 1):           vdc              rw
                                data         buckets    fragmented
  free:                   4148953088           15827
  sb:                        3149824              13        258048
  journal:                  33554432             128
  btree:                     3670016              14
  user:                    104857600             400
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:               524288               2
  capacity:               4294967296           16384

This is telling us more or less the same: we have around 100MB of user data on each of the devices, and this 100MB of user data is spread across 400 buckets per device. The “1/2” in the “Required/total” column means that two copies of each extent exist and a single one is enough to read the data. A bucket is 256 KiB in this file system, which you can read out of the super block (one line is printed per member device):

tumbleweed:~ $ bcachefs show-super /dev/vdb | grep "Bucket size"
  Bucket size:                              256 KiB
  Bucket size:                              256 KiB

If you do the math: 104857600/(256*1024) gives exactly the 400 buckets reported per device, and as every bucket is replicated to the other device, the file system holds 800 buckets of user data in total. Same story here: you need to know where these 400 buckets come from to make any sense out of them.
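
If you want to let the shell do that math for you (plain arithmetic, using the bucket size from the super block above):

tumbleweed:~ $ echo $(( 104857600 / (256 * 1024) ))
400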

In the next post we’ll look at device labels and targets.