Ceph CSI: “XFS Superblock has unknown read-only..” or “wrong fs type, bad …. on /dev/rbd., missing codepage or ..”

We announced the Ceph CSI v2.1.0 release around three weeks ago, with many bug fixes and features. More details about this release can be found at https://github.com/ceph/ceph-csi/releases/tag/v2.1.0 and in this blog post: https://www.humblec.com/one-more-ceph-csi-release-yeah-v2-1-0-is-here/. From our interactions with the CSI/Rook community, we know many folks have updated to this version in the last couple of weeks.

Unfortunately, the community found a couple of issues when mounting XFS-based PVCs in application pods with this CSI version in their Kubernetes/OpenShift clusters.

Two errors were mainly encountered in these setups:

1) XFS: wrong fs type, bad option, bad superblock on /dev/rbd4, missing codepage or helper program, or other error
2) XFS: Superblock has unknown read-only compatible features (0x4) enabled

As you can see above, both of these issues occur when mounting XFS. The Ceph CSI plugin uses the `mkfs.xfs` binary when it prepares a volume for mounting, and this binary is part of the “xfsprogs” package shipped in the CSI container. To understand the issues better, let's first look at what changed:

[terminal]
ceph CSI v2.0.0 release: mkfs.xfs version 4.5.0

ceph CSI v2.1.0 release: mkfs.xfs version 5.0.0
[/terminal]

This change happened when we updated the Ceph base image in our containers from v14.2 (used by Ceph CSI v2.0.0) to v15 (used by Ceph CSI v2.1.0):

[terminal]
ceph CSI v2.0.0
BASE_IMAGE=ceph/ceph:v14.2

ceph CSI v2.1.0
BASE_IMAGE=ceph/ceph:v15
[/terminal]

To give some more context about these issues:

The first one is more generic:

XFS: wrong fs type, bad option, bad superblock on /dev/rbd4, missing codepage or helper program, or other error

That said, here it is seen when you have multiple PVCs with the same XFS UUID (filesystem UUID): the PVC mount fails when the volume is attached to a pod.

Multiple PVCs end up with the same UUID when a volume is snapshotted or cloned. Even though Ceph CSI snapshot/clone support is still in `Alpha` state, we have proactively fixed this issue [1].
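To check whether a node is in this situation, it can help to group the mapped devices by filesystem UUID. The snippet below is only a sketch and is not part of Ceph CSI: the function name `find_duplicate_uuids` and the sample UUIDs are made up for illustration, and it assumes you have already collected the UUID of each device yourself (for example with `blkid -s UUID -o value <device>`).

```python
from collections import defaultdict


def find_duplicate_uuids(device_uuids):
    """Return {uuid: [devices, ...]} for every UUID shared by two or more devices.

    `device_uuids` maps a device path to its filesystem UUID.
    """
    by_uuid = defaultdict(list)
    for device, uuid in device_uuids.items():
        by_uuid[uuid].append(device)
    # Keep only UUIDs that appear on more than one device.
    return {u: sorted(devs) for u, devs in by_uuid.items() if len(devs) > 1}


# Hypothetical data: rbd4 is a clone of rbd3, so it carries the same UUID.
example = {
    "/dev/rbd3": "11111111-2222-3333-4444-555555555555",
    "/dev/rbd4": "11111111-2222-3333-4444-555555555555",
    "/dev/rbd5": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
}

for uuid, devices in find_duplicate_uuids(example).items():
    print(f"UUID {uuid} is shared by: {', '.join(devices)}")
```

Any UUID reported here belongs to volumes that are likely to hit the “wrong fs type, bad option, bad superblock” error if they are mounted on the same node at the same time.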

The second issue:

XFS: Superblock has unknown read-only compatible features (0x4) enabled

pops up when the cluster nodes are running RHEL 7 based kernels (for example, Red Hat kernel 3.10.0-957 el7). The `mkfs.xfs` binary in the new CSI container image enables filesystem features by default that this kernel version does not support, and you will encounter the ‘unknown read-only compatible features (0x4)’ error at the time of mounting. To avoid this error, the `mkfs.xfs` call needs to include `-m reflink=0`, which disables the incompatible copy-on-write reflink support.
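As an illustration of that `mkfs.xfs` invocation, here is a minimal sketch of how the argument list could be assembled. This is not the Ceph CSI implementation: the helper name `build_mkfs_xfs_args` and its `disable_reflink` parameter are hypothetical, and the only piece taken from the discussion above is the `-m reflink=0` option.

```python
def build_mkfs_xfs_args(device, disable_reflink=True):
    """Build the argument list for formatting `device` with XFS.

    With `disable_reflink` set, `-m reflink=0` is added so the new
    filesystem can still be mounted by kernels without reflink support.
    """
    args = ["mkfs.xfs"]
    if disable_reflink:
        args += ["-m", "reflink=0"]
    args.append(device)
    return args


# Prints: mkfs.xfs -m reflink=0 /dev/rbd4
print(" ".join(build_mkfs_xfs_args("/dev/rbd4")))
```

The list form can be handed directly to something like `subprocess.run(args, check=True)`. Formatting destroys existing data, so this is only meant for a freshly created, empty volume.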

More details about these issues can be found at https://github.com/ceph/ceph-csi/issues/966
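When triaging a failed mount from pod events or node logs, the two messages can be told apart by simple substring matching. The following is a rough sketch under that assumption; the function name `classify_xfs_mount_error` and the returned labels are invented for this example and do not come from Ceph CSI.

```python
def classify_xfs_mount_error(message):
    """Map a mount error message to one of the two issues described in this post."""
    text = message.lower()
    if "unknown read-only compatible features" in text:
        # Filesystem was created with a feature the node's kernel does not know.
        return "incompatible-features"
    if "wrong fs type" in text and "bad superblock" in text:
        # Generic mount failure; here it was seen with duplicate XFS UUIDs.
        return "generic-mount-failure"
    return "unknown"


errors = [
    "XFS: wrong fs type, bad option, bad superblock on /dev/rbd4, "
    "missing codepage or helper program, or other error",
    "XFS: Superblock has unknown read-only compatible features (0x4) enabled",
]
for err in errors:
    print(classify_xfs_mount_error(err))
```

Because the first message is generic, a "generic-mount-failure" result is only a hint to go and check for duplicate UUIDs, not a confirmation of the cause.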

[1] Solution:

Both of these issues have been addressed in Ceph CSI v2.1.1, which was released today: https://github.com/ceph/ceph-csi/releases/tag/v2.1.1. The immediate fix in this new release is to revert to the older `mkfs.xfs` version. The real fix is known and we will make it soon, but until then we don’t want to leave our awesome community with broken setups.

We advise you to update to v2.1.1 if you are using `xfs` as your PVC `fsformat` in your Kubernetes/OpenShift cluster.

Special thanks to the community users below for reporting this issue and helping us through the various stages of the debugging process!

https://github.com/chandr20
https://github.com/iExalt
https://github.com/fiveclubs
https://github.com/volvicoasis

Happy hacking, and talk to us via Slack (http://cephcsi.slack.com) or GitHub: https://github.com/ceph/ceph-csi/.

Copyright secured by Digiprove © 2020 Humble Chirammal