Kubernetes Chapters introduction

I have been writing blogs in this space and others for over a decade. For the last couple of years, however, I have struggled to find time to write. One of the main reasons for this absence was a busy schedule on every front; another was that I often felt the topic I was planning to write about was too simple, or already covered in enough detail elsewhere. The reality is that neither of these is ever going to go away. So today I have decided that I will write in this space regardless of the topic's complexity or my schedule. In short, more articles are going to hit this space. I will try to group them under a parent category and publish them as a series. Kubernetes is one of the chapters I plan to start writing immediately.

Let’s see how it goes.

One more ceph-csi release? Yeah, v2.1.0 is here!

The Ceph-CSI team is excited to have reached the next milestone with the release of v2.1.0! [1] This is another great release with many improvements to the Ceph and CSI integration for production use with Kubernetes/OpenShift clusters. Of the many features and bug fixes, here are just a few of the highlights.

Release Issue # https://github.com/ceph/ceph-csi/issues/816


# Changelog or Highlights:

## Features:
Add support for rbd static PVC
Move cephfs subvolume support from `Alpha` to `Beta`.
Added support for rbd topology-based provisioning.
Support externally managed configmap.
Updated Base image to ceph Octopus
Added csiImageKey to keep track of the image name in RADOS omap
Added E2E for helm charts
...

## Enhancements:
Implement CreateVolume with go-ceph which boosts performance.
Migrated from `dep` to `go modules`
Updated Kubernetes version to v1.18.0
Updated golang version to 1.13.9
Updated Kubernetes sidecar containers to latest version
E2E: Add Ability to test with non root user
...

## Bug Fixes:
Log an error message if cephfs mounter fails during init
Aligned with klog standards for logging
Added support to run E2E in a different namespace
Removed cache functionality for cephfs plugin restart
rbd: fallback to inline image deletion if adding it as a task fails
code cleanup for static errors and unwanted code blocks
Fix mountoption issue in rbd
travis: re-enable running on arm64
...

## Deprecated:
GRPC metrics in cephcsi

## Documentation:
Added document on cleaning up stale resources
Updated ceph-csi support matrix
dev-guide: add reference to required go-ceph dependencies
Update upgrade doc for node hang issue
...

Many other bug fixes, code improvements, and README updates are also part of this release. The container image is tagged “v2.1.0” and can be downloaded with `docker pull quay.io/cephcsi/cephcsi:v2.1.0`.

Kudos to the Ceph CSI community for all the hard work to reach this critical milestone!
The Ceph-CSI project (https://github.com/ceph/ceph-csi/), as well as its thriving community, has continued to grow, and we are happy to share that this is our 7th release since Jul 12, 2019!

Happy Hacking!

Release Issue #
https://github.com/ceph/ceph-csi/issues/806
[1] https://github.com/ceph/ceph-csi/releases/tag/v2.1.0

Ceph-CSI v2.0.0 released with multi arch support, encryption, expansion, etc.!

We are excited to announce the second major release of Ceph-CSI, v2.0.0!

Over the last few months, the ceph-csi community has been working tirelessly to improve the project with many new features, bug fixes, and usability improvements. Some of the important features of this release are on-demand resizing of RBD and CephFS volumes, encryption with LUKS support for RBD PVCs, multi arch support (ceph-csi Arm64 image), compatibility with kube 1.17, and upgraded sidecar containers. This release is not limited to features: many critical bug fixes and documentation updates are also part of it [1][2].

[1] https://github.com/ceph/ceph-csi/issues/557
[2] https://github.com/ceph/ceph-csi/releases/tag/v2.0.0

An excerpt from the changelog:


Added dynamic resize support for CephFS PVCs
Added dynamic resize support for RBD PVCs
Added encryption with LUKS support for RBD PVCs
Multi arch support (ceph-csi Arm64 image)
Upgrade documentation from v1.2.2 to v2.0.0
Updated code base to kube v1.17
leader election enabled in deployment
Added Version flag to cephcsi
Removed Kubernetes 1.13.x support with 2.0.0 release
CSI: run all containers as privileged in daemonset pods
Upgrade: csi-attacher sidecar from v1.2.0 to v2.1.0
Upgrade: csi-snapshotter sidecar from v1.2.1 to v1.2.2
Upgrade: csi-node-driver-registrar sidecar from v1.1.0 to v1.2.0
Upgrade: csi-resizer sidecar from v0.3.0 to v0.4.0
Upgrade: csi-provisioner sidecar from v1.3.0 to v1.4.0
Remove deprecated containerized flag in rbd
Discard umount error if the directory is not mounted
Use EmptyDir to store provisioner socket
Add ContentSource to the CreateVolume response
Rbd: only load nbd module if not available yet
Enhance scripts to deploy ceph cluster using rook
Add e2e tests for RBD resizer
Update minikube to latest released version
Update golangci-lint version to v1.21.0
Fix to use kubectl create not kubectl apply in the e2e
Add volume size roundoff for expand request
Add E2E for cephfs resize functionality
Add Documentation for PVC resize
Fix block resize issue in RBD
Add 13.0.0 Mimic supported version to the readme
update Metrics supported version in Readme
Remove hard-coded UpdateStrategy from templates
Add E2E for block PVC resize
Enable logging in E2E if the test fails
Enable Block E2E for rbd
Add ID-based logging for ExpandVolume
Validate rbd image name in NodeExpand

There are some features we deferred from this release to the next. Reach out to us via https://github.com/ceph/ceph-csi/issues/806 if you would like to see any feature, bug fix, or update as part of the next ceph-csi release. The CSI community has been growing since our first release (v1.0.0), and we are glad to share that many new contributors have come together to make ceph-csi a production-grade CSI driver with this release!

Thanks to all!

Ceph CSI v1.2.0, v1.2.1 releases and so on

If you care about CSI and the Ceph plugin, you will have noticed the massive improvements going on in the ceph-csi repo over the last 4 to 5 months! We have engaged heavily with this repo: addressing upstream user issues, fixing many bugs, making improvements, streamlining communication in the community, planning and rolling out releases, and integrating with the Rook project (https://rook.io/) to make ceph-csi the default storage driver in Rook. With all these efforts, we are hearing a lot of very positive feedback from community users.

We rolled out the very first release of CSI, v1.1.0, around 3 months back, with many changes and solid code in the project. We then continued our efforts on the CSI v1.2 series, and at present we have rolled out v1.2.0 and v1.2.1!

I would like to list down some of the highlights of these releases here:

Ceph CSI v1.2.0 Release

Release Issue: https://github.com/ceph/ceph-csi/issues/393

Changelog or highlights:

*) Cephfs: Use ceph kernel client if kernel version >= 4.17
*) implement grpc metrics for ceph-csi
*) Add xfs fstype as default type in storageclass
*) Add support to use ceph manager rbd command to delete an image
*) e2e: correct log format in execCommandInPod()
*) Add 'gosec' to the static-checks
*) switch cephfs, utils, and csicommon to new logging system
*) utility to trace backend volume from RBD pvc
*) Implement context-based logging
*) implement klog wrapper
*) unmap rbd image if connection timeout.
*) start controller or node server based on config
*) fix: Adds liveness sidecar to v1.14+ helm charts
*) Prometheus liveness probe sidecar
*) Wrap error if failed to fetch mon
*) provisioners: add reconfiguring of PID limit
*) Use "rbd device list" to list and find rbd images and their device paths
*) Update Unstage transaction to undo steps done in Stage
*) Move mounting staging instance to a sub-path within staging path
*) e2e: do not fail to delete resources when "resource not found"
*) remove post validation of rbd device

Many other bug fixes, code improvements, README updates are also part of this release.


CSI v1.2.1 Release.

Release Issue # https://github.com/ceph/ceph-csi/issues/600

*) Change the recommended/default FS for RBD to ext4
*) Use nodiscard option while formatting RBD devices.
*) Use provisioner socket while probing liveness.
*) Reject request if the operation is in progress
*) Fix pod termination issue due to stale mount after node plugin restart.

...etc

As you can see in the changelog, a great number of features and bug fixes are part of the v1.2 release series.
We are not stopping here; we are marching towards the next minor release in the v1.2 series without much delay. You can track the release items at https://github.com/ceph/ceph-csi/issues/639.

Kudos to the Ceph CSI community for all the hard work to reach this critical milestone.

I would like to close this article by mentioning that `your participation is highly encouraged for future releases of Ceph CSI!`

Talk to us via GitHub issues/PRs or Slack https://cephcsi.slack.com/ or other channels.

Happy hacking.

Forcefully delete stuck rook-ceph namespace?

I have seen scenarios where I, or someone else who deployed a `rook-ceph cluster`, wanted to clean up or delete the “rook-ceph” namespace, which hosts most of the Ceph cluster and Rook operator pods. However, the deletion got stuck; in other words, some of the pods stayed in the “Terminating” state forever. There are many threads and discussions in various forums on how to tackle this scenario and delete all the stuck pods. Here I summarize the command that helped me successfully delete the pods, and thus the namespace, when I encountered this issue.

If you are in this scenario, please try this out!

# kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge

More details on how we landed in this situation can be found here:

https://github.com/rook/rook/issues/2668
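
The patch above can be wrapped in a small helper so that the same merge patch is applied to any resource stuck on a finalizer. This is only a minimal sketch and not part of Rook: the function name `clear_finalizers` is hypothetical, and it assumes `kubectl` is already pointed at the affected cluster. Clearing finalizers skips whatever cleanup the operator would otherwise have done, so it is meant for a cluster that is being torn down anyway.

```shell
#!/usr/bin/env bash
# Sketch: clear the finalizers on one namespaced resource so that a pending
# delete can complete. The function name is hypothetical (not a Rook tool).
clear_finalizers() {
  local namespace="$1" resource="$2" name="$3"
  # Merge patch that sets metadata.finalizers to an empty list.
  kubectl -n "$namespace" patch "$resource" "$name" \
    --type=merge -p '{"metadata":{"finalizers":[]}}'
}

# Usage (equivalent to the one-liner above):
# clear_finalizers rook-ceph cephclusters.ceph.rook.io rook-ceph
```

Note the straight quotes and the double dash in `--type=merge`; typographic quotes or dashes picked up from a web page will make the shell or kubectl reject the command.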

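If you are still unlucky, check the details of the namespace as below, then remove the finalizer from whichever resource is reported as remaining. In the example here, the status conditions point at a cephfilesystem resource that still has a finalizer, so that resource is edited to remove it.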


[root@ ceph]# kubectl get ns/rook-ceph -oyaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2021-03-19T10:06:57Z"
  deletionTimestamp: "2021-03-22T11:38:35Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:phase: {}
    manager: kubectl-create
    operation: Update
    time: "2021-03-19T10:06:57Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"NamespaceContentRemaining"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionContentFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionDiscoveryFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceDeletionGroupVersionParsingFailure"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"NamespaceFinalizersRemaining"}:
            .: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-03-22T11:38:40Z"
  name: rook-ceph
  resourceVersion: "1121184"
  uid: b962942b-8f2d-4fac-96df-22ff18a77143
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2021-03-22T11:38:40Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2021-03-22T11:38:40Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2021-03-22T11:39:40Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2021-03-22T11:38:40Z"
    message: 'Some resources are remaining: cephfilesystems.ceph.rook.io has 1 resource
      instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2021-03-22T11:38:40Z"
    message: 'Some content in the namespace has finalizers remaining: cephfilesystem.ceph.rook.io
      in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
[root@ ceph]# kubectl edit cephfilesystems.ceph.rook.io -n rook-ceph
cephfilesystem.ceph.rook.io/myfs edited
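
When the namespace status reports that resources remain, it can help to list exactly which objects are still present before editing any of them. The following is a minimal sketch, assuming `kubectl` access to the cluster; the function name `list_remaining` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print every object that still exists in a namespace, one per line.
# The function name is hypothetical.
list_remaining() {
  local namespace="$1" kind
  # Ask the API server for every namespaced resource type that can be listed,
  # then query each type in the target namespace.
  kubectl api-resources --verbs=list --namespaced -o name |
    while read -r kind; do
      kubectl get "$kind" -n "$namespace" --ignore-not-found -o name
    done
}

# Usage:
# list_remaining rook-ceph
```

Each line of output names an object that still exists in the namespace; any of them that carries a finalizer is a candidate for the edit shown above.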

Ceph CSI v1.1.0 Released!!

The Ceph CSI team is excited to have reached a huge milestone with the release of v1.1.0!

https://github.com/ceph/ceph-csi/releases/tag/v1.1.0

Kudos to the Ceph CSI community for all the hard work to reach this critical milestone. This is our first official release (tracked at https://github.com/ceph/ceph-csi/issues/353), and it came out on 12-Jul-2019. It is a huge release with many improvements in CSI-based volume provisioning, making use of the latest Ceph release (Nautilus) for production use with Kubernetes clusters. One of the main highlights of this release is Ceph subvolume-based volume provisioning and deletion.

Highlights of this release:

*) CephFS subvolume/manager based volume provisioning and deletion.
*) E2E test support for PVC creation, app pod mounting, etc.
*) CSI spec v1.1 support
*) Added support for kube v1.15.
*) Configuration store change from configmap to rados omap
*) Mount options support for CephFS and RBD volumes
*) Move locks to more granular locking than CPU count based
*) Rbd support for ReadWriteMany PVCs for block mode
*) Unary plugin code for ‘CephFS and RBD’ drivers.
*) Driver name updated to CSI spec standard.
*) helm chart updates.
*) sidecar updates to the latest available.
*) RBAC corrections and aggregated role addition.
*) Lock protection for create, delete volume, etc. operations
*) Added support for external snapshotter.
*) Added support for CSIDriver CRD.
*) Support matrix table availability.
*) Many linter fixes and error code fixes.
*) Removal of dead code paths.
*) StripSecretInArgs in pkg/util.
*) Migration to klog from glog
...

Many other bug fixes, code improvements, and README updates are also part of this release. The container image is tagged “v1.1.0” and can be downloaded with `docker pull quay.io/cephcsi/cephcsi:v1.1.0`.

We have also updated the support matrix to better communicate the available CSI features and their upstream status.

https://github.com/ceph/ceph-csi#Support-Matrix

We would like to thank the Rook team for unblocking the CSI project at various stages!

We are not stopping here; we are moving forward at a good pace towards our next feature-rich release, tracked at https://github.com/ceph/ceph-csi/issues/393. If you would like to see some features or bug fixes in the next release, please help us by mentioning them in the same release tracker.

We are also kickstarting an upstream bug triage call from next week, so please be part of it. More details about this call are available at https://github.com/ceph/ceph-csi/issues/463

Happy Hacking!

PS/NOTE: This release needs the latest Ceph Nautilus cluster to support CephFS subvolume provisioning; such a cluster is available if you deploy CSI with Rook master.

Ceph CSI driver deployment in a kubernetes cluster

I have recently published a blog on how to deploy a Ceph cluster in a kube setup. If you don’t have this cluster up and running, please refer to this article. For this attempt we need the below components/software deployed successfully in a setup: Kubernetes, Ceph cluster, Ceph CSI driver. The first two deployments ( Kubernetes cluster and …


Deploy a ceph cluster using Rook (rook.io) in kubernetes

[Updated on 20-Jun-2020: Many changes in Rook Ceph in previous releases, so revisiting this blog article to accommodate the changes based on a ping in the slack 🙂 ] In this article we will talk about how to deploy a Ceph (software-defined storage) cluster using a Kubernetes operator called ‘rook’. Before we get into …


“dep: WARNING: Unknown field in manifest: prune “

Did you get the above warning when you ran “dep ensure” or a similar command with the Go dependency tool “dep” (https://github.com/golang/dep/)? If yes, just look at the version of “dep” on your system. [root@localhost ]# dep version dep: version : v0.3.1 build date : 2017-09-19 git hash : 83789e2 go version : …


Gluster CSI driver 1.0.0 (pre) release is out!!

We are pleased to announce the v1.0.0 (pre) release of the GlusterFS CSI driver. The release source code can be downloaded from github.com/gluster/gluster-csi-driver/archive/1.0.0-pre.0.tar.gz. Compared to the previous beta version of the driver, this release makes the Gluster CSI driver fully compatible with CSI spec v1.0.0 and Kubernetes release 1.13 (kubernetes.io/blog/2018/12/03/kubernetes-1-13-release-announcement/). The CSI driver deployment can be …
