Sunday, January 29, 2023

Deleting a Proxmox cluster

I had put my rack servers (Dell PowerEdge R720 and R430) in a cluster together with a Raspberry Pi 4 to provide quorum, so that I could leave the R430 off when not needed. While the cluster worked fine, I wasn't actually using its high availability features, since I don't need to migrate my VMs between nodes or have a VM run on another node when one goes down. So I decided to delete the cluster.

Before proceeding, note that deleting a Proxmox cluster can cause irreparable damage if not done properly. I actually have my VMs backed up to another disk, so that if anything happens, I can reinstall Proxmox and restore those backups.

For reference, I watched this YouTube video:

I also used the official Proxmox documentation as a guide.

These are the steps I used.

First, on the main cluster node, which is the R720 for me, I ran
pvecm nodes
to see the names of the nodes.
 
Then, I used the following commands to remove a node, replacing NODENAME with the name of the first node to delete (not the main one!).
pvecm delnode NODENAME
rm -r /etc/pve/nodes/NODENAME

If pvecm delnode NODENAME fails (for example, because quorum is lost), run pvecm expected 1 to lower the expected vote count, then try again.
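As a sketch (not from the steps above verbatim), the remove-and-retry logic could be wrapped like this. It is a dry run: the run helper only echoes each command instead of executing it, and pve430 is just the example node name from my setup.

```shell
#!/bin/sh
# Dry-run sketch of removing one node from the main cluster node,
# with the "pvecm expected 1" fallback. Nothing is actually executed.
NODENAME=pve430                 # hypothetical example; substitute your node
run() { echo "+ $*"; }          # dry run: change the body to "$@" to execute

run pvecm delnode "$NODENAME" || {
    # if delnode failed for lack of quorum, lower expected votes and retry
    run pvecm expected 1
    run pvecm delnode "$NODENAME"
}
run rm -r "/etc/pve/nodes/$NODENAME"
```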

Next, SSH into the node being deleted and run the following commands to stop the cluster on that node.
systemctl stop pve-cluster
systemctl stop corosync
Next, restart the cluster filesystem (pmxcfs) on that node in local standalone mode.
pmxcfs -l
Then, delete the corosync configuration files.
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
Then, restart the service again as a normal service.
killall pmxcfs
systemctl start pve-cluster

Remove the remaining cluster-related files on this node.
rm /var/lib/corosync/*
This separates the node from the cluster.
This node will still have files from the other nodes in the cluster, so you can remove them too.
rm -r /etc/pve/nodes/NODENAME
Replace NODENAME with the names of the other nodes (not the node's own name). In my case, I started by removing the R430 (nodename pve430) from the cluster, so for this command, I ran the following:
rm -r /etc/pve/nodes/pve720 (for the R720)
rm -r /etc/pve/nodes/pverpi4 (for the Raspberry Pi 4)
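Taken together, the per-node cleanup steps can be sketched as one small script. Again, this is a dry run and an assumption-laden sketch, not a tested tool: the run helper only echoes each command, and the node names (pve430 being removed, pve720 and pverpi4 as the others) are specific to my setup.

```shell
#!/bin/sh
# Dry-run sketch of detaching one node (here pve430) from the cluster.
# Commands are echoed rather than executed; change run() to execute for real.
run() { echo "+ $*"; }

run systemctl stop pve-cluster      # stop the cluster filesystem service
run systemctl stop corosync         # stop cluster communication
run pmxcfs -l                       # restart pmxcfs in local standalone mode
run rm /etc/pve/corosync.conf       # delete corosync configuration
run rm -r /etc/corosync/*
run killall pmxcfs                  # stop the local-mode instance
run systemctl start pve-cluster     # start pmxcfs as a normal service again
run rm /var/lib/corosync/*          # remove remaining cluster state
run rm -r /etc/pve/nodes/pve720     # stale data from the other nodes
run rm -r /etc/pve/nodes/pverpi4
```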
 
This removed the R430 from the cluster.
I then repeated the above steps to remove the Raspberry Pi 4 from the cluster.

Once that was done, my R430 and Raspberry Pi 4 were no longer in a cluster configuration.

Finally, to delete the cluster on the main node, I ran the following commands.
First, stop the running cluster.
systemctl stop pve-cluster
systemctl stop corosync

Then, force the node to run in local mode.
pmxcfs -l
Then, delete all the cluster configuration files.
rm -f /etc/pve/cluster.conf /etc/pve/corosync.conf
rm -f /etc/cluster/cluster.conf /etc/corosync/corosync.conf
rm /var/lib/pve-cluster/.pmxcfs.lockfile
rm -r /etc/corosync/*

Next, restart the cluster service, now with an empty configuration.
killall pmxcfs
systemctl start pve-cluster
Remove the remaining cluster-related files on this node.
rm /var/lib/corosync/*
This separates the main node from the cluster.
This node will still have files from the other nodes in the cluster, so you can remove them too.
rm -r /etc/pve/nodes/NODENAME
In my case, my main node is the R720 (nodename pve720), so for this command, I ran the following:
rm -r /etc/pve/nodes/pve430 (for the R430)
rm -r /etc/pve/nodes/pverpi4 (for the Raspberry Pi 4)
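The main-node teardown can be sketched the same way. As before, this is only a dry run of the commands listed above, hedged on my own setup: pve430 and pverpi4 are the former node names, and the run helper echoes instead of executing.

```shell
#!/bin/sh
# Dry-run sketch of dissolving the cluster on the main node (here pve720).
# Commands are echoed rather than executed; change run() to execute for real.
run() { echo "+ $*"; }

run systemctl stop pve-cluster
run systemctl stop corosync
run pmxcfs -l                        # force the node into local mode
run rm -f /etc/pve/cluster.conf /etc/pve/corosync.conf
run rm -f /etc/cluster/cluster.conf /etc/corosync/corosync.conf
run rm /var/lib/pve-cluster/.pmxcfs.lockfile
run rm -r /etc/corosync/*
run killall pmxcfs
run systemctl start pve-cluster      # restart with an empty configuration
run rm /var/lib/corosync/*
run rm -r /etc/pve/nodes/pve430      # stale data from the former nodes
run rm -r /etc/pve/nodes/pverpi4
```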
 
There is no need to reboot, but I did so anyway just to make sure everything still ran properly.

As you can see, no more cluster! 

Again, remember, deleting a Proxmox cluster can cause irreparable damage if not done properly. Follow these steps at your own risk!! I will not be responsible for any damages or loss arising from anyone following these steps. I am just sharing what I did. It worked for me, but it may not work for you. I recommend reading the official documentation, and watching the video, to get an idea of what you are trying to do, before attempting to actually do anything. Plus backup backup backup!
