In the last couple of weeks, a bug (since fixed) left one of my hosts in a split-brain state. Every time I placed that host into maintenance mode, it would begin migrating VMs to other hosts in the cluster. However, a couple of VMs would flip between hosts in the vCenter GUI. I was able to confirm those VMs were running on other hosts, but because of the split brain, this host still could not enter maintenance mode. The fix: kill the VM processes still running on the split-brain host, which allowed it to enter maintenance mode so it could be rebooted.
How to:
To get the World ID of a VM:
#esxcli vm process list
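On a busy host that list can be long. The sketch below shows one way to pull out the World ID for a single VM by its display name. `world_id_for` is a hypothetical helper, not part of esxcli. It assumes the usual output layout, where each VM's block lists a `World ID:` line before its `Display Name:` line.

```shell
#!/bin/sh
# Hypothetical helper: print the World ID of the VM whose display name is $1,
# reading `esxcli vm process list` output from stdin.
# Assumes each VM block shows "World ID:" before "Display Name:".
world_id_for() {
  awk -v name="$1" '
    # Remember the most recent World ID seen (third field on that line).
    /^[ \t]*World ID:/ { id = $3 }
    # When the display name matches, print the remembered ID and stop.
    /^[ \t]*Display Name:/ {
      sub(/^[ \t]*Display Name: /, "")
      if ($0 == name) { print id; exit }
    }
  '
}
```

Usage, with `MyVM` as a placeholder name: `esxcli vm process list | world_id_for "MyVM"`. If nothing is printed, no running VM on that host has that display name.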
To kill the VM or its processes running on a host:
#esxcli vm process kill --type=[soft,hard,force] --world-id=WorldNumber
*Soft: attempts to shut down the VM cleanly (preferred method)
**Hard: immediately shuts down the VM
***Force: forcibly kills the VM; use only as a last resort
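Because soft is preferred and force is the last resort, one way to respect that order is to escalate only when the gentler kill type did not work. The sketch below is a hypothetical helper, not a VMware tool. The function names and the 30-second `GRACE_SECONDS` wait are assumptions. It also assumes that a World ID disappearing from `esxcli vm process list` means the process has stopped.

```shell
#!/bin/sh
# Hypothetical helper: kill a VM world with the least disruptive type first,
# escalating soft -> hard -> force only if the world is still listed.
# GRACE_SECONDS is an assumed wait between attempts; tune for your host.
GRACE_SECONDS=${GRACE_SECONDS:-30}

world_running() {
  # Succeeds if World ID $1 still appears in the process list.
  esxcli vm process list | grep -q "World ID: $1\$"
}

kill_with_escalation() {
  wid=$1
  for kill_type in soft hard force; do
    echo "Trying --type=$kill_type on world $wid"
    esxcli vm process kill --type="$kill_type" --world-id="$wid"
    sleep "$GRACE_SECONDS"
    if ! world_running "$wid"; then
      echo "World $wid stopped after $kill_type kill"
      return 0
    fi
  done
  echo "World $wid is still running after force kill" >&2
  return 1
}
```

Usage, with `12345` standing in for a World ID from the list command: `kill_with_escalation 12345`. Checking after each attempt means a hard or force kill only happens when the previous type genuinely failed to stop the VM.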
Summary:
As you can imagine, killing a VM's process in production is never a great thing to do, but sometimes you are left with no choice. If you are ever putting a host into maintenance mode to reboot it, I hope you have a change control in place and can bounce VMs if needed. As always, I hope y’all found this article useful.