Building a Self-Healing Nutanix Environment with Ansible

Introduction

Self-healing infrastructure reduces downtime and manual intervention. With Ansible, Nutanix admins can detect VM or system issues and automatically trigger recovery actions such as powering on VMs, attaching NICs, or restoring from snapshots. The playbook in this post outlines a closed-loop health monitor and repair engine.


My Personal Repository on GitHub

Nutanix Repository on GitHub


Diagram: Self-Healing Automation Flow


Use Case

  • Detect powered-off critical VMs
  • Reattach NICs to isolated guests
  • Roll back to last known good snapshot

Playbook: self_heal_vm.yml

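- name: Self-Healing Nutanix VM Automation
  hosts: localhost
  gather_facts: false
  collections:
    - nutanix.ncp
  vars_files:
    # Assumed to define nutanix_host, nutanix_username, and nutanix_password
    - nutanix_credentials.yml
  vars:
    monitored_vms:
      - name: web01
        expected_power: "on"
        heal_action: "reboot"
      - name: db01
        expected_power: "on"
        heal_action: "restore_snapshot"
  tasks:

    # Module names, parameters, and return paths follow the published
    # nutanix.ncp collection (ntnx_ prefix, Prism Central v3 responses).
    # Verify them against the collection version you have installed.
    - name: Fetch VM states
      nutanix.ncp.ntnx_vms_info:
        nutanix_host: "{{ nutanix_host }}"
        nutanix_username: "{{ nutanix_username }}"
        nutanix_password: "{{ nutanix_password }}"
        filter:
          vm_name: "{{ vm.name }}"
      loop: "{{ monitored_vms }}"
      loop_control:
        loop_var: vm
      register: vm_status

    # Ansible cannot loop over a block, so each heal action is its own looped
    # task. The fetch loop used loop_var "vm", so every result exposes the
    # monitored entry as item.vm (not item.item).
    # A powered-off VM cannot be restarted, so the "reboot" action powers it on.
    - name: Power on VMs whose heal action is reboot
      nutanix.ncp.ntnx_vms:
        nutanix_host: "{{ nutanix_host }}"
        nutanix_username: "{{ nutanix_username }}"
        nutanix_password: "{{ nutanix_password }}"
        vm_uuid: "{{ item.response.entities[0].metadata.uuid }}"
        state: power_on
      loop: "{{ vm_status.results }}"
      loop_control:
        label: "{{ item.vm.name }}"
      when:
        - item.vm.heal_action == "reboot"
        - item.response.entities | length > 0
        - item.response.entities[0].status.resources.power_state | lower != item.vm.expected_power

    - name: Restore from snapshot (fallback)
      ansible.builtin.debug:
        msg: "TODO: Restore snapshot for {{ item.vm.name }}. Add logic here."
      loop: "{{ vm_status.results }}"
      loop_control:
        label: "{{ item.vm.name }}"
      when:
        - item.vm.heal_action == "restore_snapshot"
        - item.response.entities | length > 0
        - item.response.entities[0].status.resources.power_state | lower != item.vm.expected_power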
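The playbook loads nutanix_credentials.yml but never shows what the file contains. A minimal sketch follows, assuming the variable names nutanix_host, nutanix_username, and nutanix_password (chosen here to mirror the connection parameters that nutanix.ncp modules accept). The values are placeholders, not real endpoints or accounts.

```yaml
# nutanix_credentials.yml
# Hypothetical variable names and placeholder values; adjust to your environment.
nutanix_host: "prism-central.example.com"   # Prism Central address (placeholder)
nutanix_username: "svc-selfheal"            # service account (placeholder)
nutanix_password: "change-me"               # never commit this in plain text
```

Encrypting the file with `ansible-vault encrypt nutanix_credentials.yml` keeps the password out of version control; the playbook then needs the vault password at run time.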

Optional Extensions

  • Add NIC state detection
  • Roll snapshot logic into nutanix.ncp.vm_snapshots
  • Notify teams of triggered remediations
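As one possible shape for the notification extension, the task below posts a short message to a chat or incident webhook using ansible.builtin.uri. It is a sketch under stated assumptions: webhook_url and remediated_vms are hypothetical variables that the playbook above does not define, and the JSON body format depends on the receiving service.

```yaml
# Sketch: append to the tasks list of self_heal_vm.yml.
# webhook_url and remediated_vms are hypothetical variables, not part of the
# original playbook. remediated_vms is assumed to be a list of VM names that
# a heal task acted on during this run.
- name: Notify team of triggered remediations
  ansible.builtin.uri:
    url: "{{ webhook_url }}"
    method: POST
    body_format: json
    body:
      text: "Self-heal acted on: {{ remediated_vms | join(', ') }}"
    status_code: [200, 204]
  # Skip the call entirely when nothing was remediated.
  when: remediated_vms | default([]) | length > 0
```

Guarding on the list length keeps the 10-minute schedule from producing a message on every healthy run.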

Scheduling It

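*/10 * * * * ansible-playbook self_heal_vm.yml --vault-password-file ~/.vault_pass -i inventory.yml

Runs every 10 minutes to validate and repair VM state. Cron cannot answer the interactive --ask-vault-pass prompt, so the vault password is read from a file instead; ~/.vault_pass is a placeholder path, and the file should be readable only by the cron user. Cron starts in that user's home directory, so use absolute paths if the playbook and inventory live elsewhere.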
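The schedule itself can also be managed with Ansible rather than edited by hand. The sketch below uses ansible.builtin.cron to install the entry; self_heal_dir and vault_password_file are hypothetical paths introduced for illustration, and the job reads the vault password from a file.

```yaml
# Sketch: installs the 10-minute schedule in the current user's crontab.
# self_heal_dir and vault_password_file are hypothetical paths.
- name: Install self-heal schedule
  hosts: localhost
  gather_facts: false
  vars:
    self_heal_dir: /opt/self-heal
    vault_password_file: /opt/self-heal/.vault_pass
  tasks:
    - name: Run self_heal_vm.yml every 10 minutes
      ansible.builtin.cron:
        name: "nutanix self-heal"   # identifies the entry so reruns update it
        minute: "*/10"
        job: >-
          cd {{ self_heal_dir }} &&
          ansible-playbook self_heal_vm.yml
          -i inventory.yml
          --vault-password-file {{ vault_password_file }}
```

Because the cron module keys on the name field, running this play again updates the existing entry instead of adding a duplicate.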


Summary

This playbook gives your Nutanix cluster a resilience layer. Use it to automatically recover VMs from failures, reduce incident response time, and support 24×7 environments with less manual oversight.
