Introduction
Self-healing infrastructure reduces downtime and manual intervention. With Ansible, Nutanix admins can detect VM or system issues and automatically trigger recovery actions, like powering on VMs, attaching NICs, or restoring from snapshots. This playbook outlines a closed-loop health monitor and repair engine.
Diagram: Self-Healing Automation Flow

Use Case
- Detect powered-off critical VMs
- Reattach NICs to isolated guests
- Roll back to last known good snapshot
Playbook: self_heal_vm.yml
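- name: Self-Healing Nutanix VM Automation
  hosts: localhost
  gather_facts: false
  collections:
    - nutanix.ncp
  vars_files:
    - nutanix_credentials.yml
  vars:
    monitored_vms:
      - name: web01
        expected_power: "on"
        heal_action: "reboot"
      - name: db01
        expected_power: "on"
        heal_action: "restore_snapshot"
  tasks:
    # Confirm module names and return fields against your installed nutanix.ncp version.
    # Because loop_var is "vm", each registered result stores its loop entry under result.vm.
    - name: Fetch VM states
      nutanix.ncp.vms_info:
        name: "{{ vm.name }}"
      loop: "{{ monitored_vms }}"
      loop_control:
        loop_var: vm
      register: vm_status

    # Heal powered-off VMs. Ansible does not allow "loop" on a block,
    # so each heal action is its own looped task with both conditions.
    - name: Trigger VM reboot
      nutanix.ncp.vms:
        name: "{{ result.vm.name }}"
        state: present
        power_state: restart
        cluster_name: "prod-cluster"
      loop: "{{ vm_status.results }}"
      loop_control:
        loop_var: result
        label: "{{ result.vm.name }}"
      when:
        - (result.vms[0].power_state | lower) != result.vm.expected_power
        - result.vm.heal_action == "reboot"

    - name: Restore from snapshot (fallback)
      ansible.builtin.debug:
        msg: "TODO: Restore snapshot for {{ result.vm.name }} - Add logic here."
      loop: "{{ vm_status.results }}"
      loop_control:
        loop_var: result
        label: "{{ result.vm.name }}"
      when:
        - (result.vms[0].power_state | lower) != result.vm.expected_power
        - result.vm.heal_action == "restore_snapshot"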
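One gap worth guarding against: if a monitored VM is not returned by the lookup, indexing vms[0] fails before any heal task can run. A minimal sketch of a pre-check follows. It assumes each registered result exposes a vms list, as the playbook does; that field name is taken from the playbook and has not been checked against the collection's documented return values.

```yaml
# Sketch: place after "Fetch VM states" and before the heal tasks.
# Assumes each registered result carries a `vms` list (as used in the playbook).
- name: Report monitored VMs that the lookup did not return
  ansible.builtin.debug:
    msg: "VM {{ result.vm.name }} was not found; skipping remediation."
  loop: "{{ vm_status.results }}"
  loop_control:
    loop_var: result
    label: "{{ result.vm.name }}"
  when: (result.vms | default([])) | length == 0
```

The heal tasks can then add `(result.vms | default([])) | length > 0` as their first condition so that a missing VM is skipped rather than aborting the run.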
Optional Extensions
- Add NIC state detection
- Roll snapshot logic into nutanix.ncp.vm_snapshots
- Notify teams of triggered remediations
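As an illustration of the notification extension, the task below posts a message to a webhook whenever a monitored VM is found in the wrong power state. This is a sketch under assumptions: notify_webhook_url is a hypothetical variable, the JSON payload with a single text field follows a common chat-webhook shape that your endpoint may not accept, and the vms[0].power_state field is taken from the playbook as written.

```yaml
# Sketch: notify a chat or incident webhook when a remediation is triggered.
# `notify_webhook_url` is a hypothetical variable; define it in your vars or vault.
# The {"text": ...} payload is an assumed shape; adapt it to your endpoint.
- name: Notify team of triggered remediation
  ansible.builtin.uri:
    url: "{{ notify_webhook_url }}"
    method: POST
    body_format: json
    body:
      text: "Self-heal: {{ result.vm.heal_action }} triggered for {{ result.vm.name }}"
    status_code: [200, 204]
  loop: "{{ vm_status.results }}"
  loop_control:
    loop_var: result
    label: "{{ result.vm.name }}"
  when: (result.vms[0].power_state | lower) != result.vm.expected_power
```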
Scheduling It
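*/10 * * * * ansible-playbook self_heal_vm.yml --vault-password-file /path/to/vault_pass.txt -i inventory.yml
Runs every 10 minutes to validate and repair VM state. Cron has no terminal, so an interactive --ask-vault-pass prompt can never be answered; supply the vault password with --vault-password-file instead (the path shown is a placeholder) and restrict that file's permissions to the cron user.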
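The schedule itself can also be managed by Ansible rather than edited by hand. The sketch below uses the ansible.builtin.cron module to install the entry. The directory /opt/playbooks and the vault password file path are placeholders, and a password file is assumed because a cron job cannot respond to a prompt.

```yaml
# Sketch: install the schedule with Ansible instead of editing crontab by hand.
# /opt/playbooks and /path/to/vault_pass.txt are placeholders for your control node.
- name: Install self-heal cron entry
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Run self_heal_vm.yml every 10 minutes
      ansible.builtin.cron:
        name: "nutanix-self-heal"
        minute: "*/10"
        job: >-
          cd /opt/playbooks &&
          ansible-playbook self_heal_vm.yml
          --vault-password-file /path/to/vault_pass.txt
          -i inventory.yml
```

Because the entry is keyed by its name, re-running this play updates the existing cron line instead of adding a duplicate.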
Summary
This playbook gives your Nutanix cluster a resilience layer. Use it to automatically recover VMs from failures, reduce incident response time, and support 24×7 environments without manual oversight.