Table of Contents
- Introduction to Nutanix DR
- Core Concepts and Terminology
- Nutanix DR Solution Portfolio
  - Nutanix Leap
  - NearSync
  - Metro Availability
  - Native Snapshots and Replication
- Architecture Overview
- Pre-Requisites and Planning
- Deployment Models: On-Prem, Hybrid, Multi-Cloud
- Configuring Nutanix Leap
- NearSync: Sub-Minute RPO Protection
- Metro Availability for Zero RPO
- Failover, Failback, and DR Testing Workflows
- Compliance, Reporting, and Monitoring
- Advanced CLI/API Automation
- Best Practices and Pro Tips
- Common Use Cases
1. Introduction to Nutanix DR
Disaster recovery (DR) keeps applications and data available, or quickly restorable, after a catastrophic event. Nutanix integrates DR features across its deployment models to help teams reduce recovery time objectives (RTOs) and recovery point objectives (RPOs).
Nutanix DR is designed to be hypervisor-agnostic but delivers the richest integration with AHV. It enables rapid, policy-driven failover, automation, and seamless orchestration.
2. Core Concepts and Terminology
| Term | Description |
|---|---|
| RPO | Recovery Point Objective: the maximum acceptable data loss, expressed as a window of time before the outage |
| RTO | Recovery Time Objective: the maximum acceptable time to bring workloads back online |
| DR Runbook | Pre-defined sequence of failover steps |
| Metro Availability | Synchronous, zero RPO replication across sites |
| NearSync | Sub-minute, asynchronous replication for critical workloads |
| Nutanix Leap | SaaS-based DR orchestration and runbook automation |
| Consistency Group | Group of VMs/data to be replicated as a single unit |
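To make RPO and RTO concrete, here is a minimal sketch that computes the values actually achieved in a single incident. The function name and the timestamps are hypothetical and only illustrate the two definitions in the table.

```python
from datetime import datetime, timedelta

def achieved_rpo_rto(last_recovery_point, outage_start, service_restored):
    """Return (data-loss window, downtime) for one incident as timedeltas."""
    if not last_recovery_point <= outage_start <= service_restored:
        raise ValueError("timestamps must be in chronological order")
    # Data written after the last recovery point is lost.
    achieved_rpo = outage_start - last_recovery_point
    # The workload is unavailable from the outage until service is restored.
    achieved_rto = service_restored - outage_start
    return achieved_rpo, achieved_rto

rpo, rto = achieved_rpo_rto(
    datetime(2024, 1, 1, 12, 0, 0),   # last replicated recovery point
    datetime(2024, 1, 1, 12, 0, 45),  # outage begins
    datetime(2024, 1, 1, 12, 20, 0),  # workload back online at the DR site
)
print(rpo, rto)
```

Comparing these achieved values with the agreed objectives after each DR test shows whether the chosen replication schedule and runbook are adequate.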
3. Nutanix DR Solution Portfolio
Nutanix offers a range of DR features, all managed through Prism Central and Leap.
Nutanix Leap
- SaaS-based DR orchestration.
- Policy-driven protection plans and runbooks.
- Supports AHV, ESXi (with limited features), and integrates with third-party clouds.
NearSync
- Near-real-time, sub-minute replication.
- Lightweight, bandwidth-efficient, no need for shared storage.
- Suitable for mission-critical apps.
Metro Availability
- Synchronous replication across two sites.
- Enables zero RPO and seamless VM mobility.
- Requires low-latency links.
Native Snapshots and Replication
- Local and remote snapshots.
- Flexible, space-efficient backups.
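One way to read the portfolio above is as a mapping from target RPO to feature. The sketch below encodes that idea; the 15-minute cut-off between NearSync and snapshot-based replication is an assumption made for illustration, so verify the supported ranges for your release.

```python
def suggest_dr_feature(rpo_seconds):
    """Map a target RPO (in seconds) to a feature from the portfolio above."""
    if rpo_seconds < 0:
        raise ValueError("RPO cannot be negative")
    if rpo_seconds == 0:
        # Zero data loss requires synchronous replication.
        return "Metro Availability"
    if rpo_seconds <= 15 * 60:  # assumed upper bound for NearSync schedules
        return "NearSync"
    return "Native Snapshots and Replication"

for target in (0, 60, 3600):
    print(target, "->", suggest_dr_feature(target))
```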
4. Architecture Overview
Nutanix DR leverages a combination of local clusters, remote DR clusters, and a SaaS control plane (Leap).

- Prism Central: Centralized management and policy control.
- Leap: Cloud-based DR runbook and workflow automation.
- Clusters: Can be on-premises, remote, or cloud (e.g., Nutanix Clusters on AWS).
5. Pre-Requisites and Planning
- Licensing: Ensure Leap, NearSync, Metro Availability, or other required features are licensed.
- Network: Sufficient bandwidth and low latency for synchronous or near-sync replication.
- Cluster Pairing: Establish trust between primary and DR clusters.
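- DNS and Authentication: Make sure DNS records can be updated for failover and that authentication services (for example, directory services) remain reachable from the DR site.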
- Compliance: Map DR objectives to regulatory or business requirements.
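For the network item above, a rough sizing calculation helps during planning. This is a back-of-the-envelope sketch: the `efficiency` factor (the usable share of the link) and the sample figures are assumptions, and decimal units are used (1 GB = 8,000 megabits).

```python
def required_bandwidth_mbps(changed_gb_per_interval, rpo_seconds, efficiency=0.7):
    """Estimate the WAN throughput needed to ship one interval's changes within the RPO."""
    if rpo_seconds <= 0:
        raise ValueError("rpo_seconds must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    megabits = changed_gb_per_interval * 8 * 1000  # GB -> megabits, decimal units
    return megabits / (rpo_seconds * efficiency)

# Example: 3 GB of changed data per hour, with 80% of the link usable.
print(round(required_bandwidth_mbps(3, 3600, efficiency=0.8), 2), "Mbps")
```

The estimate ignores compression and deduplication savings, so treat it as an upper-bound starting point and refine it with measured change rates.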
6. Deployment Models: On-Prem, Hybrid, Multi-Cloud
Nutanix DR supports a variety of architectures:
- On-Prem to On-Prem: Traditional two-site DR, including metro regions.
- On-Prem to Cloud: Use Nutanix Clusters on AWS/Azure as DR targets.
- Multi-Cloud: Orchestrate DR across multiple cloud providers or sites.
- Hybrid: Mix on-prem and public cloud resources.
Diagram: DR Topologies

7. Configuring Nutanix Leap
Leap offers policy-based orchestration for DR. Below is a typical setup flow.
Step 1: Access Leap
- Log in to Prism Central.
- Navigate to Data Protection & DR > Leap.
Step 2: Register Sites
- Pair your primary and DR clusters.
- Verify AHV cluster connectivity.
Step 3: Create Protection Plans
- Define which VMs/groups to protect.
- Set RPO, retention, and schedule.
Step 4: Author Runbooks
- Use Leap’s visual designer to build custom failover/failback workflows.
- Add automation steps for network re-IP, DNS, or application startup.
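Sample CLI to query protection domains (ncli operates on cluster-level protection domains; Leap protection plans themselves are managed in Prism Central):
ncli protection-domain list
ncli protection-domain list name=<ProtectionDomain>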
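The runbook idea from Step 4 can be sketched as an ordered set of boot stages. The stage numbers and VM names below are hypothetical, and this is only a model of the ordering logic, not how Leap stores runbooks.

```python
from collections import defaultdict

def build_boot_stages(vm_stage_map):
    """Group VMs by stage number and return the stages in ascending order."""
    stages = defaultdict(list)
    for vm, stage in vm_stage_map.items():
        stages[stage].append(vm)
    # Lower stage numbers power on first; VMs within a stage are sorted by name.
    return [sorted(stages[s]) for s in sorted(stages)]

plan = {"db01": 1, "app01": 2, "app02": 2, "web01": 3}
boot_order = build_boot_stages(plan)
print(boot_order)
```

Keeping the stage map in version control alongside the runbook documentation makes changes to the start-up order reviewable.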
8. NearSync: Sub-Minute RPO Protection
NearSync allows you to protect critical workloads with minimal data loss.
Configuration Steps:
- Enable NearSync on both clusters.
- Select VMs/consistency groups for NearSync protection.
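- Set the schedule interval. The minimum supported interval depends on the AOS release (the 20-second figure applies only where the release supports it), so confirm it in the documentation for your version.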
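CLI Example (illustrative syntax; confirm entity and parameter names against the ncli command reference for your AOS release):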
ncli protection-domain.create name=Finance-NS type=NearSync
ncli pd-schedule.create pd-name=Finance-NS schedule-type=every_x_minute
9. Metro Availability for Zero RPO
Metro Availability is ideal for environments needing zero data loss and active-active clusters.
Requirements:
- Low-latency, high-bandwidth link (≤5 ms RTT recommended).
- Identical AHV versions across clusters.
Enabling Metro Availability:
- In Prism Central, go to Data Protection > Metro Availability.
- Pair clusters and designate Metro Availability-enabled storage containers.
- Enable VM affinity rules for site failover.
CLI Snippet (illustrative; the parameter name shown is not verified against the ncli reference):
ncli container edit name=<ContainerName> enable-metro-availability=true
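Before enabling Metro Availability, it is worth checking measured round-trip times against the latency requirement above. The sketch below assumes you already have RTT samples in milliseconds (for example, collected from ping output); the function name and the sample values are hypothetical.

```python
from statistics import mean

def link_meets_metro_latency(rtt_samples_ms, max_rtt_ms=5.0):
    """Summarize RTT samples and check the worst case against the threshold."""
    if not rtt_samples_ms:
        raise ValueError("at least one RTT sample is required")
    worst = max(rtt_samples_ms)
    return {
        "avg_ms": mean(rtt_samples_ms),
        "max_ms": worst,
        # Judge on the worst sample, since synchronous writes wait on every round trip.
        "ok": worst <= max_rtt_ms,
    }

print(link_meets_metro_latency([2.1, 3.4, 4.9]))
```

Judging the link on its worst sample rather than its average is a deliberately conservative choice for synchronous replication.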
10. Failover, Failback, and DR Testing Workflows
Failover Workflow Table
| Step | Task | Command/API/Portal |
|---|---|---|
| 1 | Initiate Failover | Prism/Leap or CLI |
| 2 | Automate network re-IP | Runbook/Script |
| 3 | Power on protected VMs | Leap/CLI/API |
| 4 | Validate app/data | Manual/test automation |
| 5 | Confirm with stakeholders | Email/portal notification |
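Sample Failover Commands (CLI, for protection domains):
Planned failover, run on the primary cluster:
ncli protection-domain migrate name=<ProtectionDomain> remote-site=<DRSite>
Unplanned failover, run on the DR cluster:
ncli protection-domain activate name=<ProtectionDomain>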
Testing DR (Non-Disruptive):
- Use Leap’s “Test Failover” to clone protected VMs to an isolated network.
- Validate DR runbook steps without impacting production.
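The failover steps in the table above run in a fixed order, and a failed step should stop the sequence. A minimal sketch of that control flow follows; the step names and callables are hypothetical stand-ins for real runbook actions.

```python
def run_workflow(steps):
    """Run (name, callable) steps in order and stop at the first failure."""
    results = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            break  # later steps depend on earlier ones, so do not continue
        results.append((name, "ok"))
    return results

def fail_reip():
    raise RuntimeError("no route")

executed = run_workflow([
    ("initiate failover", lambda: None),
    ("network re-IP", fail_reip),
    ("power on VMs", lambda: None),  # never reached in this example
])
print(executed)
```

Running the same sequence against an isolated test network gives a repeatable way to rehearse the workflow without touching production.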
11. Compliance, Reporting, and Monitoring
- Automated Reporting: Leap generates compliance and DR reports for audits.
- SIEM Integration: Export DR events/logs for external analysis (Splunk, QRadar).
- Alerting: Configure alerts for failed replications or missed RPOs.
- Audit Logs: All DR actions are logged and timestamped for compliance review.
API Example for Reporting (the path below is illustrative; take the actual route from the Prism Central API reference):
GET /leap/api/v1/reports
Authorization: Bearer <token>
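The "missed RPO" alert condition listed above can be expressed as a gap check over recovery-point timestamps. This sketch assumes the timestamps have already been exported from your monitoring source; the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta

def find_missed_rpos(recovery_points, rpo, now):
    """Return (gap_start, gap_end) pairs where consecutive points are further apart than the RPO."""
    # Appending `now` also catches the case where replication has stalled recently.
    points = sorted(recovery_points) + [now]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > rpo]

base = datetime(2024, 1, 1, 12, 0)
points = [base, base + timedelta(minutes=1), base + timedelta(minutes=5)]
gaps = find_missed_rpos(points, rpo=timedelta(minutes=2), now=base + timedelta(minutes=6))
print(gaps)
```

Each returned pair marks a window in which a failure would have exceeded the agreed RPO, which is the kind of evidence an audit report typically needs.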
12. Advanced CLI/API Automation
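Nutanix exposes REST APIs through Prism Central for automating DR. The endpoint paths and payload fields in this section are illustrative; confirm the actual routes and schemas in the API reference before use.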
Example: Create DR Plan via API
curl -k -X POST "https://<prism_central>:9440/leap/api/v1/plans" \
-H "Content-Type: application/json" \
-d '{
"name": "Critical-DR-Plan",
"protected_vms": ["VM1", "VM2"],
"recovery_point_objective": 60,
"runbook_steps": ["network", "poweron", "validation"]
}'
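Bulk Failover Script (Python)
import requests

PRISM_CENTRAL = "<prism_central>"  # replace with your Prism Central host

def trigger_failover(plan_id, token):
    """POST a failover request for one plan and return (status code, JSON body)."""
    url = f"https://{PRISM_CENTRAL}:9440/leap/api/v1/failover/{plan_id}"
    headers = {"Authorization": f"Bearer {token}"}
    r = requests.post(url, headers=headers, timeout=30)
    r.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return r.status_code, r.json()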
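The function above handles a single plan. To fail over several plans at once, one option is to fan the calls out over a thread pool and collect per-plan outcomes. In this sketch, `bulk_failover` and `fake_trigger` are hypothetical names, and the trigger is injected so the logic can be exercised without a live Prism Central.

```python
from concurrent.futures import ThreadPoolExecutor

def bulk_failover(plan_ids, trigger, max_workers=4):
    """Call trigger(plan_id) for each plan and return {plan_id: result or error string}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {plan_id: pool.submit(trigger, plan_id) for plan_id in plan_ids}
        for plan_id, future in futures.items():
            try:
                results[plan_id] = future.result()
            except Exception as exc:
                # Record the failure and keep going so one bad plan does not hide the rest.
                results[plan_id] = f"error: {exc}"
    return results

def fake_trigger(plan_id):
    """Stand-in for a real API call; fails for one plan to show error handling."""
    if plan_id == "plan-bad":
        raise RuntimeError("boom")
    return 202

outcome = bulk_failover(["plan-a", "plan-bad"], fake_trigger)
print(outcome)
```

In practice, `trigger` could wrap the single-plan function, for example `lambda plan_id: trigger_failover(plan_id, token)`.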
13. Best Practices and Pro Tips
- Test Regularly: Schedule DR tests quarterly. Automate where possible.
- Document Everything: Keep runbooks and DR plans version-controlled.
- Automate Notifications: Integrate Leap with Slack, Teams, or email for instant alerts.
- Bandwidth Planning: Monitor WAN usage and scale as data grows.
- Least Privilege: Grant DR admin roles only to the staff who need them, and review those assignments regularly.
14. Common Use Cases
- Ransomware Recovery: Restore to a clean DR site if primary is compromised.
- Cloud Migration: Use DR failover to migrate workloads between on-prem and cloud.
- Regulatory Compliance: DR plans mapped to SOX, HIPAA, GDPR, etc.
- Active-Active Applications: Zero RPO for Tier-1 business services.
- Branch Office DR: Centralize recovery for remote locations.
15. Diagrams and Workflow Tables
A. Basic DR Replication Topology

B. Failover/Failback Workflow Table
| Stage | Action | Tools/Scripts |
|---|---|---|
| Failover | Initiate runbook | Leap, CLI, API |
| | Automate re-IP/DNS updates | Scripted in Leap |
| | Validate app startup | Manual/automated |
| Failback | Resync changes | Replication |
| | Restore original state | Runbook step |
Conclusion
Nutanix Disaster Recovery offers a flexible and powerful approach to safeguarding enterprise workloads across on-premises, hybrid, and multi-cloud environments. By combining advanced features like Leap for orchestration, NearSync for near-zero data loss, and Metro Availability for synchronous protection, Nutanix empowers IT teams to meet strict RTO and RPO requirements while streamlining recovery operations.
With native support for AHV, intuitive workflows, and deep automation capabilities through CLI and API, Nutanix DR solutions reduce complexity and operational risk. Organizations can confidently protect mission-critical applications, achieve regulatory compliance, and support business continuity with minimal manual intervention.
As threats continue to evolve, the ability to regularly test, automate, and adapt DR plans becomes even more critical. Nutanix delivers a unified platform that not only protects data but also accelerates recovery, keeping your business resilient and responsive in the face of disruption.
For IT administrators and architects, embracing Nutanix’s DR portfolio means less downtime, greater agility, and peace of mind—no matter where your workloads reside.
Disclaimer: The views expressed in this article are those of the author and do not represent the opinions of Nutanix, my employer or any affiliated organization. Always refer to the official Nutanix documentation before production deployment.