Table of Contents
- Executive Summary
- What Is an NSX-T Edge Cluster?
- NSX-T Edge Node Sizing and Hardware Requirements
- Edge Cluster Placement Strategies
- Logical and Physical Topologies
- Fault Domain Considerations
- Edge Cluster Layouts
- Automated Deployments with Ansible and Terraform
- Ansible Playbook Example
- Terraform Module Example
- NSX-T Edge Cluster Failover Automation
- PowerShell Scripts
- Workflow Logic
- Troubleshooting and Best Practices
- Conclusion
For more NSX-T Content: https://digitalthoughtdisruption.com/category/nsx-t
Executive Summary
Robust NSX-T edge clusters are the backbone of high-availability, high-performance software-defined networks. This blog covers everything from proper sizing and intelligent placement to modern, automated deployments and failover. All examples target NSX 4.x on vSphere and include diagrams, Ansible/Terraform code, and practical PowerShell for end-to-end automation. The guide applies equally to greenfield builds and brownfield upgrades.
What Is an NSX-T Edge Cluster?
An NSX-T edge cluster is a group of edge transport nodes that delivers north-south connectivity, centralized services (load balancing, NAT, VPN), and scalable routing. Edge clusters provide redundancy and high availability, distributing services across multiple nodes.
Core Components:
- Edge Node: A VM or bare metal appliance providing data/control plane functions.
- Edge Cluster: Logical grouping of edge nodes for redundancy and ECMP (Equal-Cost Multi-Path) routing.
- T0/T1 Gateway: Logical routers that consume edge resources for external connectivity.
NSX-T Edge Node Sizing and Hardware Requirements
Edge node size and number define your network’s throughput, scale, and fault tolerance. The right sizing aligns with your real traffic, service needs (NAT, LB, VPN), and business continuity requirements.
NSX-T 4.x Edge Node Sizing Table
| Edge Node Size | vCPU | RAM (GB) | Disk (GB) | Throughput (Gbps) | Use Case |
|---|---|---|---|---|---|
| Small | 4 | 8 | 120 | ~2 | Lab, Proof of Concept |
| Medium | 8 | 32 | 200 | ~10 | SMB, light prod |
| Large | 16 | 64 | 400 | ~24 | Enterprise, multiple services |
| Extra Large | 32 | 128 | 600 | ~40+ | Heavy prod, high throughput |
Tip: Always validate sizing with real traffic. Monitor using NSX-T metrics and Aria Operations.
Minimum and Recommended Cluster Size
- Minimum for production: 2 edge nodes (Active/Standby or ECMP)
- Best practice: 3 or more nodes (N+1 or N+2 redundancy)
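The sizing table above can be encoded as a small lookup helper for capacity-planning scripts. This is a sketch only: the throughput figures mirror the approximate numbers in the table, and real sizing should always be validated against measured traffic.

```python
# Encode the NSX-T 4.x edge node sizing table as data, then pick the
# smallest form factor whose approximate throughput covers a target.
EDGE_SIZES = [
    # (name, vCPU, RAM GB, disk GB, approx throughput Gbps)
    ("Small", 4, 8, 120, 2),
    ("Medium", 8, 32, 200, 10),
    ("Large", 16, 64, 400, 24),
    ("Extra Large", 32, 128, 600, 40),
]

def pick_edge_size(required_gbps: float) -> str:
    """Return the smallest edge node size whose approximate
    throughput meets the requirement."""
    for name, _vcpu, _ram, _disk, gbps in EDGE_SIZES:
        if gbps >= required_gbps:
            return name
    # Beyond ~40 Gbps per node, scale out with more nodes and ECMP
    return "Extra Large"

print(pick_edge_size(8))   # Medium
print(pick_edge_size(30))  # Extra Large
```

Treat the result as a starting point; monitor with NSX-T metrics and Aria Operations before committing to a size.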
Edge Cluster Placement Strategies
Logical and Physical Design
Strategic placement is essential for high availability. Your goal: make sure no single hardware or rack failure can disrupt all edge services.
Best Practices
- Distribute edge nodes across different ESXi hosts.
- Place edge nodes in different racks or fault domains.
- Use dedicated uplinks (VDS) for each edge node.
- Keep edge nodes isolated from general compute workloads.
Edge Cluster Logical Layout

Fault Domain Awareness
- Never place both edge nodes on the same ESXi host or in the same rack.
- Use DRS anti-affinity rules for 3+ node clusters.
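The fault-domain rules above boil down to one invariant: no host and no rack may hold more than one edge node. That check is easy to automate in a pre-deployment validation script; the sketch below uses illustrative host/rack names, and in practice the placement data would come from vCenter or your CMDB.

```python
from collections import Counter

def placement_violations(placements: dict) -> dict:
    """Given {edge_node: (host, rack)}, return any host or rack
    that holds more than one edge node (a fault-domain violation)."""
    hosts = Counter(h for h, _ in placements.values())
    racks = Counter(r for _, r in placements.values())
    return {
        "hosts": [h for h, n in hosts.items() if n > 1],
        "racks": [r for r, n in racks.items() if n > 1],
    }

# Two edge nodes on different hosts but in the same rack:
# host placement is fine, rack placement is not.
bad = placement_violations({
    "edge-1": ("esxi-01", "rack-A"),
    "edge-2": ("esxi-02", "rack-A"),
})
```

A non-empty result should fail the deployment pipeline before DRS anti-affinity rules are even applied.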
Edge Cluster Physical Placement

Automated Deployments with Ansible and Terraform
Infrastructure-as-Code allows you to standardize and automate deployments.
Ansible Playbook Example
```yaml
---
- name: Deploy NSX-T Edge Node
  hosts: localhost
  tasks:
    - name: Create Edge Transport Node
      uri:
        url: "https://{{ nsx_manager }}/api/v1/transport-nodes"
        method: POST
        user: "{{ nsx_user }}"
        password: "{{ nsx_pass }}"
        force_basic_auth: yes
        validate_certs: no
        body: "{{ lookup('file', 'edge_node_payload.json') }}"
        body_format: json
        headers:
          Content-Type: "application/json"
        status_code: 201
```
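The playbook reads its request body from `edge_node_payload.json`. A skeletal example of that file might look like the following; the field values are illustrative, and the full transport node schema (host switch spec, deployment config, and so on) should be taken from the NSX API documentation for your version.

```json
{
  "display_name": "Edge-Node-1",
  "description": "Edge transport node created via Ansible",
  "node_deployment_info": {
    "resource_type": "EdgeNode",
    "display_name": "Edge-Node-1"
  }
}
```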
Terraform Module Example
```hcl
provider "nsxt" {
  host                 = var.nsx_host
  username             = var.nsx_user
  password             = var.nsx_pass
  allow_unverified_ssl = true
}

resource "nsxt_edge_cluster" "edge_cluster1" {
  display_name = "Prod-Edge-Cluster"
  edge_nodes   = [nsxt_edge_node.edge1.id, nsxt_edge_node.edge2.id]
}

resource "nsxt_edge_node" "edge1" {
  display_name = "Edge-Node-1"
  # Additional configuration here
}
```
NSX-T Edge Cluster Failover Automation
Failover automation keeps your edge cluster resilient, with no manual intervention needed during node or VM failures.
End-to-End Failover Workflow
- Detect edge node or VM failure
- Check NSX Edge services status
- Trigger VM restart or node replacement
- Validate north-south connectivity post-remediation
PowerShell: Edge Node Health Check and Auto-Restart
```powershell
Import-Module VMware.PowerCLI

# Connect to vCenter; prompting for credentials avoids a hardcoded password
Connect-VIServer -Server 'vcenter.example.com' -Credential (Get-Credential)

# Get Edge Node VMs
$edgeVMs = Get-VM -Name 'Edge-Node-*'

foreach ($vm in $edgeVMs) {
    # Re-fetch the VM so the power state is current, not cached
    $vm = Get-VM -Name $vm.Name
    if ($vm.PowerState -ne 'PoweredOn') {
        Write-Host "Restarting $($vm.Name)..."
        Start-VM -VM $vm -Confirm:$false
        # Add custom notification/escalation here
    }
}

# Validate NSX Edge services post-restart
# (Invoke API call or check status in NSX Manager)
```
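The post-restart validation can query NSX Manager directly. The sketch below assumes the transport-node status endpoint (`GET /api/v1/transport-nodes/<node-id>/status`) and an overall `status` field of `UP`/`DEGRADED`/`DOWN`; verify both against the API reference for your NSX version. The parsing logic is separated out so it can be tested without a live manager.

```python
import json

# Assumed endpoint (verify for your NSX version):
#   GET https://<nsx-manager>/api/v1/transport-nodes/<node-id>/status
# The HTTP call itself (e.g. via the 'requests' library with basic
# auth) is omitted; only the health decision is shown here.

def node_is_healthy(status_doc: dict) -> bool:
    """Treat only an overall 'UP' status as healthy; anything else
    (DEGRADED, DOWN, UNKNOWN, or missing) should trigger escalation."""
    return status_doc.get("status") == "UP"

# Illustrative response fragment:
sample = json.loads('{"status": "DEGRADED"}')
healthy = node_is_healthy(sample)
```

Only a clean `UP` on every edge node should mark the remediation run as successful.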
Automated Edge Replacement (Workflow Steps)
- Detect unrecoverable edge node
- Remove node from edge cluster
- Deploy a new edge node (via Ansible/Terraform)
- Rejoin to cluster, reattach services
- Confirm routing, NAT, load balancing
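The replacement workflow above is a linear pipeline where any failed step should halt the run and escalate rather than continue. A minimal orchestration sketch follows; the step implementations here are stubs that in practice would wire into the Ansible/Terraform deployments and API checks shown earlier.

```python
from typing import Callable, List, Tuple

def run_workflow(steps: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run steps in order, stopping at the first failure so a human
    or escalation hook can take over. Returns completed step names."""
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed

# Stubbed steps mirroring the workflow list above
steps = [
    ("detect-failed-node", lambda: True),
    ("remove-from-cluster", lambda: True),
    ("deploy-replacement", lambda: False),  # simulate a failed deploy
    ("rejoin-and-reattach", lambda: True),
    ("validate-services", lambda: True),
]
done = run_workflow(steps)  # halts at the failed deploy step
```

Stopping on first failure matters here: rejoining a half-deployed edge node to the cluster is worse than leaving the cluster one node short.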
Troubleshooting and Best Practices
- Monitor edge node and cluster health (NSX Manager, Aria/vROps, API)
- Test failover regularly (simulate host or edge VM failure)
- Patch edge appliances to latest supported NSX version
- Use physical network redundancy: dual ToR, multiple uplinks
- Enforce vSphere DRS anti-affinity for all edge nodes
Conclusion
NSX-T edge clusters are the core of scalable, resilient network designs. Sizing, placement, and automation are all critical for uptime and performance. Use YAML and code-driven deployments to reduce manual errors, and always validate your design with real data. With diagrams, you can accelerate design reviews and ops handoff—making your next upgrade or greenfield project a breeze.
Disclaimer: The views expressed in this article are those of the author and do not represent the opinions of VMware, my employer, or any affiliated organization. Always refer to the official VMware documentation before production deployment.