NSX-T Edge Clusters: Sizing, Placement, and Failover Automation

Paul Bryant

9 months ago

Executive Summary
What Is an NSX-T Edge Cluster?
NSX-T Edge Node Sizing and Hardware Requirements
Edge Cluster Placement Strategies
- Logical and Physical Topologies
- Fault Domain Considerations
- Edge Cluster Layouts
Automated Deployments with Ansible and Terraform
- Ansible Playbook Example
- Terraform Module Example
NSX-T Edge Cluster Failover Automation
- PowerShell Scripts
- Workflow Logic
Troubleshooting and Best Practices
Conclusion

For more NSX-T Content: https://digitalthoughtdisruption.com/category/nsx-t

Executive Summary

Robust NSX-T edge clusters are the backbone of high-availability, high-performance software-defined networks. This post covers proper sizing, intelligent placement, and modern automated deployment and failover. All examples target NSX 4.x on vSphere and include diagrams, Ansible and Terraform code, and practical PowerShell for end-to-end automation. The guidance applies equally to greenfield builds and brownfield upgrades.

What Is an NSX-T Edge Cluster?

An NSX-T edge cluster is a group of edge transport nodes that delivers north-south connectivity, centralized services (load balancing, NAT, VPN), and scalable routing. Edge clusters provide redundancy and high availability, distributing services across multiple nodes.

Core Components:

Edge Node: A VM or bare metal appliance providing data/control plane functions.
Edge Cluster: Logical grouping of edge nodes for redundancy and ECMP (Equal-Cost Multi-Path) routing.
T0/T1 Gateway: Tier-0 and Tier-1 logical routers that consume edge cluster resources for external connectivity and centralized services.

NSX-T Edge Node Sizing and Hardware Requirements

Edge node size and number define your network’s throughput, scale, and fault tolerance. The right sizing aligns with your real traffic, service needs (NAT, LB, VPN), and business continuity requirements.

NSX-T 4.x Edge Node Sizing Table

Edge Node Size	vCPU	RAM (GB)	Disk (GB)	Approx. Throughput (Gbps)	Use Case
Small	2	4	200	~2	Lab, Proof of Concept
Medium	4	8	200	~10	SMB, light prod
Large	8	32	200	~24	Enterprise, multiple services
Extra Large	16	64	200	~40+	Heavy prod, high throughput

Tip: Always validate sizing with real traffic. Monitor using NSX-T metrics and Aria Operations.

Minimum and Recommended Cluster Size

Minimum for production: 2 edge nodes (Active/Standby or ECMP)
Best practice: 3 or more nodes (N+1 or N+2 redundancy)
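
As a rough planning aid, the sizing table and the N+1 guidance can be combined into a small calculation. The sketch below is illustrative only: it assumes throughput scales roughly linearly across ECMP nodes, it uses the approximate per-node figures from the table, and the names `APPROX_GBPS` and `edge_nodes_needed` are hypothetical.

```python
import math

# Approximate per-node throughput (Gbps), taken from the sizing table.
# These are planning estimates, not guaranteed rates.
APPROX_GBPS = {"small": 2, "medium": 10, "large": 24, "xlarge": 40}


def edge_nodes_needed(peak_gbps: float, size: str, spares: int = 1) -> int:
    """Return a node count that carries peak_gbps with `spares` extra nodes.

    Assumes load is spread evenly across active nodes (ECMP-style).
    """
    if peak_gbps <= 0:
        raise ValueError("peak_gbps must be positive")
    per_node = APPROX_GBPS[size]
    active = math.ceil(peak_gbps / per_node)
    # Never drop below the two-node production minimum.
    return max(active + spares, 2)
```

For example, `edge_nodes_needed(30, "large")` needs two active nodes plus one spare. Treat the result as a starting point and confirm it against measured traffic.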

Edge Cluster Placement Strategies

Logical and Physical Design

Strategic placement is essential for high availability. Your goal: make sure no single hardware or rack failure can disrupt all edge services.

Best Practices

Distribute edge nodes across different ESXi hosts.
Place edge nodes in different racks or fault domains.
Use dedicated uplinks (VDS) for each edge node.
Keep edge nodes isolated from general compute workloads.

[Figure: Edge Cluster Logical Layout]

Fault Domain Awareness

Never place two edge nodes from the same cluster on the same ESXi host or in the same rack.
Use DRS VM-VM anti-affinity rules to keep edge VMs on separate hosts, including in clusters of three or more nodes.
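
The placement rules above can be checked before deployment with a simple inventory audit. This is a minimal sketch, assuming you can export each edge node's host and rack from your own inventory; the `EdgePlacement` type and `placement_violations` function are hypothetical names, not part of any NSX or vSphere API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgePlacement:
    name: str  # edge node name
    host: str  # ESXi host the edge VM runs on
    rack: str  # rack or fault domain of that host


def placement_violations(nodes):
    """Return one message per host or rack that holds more than one edge node."""
    by_host, by_rack = defaultdict(list), defaultdict(list)
    for node in nodes:
        by_host[node.host].append(node.name)
        by_rack[node.rack].append(node.name)

    problems = []
    for host, names in sorted(by_host.items()):
        if len(names) > 1:
            problems.append(f"host {host}: {', '.join(sorted(names))}")
    for rack, names in sorted(by_rack.items()):
        if len(names) > 1:
            problems.append(f"rack {rack}: {', '.join(sorted(names))}")
    return problems
```

An empty list means no two edge nodes share a host or rack. Any message names the shared location and the nodes that should be moved.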

[Figure: Edge Cluster Physical Placement]

Automated Deployments with Ansible and Terraform

Infrastructure-as-Code allows you to standardize and automate deployments.

Ansible Playbook Example

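---
- name: Deploy NSX-T Edge Node
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create Edge Transport Node
      ansible.builtin.uri:
        url: "https://{{ nsx_manager }}/api/v1/transport-nodes"
        method: POST
        url_username: "{{ nsx_user }}"
        url_password: "{{ nsx_pass }}"
        force_basic_auth: true
        # Lab only: set to true and trust the NSX Manager CA in production.
        validate_certs: false
        # Parse the payload so malformed JSON fails here rather than at the API.
        body: "{{ lookup('file', 'edge_node_payload.json') | from_json }}"
        body_format: json
        status_code: 201
      register: edge_node_result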
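
The playbook reads its request body from edge_node_payload.json, so it can help to sanity-check that file before running it. The sketch below is a hedged pre-flight check: the key names in `REQUIRED_KEYS` are an assumption about the transport node payload and should be confirmed against the NSX API reference for your release. The function names are hypothetical.

```python
import json

# Assumed top-level keys for an edge transport node payload.
# Confirm these against the NSX API reference before relying on them.
REQUIRED_KEYS = ("display_name", "host_switch_spec", "node_deployment_info")


def missing_keys(payload: dict) -> list:
    """Return the assumed-required keys that the payload does not contain."""
    return [key for key in REQUIRED_KEYS if key not in payload]


def check_payload_file(path: str) -> list:
    """Load a JSON payload file and report missing keys.

    Raises json.JSONDecodeError if the file is not valid JSON.
    """
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    return missing_keys(payload)
```

Running `check_payload_file("edge_node_payload.json")` before the playbook catches syntax errors and obviously incomplete payloads without touching NSX Manager.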

Terraform Module Example

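terraform {
  required_providers {
    nsxt = {
      source = "vmware/nsxt"
    }
  }
}

provider "nsxt" {
  host                 = var.nsx_host
  username             = var.nsx_user
  password             = var.nsx_pass
  allow_unverified_ssl = true # lab only; verify certificates in production
}

# Illustrative resource and attribute names. Confirm them against the
# vmware/nsxt provider documentation for your provider version before applying.
resource "nsxt_edge_cluster" "edge_cluster1" {
  display_name = "Prod-Edge-Cluster"
  edge_nodes   = [nsxt_edge_node.edge1.id, nsxt_edge_node.edge2.id]
}

resource "nsxt_edge_node" "edge1" {
  display_name = "Edge-Node-1"
  # Additional configuration here
}

resource "nsxt_edge_node" "edge2" {
  display_name = "Edge-Node-2"
  # Additional configuration here
}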

NSX-T Edge Cluster Failover Automation

Failover automation keeps your edge cluster resilient by removing the need for manual intervention during node or VM failures.

End-to-End Failover Workflow

Detect edge node or VM failure
Check NSX Edge services status
Trigger VM restart or node replacement
Validate north-south connectivity post-remediation
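
The decision at the heart of this workflow can be sketched as a small function. This is an illustration, not a complete controller: it assumes the node status is reported as a string such as "UP", "DEGRADED", or "DOWN", and the retry limit and all names (`Action`, `choose_action`) are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_VM = "restart_vm"
    REPLACE_NODE = "replace_node"


def choose_action(vm_powered_on: bool, node_status: str,
                  restart_attempts: int, max_restarts: int = 2) -> Action:
    """Pick the next remediation step for one edge node.

    Healthy nodes need nothing. Unhealthy nodes are restarted until the
    retry budget is spent, then flagged for replacement.
    """
    if vm_powered_on and node_status.upper() == "UP":
        return Action.NONE
    if restart_attempts < max_restarts:
        return Action.RESTART_VM
    return Action.REPLACE_NODE
```

Keeping the decision separate from the code that restarts VMs or calls NSX makes it easy to test the logic without touching the environment.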

PowerShell: Edge Node Health Check and Auto-Restart

Import-Module VMware.PowerCLI

# Connect to vCenter. Prompt for credentials instead of hardcoding them;
# for unattended runs, load a securely stored credential instead.
$cred = Get-Credential -Message 'vCenter credentials'
Connect-VIServer -Server 'vcenter.example.com' -Credential $cred | Out-Null

# Get Edge Node VMs and restart any that are not powered on
$edgeVMs = Get-VM -Name 'Edge-Node-*'
foreach ($vm in $edgeVMs) {
    if ($vm.PowerState -ne 'PoweredOn') {
        Write-Host "Restarting $($vm.Name)..."
        Start-VM -VM $vm -Confirm:$false | Out-Null
        # Add custom notification/escalation here
    }
}

# Validate NSX Edge Services post-restart
# (Invoke API call or check status in NSX Manager)

Disconnect-VIServer -Server 'vcenter.example.com' -Confirm:$false
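
The post-restart validation step could be scripted along the following lines. This is a sketch under assumptions: it assumes the NSX Manager API exposes a per-node status at `/api/v1/transport-nodes/{id}/status` and that the response carries a top-level `status` field whose healthy value is "UP". Check both against the API reference for your NSX version. The function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def fetch_node_status(manager, node_id, user, password, verify_tls=True):
    """GET the status document for one transport node (assumed endpoint)."""
    url = f"https://{manager}/api/v1/transport-nodes/{node_id}/status"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Lab use only: skips certificate verification.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


def is_healthy(status_doc: dict) -> bool:
    """Treat a node as healthy only when its reported status is UP."""
    return str(status_doc.get("status", "UNKNOWN")).upper() == "UP"
```

A missing or unrecognized status is treated as unhealthy, so an unexpected response fails safe instead of passing silently.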

Automated Edge Replacement (Workflow Steps)

Detect unrecoverable edge node
Remove node from edge cluster
Deploy a new edge node (via Ansible/Terraform)
Rejoin to cluster, reattach services
Confirm routing, NAT, load balancing
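
The replacement steps above are strictly ordered, and a failure partway through should stop the run. A minimal runner for that pattern might look like the sketch below. The step callables are placeholders you would wire to your own Ansible, Terraform, or API calls, and `run_replacement` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_replacement(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Each step is (name, callable). The callable returns True on success.
    Returns (overall_success, log_lines).
    """
    log = []
    for name, fn in steps:
        try:
            ok = fn()
        except Exception as exc:  # record the error and stop the workflow
            log.append(f"{name}: error ({exc})")
            return False, log
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            return False, log
    return True, log
```

Stopping at the first failure avoids, for example, reattaching services to a node that never finished deploying. The returned log gives the escalation path a clear record of where the run stopped.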

Troubleshooting and Best Practices

Monitor edge node and cluster health (NSX Manager, Aria/vROps, API)
Test failover regularly (simulate host or edge VM failure)
Patch edge appliances to latest supported NSX version
Use physical network redundancy: dual ToR, multiple uplinks
Enforce vSphere DRS anti-affinity for all edge nodes
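
When you test failover, it helps to measure how long north-south connectivity was actually lost. The sketch below is one simple way to do that, assuming a TCP service reachable through the edge is a fair proxy for connectivity. The target host and port are whatever you choose to probe, and both function names are hypothetical.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples) -> float:
    """Return the longest outage, in seconds, from time-ordered samples.

    samples: iterable of (timestamp_seconds, reachable). An outage runs from
    the first failed probe to the next successful one. An outage still open
    at the end is counted up to the last sample.
    """
    longest = 0.0
    down_since = None
    last_ts = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None
    if down_since is not None:
        longest = max(longest, last_ts - down_since)
    return longest
```

Collect samples by calling `tcp_probe` in a loop with `time.monotonic()` timestamps while you simulate the failure, then pass them to `longest_outage`. The result is only as precise as the probe interval.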

Conclusion

NSX-T edge clusters are the core of scalable, resilient network designs. Sizing, placement, and automation are all critical for uptime and performance. Use YAML and code-driven deployments to reduce manual errors, and always validate your design with real traffic data. Diagrams accelerate design reviews and the handoff to operations, which makes your next upgrade or greenfield project smoother.

Disclaimer: The views expressed in this article are those of the author and do not represent the opinions of VMware, my employer, or any affiliated organization. Always refer to the official VMware documentation before production deployment.
