Digital Thought Disruption

VCF 9.0 GA Mental Model Part 1: Fleets, Instances, Domains, and the Fleet Management Layer

TL;DR

If you want alignment fast, standardize on this hierarchy and ownership split:

Architecture Diagram

Scenario

You need architects, operations, and leadership to answer the same questions the same way:

If you do not standardize vocabulary, you end up with:

Series map

This topic is too big for one post, so here is the split:

Assumptions

Core vocabulary your org should standardize

Use these terms consistently. Treat anything else as “slang” that must map back here.

| Term | What it means in your operating model | What it is not |
| --- | --- | --- |
| Organizational private cloud | Your program label for “the service you provide” (budget, roadmap, stakeholders) | Not a VCF object you create in the UI |
| VCF fleet | Governance and shared-services boundary for fleet management components | Not “one shared vCenter” |
| Fleet management components | Shared services for operations, lifecycle, automation, identity | Not the same thing as SDDC Manager |
| VCF instance | A discrete VCF deployment footprint (management domain + workload domains) | Not a tenant boundary by default |
| VCF domain | Lifecycle and isolation boundary (management domain or workload domain) | Not always a “site” |
| Management domain | A domain that hosts the instance’s core management components (and fleet components for the first instance) | Not “never runs workloads” (you should still avoid putting random workloads here) |
| Workload domain | A domain built to run consumer workloads | Not a catch-all dumping ground for everything |
| vSphere cluster | The scale unit inside a domain | Not the lifecycle boundary for VCF components |

This is the phrasing that reduces ambiguity in meetings:

The hierarchy that prevents org-wide confusion

You are building a private cloud, but you manage fleets, instances, and domains

Leadership will say “private cloud.” That is fine.

Your platform team must translate that into platform objects:

If you do not do this translation explicitly, you will end up with “one private cloud” being interpreted as “one vCenter,” which is the wrong mental model in VCF 9.0.

The hierarchy you should teach

This is the hierarchy you use in architecture reviews, operational runbooks, incident triage, and change calendars.
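
To make the nesting concrete, here is a minimal Python sketch of the fleet, instance, domain, and cluster containment described in the vocabulary table. It is an illustration for runbooks and reviews, not a VCF API: every class, field, and object name below is hypothetical, and the "exactly one management domain per instance" check is an assumption drawn from the table's wording.

```python
from dataclasses import dataclass, field
from enum import Enum


class DomainKind(Enum):
    MANAGEMENT = "management"
    WORKLOAD = "workload"


@dataclass
class Cluster:
    name: str  # the scale unit inside a domain


@dataclass
class Domain:
    name: str
    kind: DomainKind  # lifecycle and isolation boundary
    clusters: list[Cluster] = field(default_factory=list)


@dataclass
class Instance:
    name: str
    domains: list[Domain] = field(default_factory=list)

    def management_domain(self) -> Domain:
        # Assumption: an instance has exactly one management domain.
        mgmt = [d for d in self.domains if d.kind is DomainKind.MANAGEMENT]
        if len(mgmt) != 1:
            raise ValueError(
                f"{self.name}: expected one management domain, found {len(mgmt)}"
            )
        return mgmt[0]


@dataclass
class Fleet:
    name: str
    instances: list[Instance] = field(default_factory=list)

    def fleet_services_host(self) -> Domain:
        # Per the table, fleet components live in the management domain
        # of the first instance in the fleet.
        if not self.instances:
            raise ValueError(f"{self.name}: fleet has no instances")
        return self.instances[0].management_domain()


# Hypothetical example: one fleet, two instances.
fleet = Fleet(
    name="fleet-1",
    instances=[
        Instance(
            name="inst-a",
            domains=[
                Domain("inst-a-mgmt", DomainKind.MANAGEMENT, [Cluster("mgmt-cl01")]),
                Domain("inst-a-wld01", DomainKind.WORKLOAD, [Cluster("wld01-cl01")]),
            ],
        ),
        Instance(
            name="inst-b",
            domains=[Domain("inst-b-mgmt", DomainKind.MANAGEMENT)],
        ),
    ],
)

print(fleet.fleet_services_host().name)
```

Note that nothing in this model is called "vCenter" at the top level, which is the point: "one private cloud" maps to a fleet with instances and domains, not to a single management endpoint.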

What the fleet actually owns vs what stays per instance

This is the most important correction to get right.

Fleet scope

The fleet gives you shared services across multiple instances, typically including:

Operational implication:

Instance scope

Each instance remains a discrete SDDC management footprint:

Operational implication:

Domain scope

Domains are where you place:

In VCF 9.0, a workload domain commonly has:

Operational implication:

Day-0, day-1, day-2 map

This is the “where do we do the thing” map you want in every runbook.

Day-0 design decisions

These choices are difficult or expensive to reverse later:

Day-0 outcome you want:

Day-1 bring-up

Day-1 is about standing up the first production-quality slice:

Day-1 outcome you want:

Day-2 operations

Day-2 is where most pain lives if the model is wrong:

Day-2 outcome you want:

Challenge: one private cloud vs multiple fleets

The challenge

You want “one private cloud” from a service catalog perspective, but you also need:

In VCF terms, that means deciding how many fleets you run.

Solutions

Solution A: One private cloud program, one fleet

Best when:

Tradeoffs:

Solution B: One private cloud program, multiple fleets

Best when:

Tradeoffs:

Solution C: Multiple private cloud programs, multiple fleets

Best when:

Tradeoffs:

Architecture tradeoff matrix

| Decision | Isolation | Scale | Ops overhead | Upgrade coordination | Typical use case |
| --- | --- | --- | --- | --- | --- |
| One fleet | Medium | High | Lowest | Highest coupling at fleet layer | Single enterprise platform |
| Multiple fleets | High | High | Medium to high | Independent per fleet | Regulated isolation, large enterprises |
| Multiple programs | Very high | Varies | Highest | Fully independent | Separate business entities |
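
If it helps to keep the matrix next to your design records, the rows can be encoded as data and filtered by a minimum isolation requirement. This is only a sketch: the cell values are copied from the matrix, while the dictionary layout, the `candidates` function, and the ordering Medium < High < Very high are assumptions made for illustration.

```python
# Assumed ordering of the isolation ratings used in the matrix.
ISOLATION_RANK = {"Medium": 1, "High": 2, "Very high": 3}

# Rows copied from the architecture tradeoff matrix.
TOPOLOGIES = [
    {
        "decision": "One fleet",
        "isolation": "Medium",
        "ops_overhead": "Lowest",
        "upgrade_coordination": "Highest coupling at fleet layer",
    },
    {
        "decision": "Multiple fleets",
        "isolation": "High",
        "ops_overhead": "Medium to high",
        "upgrade_coordination": "Independent per fleet",
    },
    {
        "decision": "Multiple programs",
        "isolation": "Very high",
        "ops_overhead": "Highest",
        "upgrade_coordination": "Fully independent",
    },
]


def candidates(min_isolation: str) -> list[str]:
    """Return the decisions whose isolation rating meets the stated floor."""
    floor = ISOLATION_RANK[min_isolation]
    return [
        row["decision"]
        for row in TOPOLOGIES
        if ISOLATION_RANK[row["isolation"]] >= floor
    ]


print(candidates("High"))
```

A filter like this only narrows the options. The ops overhead and upgrade coordination columns still need a human decision.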

Deployment posture patterns

Single site

A practical default:

Operational posture:

Two sites in one region

Your main design fork is: “Do I want a single operational footprint or two?”

Common patterns:

Operational posture:

Multi-region

You are balancing:

Common patterns:

Operational posture:

Who owns what

This is the “stop paging the wrong team” table.

| Scope | Platform team (cloud foundation) | VI admin (infrastructure) | App/platform teams (consumers) |
| --- | --- | --- | --- |
| Fleet services (VCF Operations, fleet management, Automation, Identity Broker) | Own and operate. Standards, backups, certs, lifecycle, RBAC | Support integration points for their instances | Consume. No admin of fleet services |
| Fleet topology (how many fleets, where) | Accountable decision | Consulted | Informed |
| Instance management stack (SDDC Manager, mgmt vCenter, mgmt NSX) | Guardrails and standards | Own and operate day-2 | Not responsible |
| Domain lifecycle (create, patch, expand, decommission) | Guardrails and patterns | Execute and operate | Consume outcomes |
| Workload operations inside guest OS and Kubernetes | Not responsible | Not responsible | Own and operate |
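
One way to put this table to work in incident triage is a small lookup that returns the team holding the operating role for a scope. The sketch below copies the cells from the table (abbreviated scope names aside). The `primary_owner` function and the rule that "Own and operate", "Execute and operate", or "Accountable" marks the primary owner are assumptions for illustration, not part of any VCF tooling.

```python
# Cells copied from the ownership table, keyed by a short scope name.
OWNERSHIP = {
    "Fleet services": {
        "Platform team": "Own and operate. Standards, backups, certs, lifecycle, RBAC",
        "VI admin": "Support integration points for their instances",
        "App/platform teams": "Consume. No admin of fleet services",
    },
    "Fleet topology": {
        "Platform team": "Accountable decision",
        "VI admin": "Consulted",
        "App/platform teams": "Informed",
    },
    "Instance management stack": {
        "Platform team": "Guardrails and standards",
        "VI admin": "Own and operate day-2",
        "App/platform teams": "Not responsible",
    },
    "Domain lifecycle": {
        "Platform team": "Guardrails and patterns",
        "VI admin": "Execute and operate",
        "App/platform teams": "Consume outcomes",
    },
    "Workload operations": {
        "Platform team": "Not responsible",
        "VI admin": "Not responsible",
        "App/platform teams": "Own and operate",
    },
}

# Assumed rule: these role prefixes mark the team to page first.
PRIMARY_PREFIXES = ("Own and operate", "Execute and operate", "Accountable")


def primary_owner(scope: str) -> str:
    """Return the single team whose role marks it as the primary owner."""
    owners = [
        team
        for team, role in OWNERSHIP[scope].items()
        if role.startswith(PRIMARY_PREFIXES)
    ]
    if len(owners) != 1:
        raise ValueError(f"{scope}: expected one primary owner, found {owners}")
    return owners[0]


print(primary_owner("Instance management stack"))
```

Keeping the table as data means a change in ownership is a reviewed change to one file rather than tribal knowledge.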

Design-time vs day-2 ownership

Failure domain analysis

Use this to set expectations with leadership.

Fleet services failure

What breaks:

What usually keeps running:

Instance management domain failure

What breaks:

What might keep running:

Workload domain failure

What breaks:

What keeps running:

Anti-patterns

These are the patterns that create long-term operational pain.

Summary and Takeaways

If you want the org aligned, lock in these rules:

Conclusion

VCF 9.0 becomes easier to operate when you stop thinking “one vCenter” and start thinking in fleet, instance, and domain boundaries.

When you do that, you get:
