Digital Thought Disruption

VCF 9.0 GA Mental Model Part 3: Day-0 to Day-2 Ownership Across Fleets, Instances, and Domains

TL;DR

If you want clean accountability in VCF 9.0, anchor your operating model to the official hierarchy:

VCF private cloud -> VCF fleet -> VCF instance -> VCF domain -> vSphere clusters.

This post translates that hierarchy into an operating model: who owns what, where day-0/day-1/day-2 work happens, and how topology (single site vs two sites vs multi-region) changes your posture.
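
One way to make the hierarchy concrete is to model it as nested records and walk it top-down. The sketch below is only an illustration of the containment order described above. The class names, the `cluster_paths` helper, and the sample inventory are all hypothetical and do not correspond to any VCF API.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    name: str
    kind: str  # "management" or "workload"
    clusters: list[str] = field(default_factory=list)


@dataclass
class Instance:
    name: str
    domains: list[Domain] = field(default_factory=list)


@dataclass
class Fleet:
    name: str
    instances: list[Instance] = field(default_factory=list)


@dataclass
class PrivateCloud:
    name: str
    fleets: list[Fleet] = field(default_factory=list)


def cluster_paths(cloud: PrivateCloud) -> list[str]:
    """Return one 'cloud/fleet/instance/domain/cluster' path per cluster."""
    paths = []
    for fleet in cloud.fleets:
        for instance in fleet.instances:
            for domain in instance.domains:
                for cluster in domain.clusters:
                    paths.append("/".join(
                        [cloud.name, fleet.name, instance.name, domain.name, cluster]
                    ))
    return paths


# Hypothetical inventory: every name here is made up for illustration.
cloud = PrivateCloud("corp", [
    Fleet("fleet-a", [
        Instance("site-1", [
            Domain("mgmt", "management", ["mgmt-cl01"]),
            Domain("wld-01", "workload", ["wld01-cl01", "wld01-cl02"]),
        ]),
    ]),
])

for path in cluster_paths(cloud):
    print(path)
```

Every cluster gets exactly one path, which is the property you want from an ownership model: any object you can name sits under exactly one domain, instance, and fleet.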

Scope and code levels referenced in this article: the VCF 9.0 GA component set, with versions and builds listed in the Version Compatibility Matrix below.

Scenario

You are aligning architects, operations, and leadership on what VCF 9.0 is actually managing, and how responsibilities split across:

The goal is predictable ownership and predictable blast radius.

Assumptions

Scope and Code Levels

This article is pinned to the VCF 9.0 GA component set (versions and builds are listed in the Version Compatibility Matrix below). If you are on a later 9.0.x maintenance release, terminology remains consistent, but exact UI placement and lifecycle sequencing can shift.

Version Compatibility Matrix

| Layer | Component | Version | Build | Why you care operationally |
| --- | --- | --- | --- | --- |
| Deployment | VCF Installer | 9.0.1.0 | 24962180 | The day-1 bring-up entrypoint for fleets and instances |
| Instance foundation | SDDC Manager | 9.0.0.0 | 24703748 | Drives instance lifecycle workflows and inventory control |
| Domain foundation | vCenter | 9.0.0.0 | 24755230 | Domain-level management boundary and API surface |
| Host layer | ESX | 9.0.0.0 | 24755229 | Your cluster capacity and patching blast radius unit |
| Network layer | NSX | 9.0.0.0 | 24733065 | Network segmentation, security policy, and edge services |
| Fleet services | VCF Operations | 9.0.0.0 | 24695812 | Central ops, visibility, grouping, and platform workflows |
| Fleet services | VCF Operations fleet management | 9.0.0.0 | 24695816 | Lifecycle for fleet services and related management components |
| Fleet services | VCF Automation | 9.0.0.0 | 24701403 | Self-service, governance, and policy-driven provisioning |
| Identity | VCF Identity Broker | 9.0.0.0 | 24695128 | Enables VCF Single Sign-On models and SSO scope decisions |
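
If you pin an article or a runbook to a component set, it helps to check your environment against that pin. Here is a minimal sketch of such a check. The pinned values come from the matrix above, while `find_drift` and the `observed` inventory are hypothetical: how you actually export versions from your environment is outside the scope of this example.

```python
# Pinned (version, build) per component, copied from the matrix above.
PINNED = {
    "VCF Installer": ("9.0.1.0", "24962180"),
    "SDDC Manager": ("9.0.0.0", "24703748"),
    "vCenter": ("9.0.0.0", "24755230"),
    "ESX": ("9.0.0.0", "24755229"),
    "NSX": ("9.0.0.0", "24733065"),
    "VCF Operations": ("9.0.0.0", "24695812"),
    "VCF Operations fleet management": ("9.0.0.0", "24695816"),
    "VCF Automation": ("9.0.0.0", "24701403"),
    "VCF Identity Broker": ("9.0.0.0", "24695128"),
}


def find_drift(observed):
    """Return {component: message} for every component that is missing or off-pin."""
    drift = {}
    for component, (version, build) in PINNED.items():
        actual = observed.get(component)
        if actual is None:
            drift[component] = "not reported"
        elif actual != (version, build):
            drift[component] = (
                f"pinned {version} build {build}, "
                f"found {actual[0]} build {actual[1]}"
            )
    return drift


# Hypothetical inventory export: everything on pin except ESX (placeholder
# build number, not a real build) and NSX (missing from the export).
observed = dict(PINNED)
observed["ESX"] = ("9.0.0.0", "00000000")
del observed["NSX"]

for component, message in find_drift(observed).items():
    print(f"{component}: {message}")
```

An empty result means the environment matches the pinned set. Anything else tells you which parts of this article may not line up with what you see in your UI.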

The Ownership Model You Actually Need

A clean VCF program usually stabilizes when you stop assigning ownership by product name and start assigning it by boundary: private cloud, fleet, instance, domain, and cluster.

“Who owns what” chart

Use this as a starting point for your internal RACI.

| Construct or capability | Primary owner | Secondary owner | Day-2 responsibilities that must be explicit |
| --- | --- | --- | --- |
| VCF private cloud (org boundary) | Platform team | Security/GRC | Portfolio decisions, fleet count, policy and compliance guardrails |
| VCF fleet | Platform team | Architecture | Fleet service lifecycle, shared governance, change windows, identity posture |
| Fleet-level management components (VCF Operations, VCF Operations fleet management, VCF Automation) | Platform team | SRE/Operations | Backups, upgrades, integrations, tenant and RBAC guardrails |
| VCF instance | Platform team | Regional ops | Capacity lifecycle, adding domains, instance-level networking standards |
| Management domain | Platform team | VI admin | “Keep the platform running” discipline: patching, certificates, backups |
| VI workload domain | VI admin | Platform team | Day-2 LCM inside guardrails, cluster operations, domain health |
| Domain networking (NSX segments, T0/T1 patterns, edge capacity) | Platform team | Network/security | Network design standards, firewall policy model, edge scaling ceilings |
| VM provisioning and templates | App/platform teams | VI admin | Golden image ownership, config drift control, tagging standards |
| Kubernetes platform on vSphere | App/platform teams | Platform team | Namespace policy, cluster lifecycle, RBAC, platform SLOs |
| VCF Automation catalogs, projects, policies | Platform team | App/platform teams | Self-service guardrails, approvals, quotas, blueprint governance |
| FinOps reporting and showback | Platform team | Finance | Tagging accuracy, allocation rules, cost anomaly response |
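
If you keep the chart as data instead of a slide, you can query it during incidents and audits. The following is a rough sketch that encodes the primary and secondary owners from the chart above. The shortened construct keys and the helper names `constructs_owned_by` and `escalation_path` are assumptions made for this example, so adapt them to your own RACI.

```python
# (primary owner, secondary owner) per construct, copied from the chart above.
OWNERSHIP = {
    "VCF private cloud": ("Platform team", "Security/GRC"),
    "VCF fleet": ("Platform team", "Architecture"),
    "Fleet-level management components": ("Platform team", "SRE/Operations"),
    "VCF instance": ("Platform team", "Regional ops"),
    "Management domain": ("Platform team", "VI admin"),
    "VI workload domain": ("VI admin", "Platform team"),
    "Domain networking": ("Platform team", "Network/security"),
    "VM provisioning and templates": ("App/platform teams", "VI admin"),
    "Kubernetes platform on vSphere": ("App/platform teams", "Platform team"),
    "VCF Automation catalogs, projects, policies": ("Platform team", "App/platform teams"),
    "FinOps reporting and showback": ("Platform team", "Finance"),
}


def constructs_owned_by(team, role="primary"):
    """List the constructs where `team` holds the given role."""
    if role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {role}")
    index = 0 if role == "primary" else 1
    return [name for name, owners in OWNERSHIP.items() if owners[index] == team]


def escalation_path(construct):
    """Return [primary, secondary] for a construct; KeyError means nobody owns it."""
    primary, secondary = OWNERSHIP[construct]
    return [primary, secondary]


print(constructs_owned_by("VI admin"))
print(constructs_owned_by("VI admin", role="secondary"))
print(escalation_path("VCF fleet"))
```

The useful failure mode here is the `KeyError`: if someone asks who owns a construct and the lookup fails, you have found a gap in the model before an incident finds it for you.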

Design-time vs day-2 operations

This split is where most teams get surprised.

Design-time decisions (day-0) are expensive to unwind:

Day-2 operations should be routine, repeatable, and low-toil:

Day-0, Day-1, Day-2 Map

Use this map to stop “platform work” from leaking into “workload work”, and vice versa.

| Phase | What you do | Where it happens | Why it matters |
| --- | --- | --- | --- |
| Day-0 | Decide VCF private cloud -> fleets -> instances -> domains topology | Architecture/design | This locks your governance and blast radius posture |
| Day-0 | Choose identity model and SSO scope | Architecture/security | Identity boundaries are hard to change later without operational pain |
| Day-0 | Define network consumption model and tenant isolation model | Platform + network/security | Network decisions dictate scale ceilings and operational toil |
| Day-1 | Deploy first fleet + first instance management domain | VCF Installer + first instance management domain | The first instance becomes the anchor location for fleet services |
| Day-1 | Stand up fleet-level management components | Fleet services (hosted in first instance management domain) | This is your “platform services layer” for operations and governance |
| Day-1 | Deploy initial VI workload domain(s) | Instance lifecycle workflows | Workload domains become your default lifecycle and isolation unit |
| Day-2 | Add instances (new sites or regions) | Fleet services + new instance management domain | Expands footprint while keeping governance centralized |
| Day-2 | Add workload domains and clusters | Instance workflows + domain operations | Expands capacity and isolates workloads cleanly |
| Day-2 | Operate identity, automation, and lifecycle | Fleet services | Centralizes day-2 governance across attached instances |

Decision Criteria: Fleet vs Instance vs Domain vs Cluster

Most “VCF design debates” are actually “where do I want the blast radius to stop?”

Quick decision table

| If you need… | Add a fleet | Add an instance | Add a domain | Add a cluster |
| --- | --- | --- | --- | --- |
| Separate governance plane and change windows | Yes | No | No | No |
| Regulated isolation with hard separation | Often yes | Sometimes | Sometimes | No |
| New site or region footprint | Sometimes | Yes | No | No |
| More lifecycle isolation for workloads | No | No | Yes | Sometimes |
| Different SLA or patch cadence for a workload group | No | No | Yes | Sometimes |
| More capacity in same workload boundary | No | No | No | Yes |
| Separate SSO boundary | Yes (cleanest) | Sometimes | No | No |
| Reduced shared service blast radius | Yes | Sometimes | Sometimes | No |
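
The same table works as a lookup if you want to settle a design debate quickly. This is a minimal sketch: the answers are copied from the table above, and the `options_for` helper is a hypothetical name that simply filters out the "No" columns for a given need.

```python
BOUNDARIES = ("fleet", "instance", "domain", "cluster")

# Answers per need, in BOUNDARIES order, copied from the decision table above.
DECISIONS = {
    "Separate governance plane and change windows": ("Yes", "No", "No", "No"),
    "Regulated isolation with hard separation": ("Often yes", "Sometimes", "Sometimes", "No"),
    "New site or region footprint": ("Sometimes", "Yes", "No", "No"),
    "More lifecycle isolation for workloads": ("No", "No", "Yes", "Sometimes"),
    "Different SLA or patch cadence for a workload group": ("No", "No", "Yes", "Sometimes"),
    "More capacity in same workload boundary": ("No", "No", "No", "Yes"),
    "Separate SSO boundary": ("Yes (cleanest)", "Sometimes", "No", "No"),
    "Reduced shared service blast radius": ("Yes", "Sometimes", "Sometimes", "No"),
}


def options_for(need):
    """Return {boundary: answer} for every boundary that is not a flat 'No'."""
    answers = DECISIONS[need]
    return {
        boundary: answer
        for boundary, answer in zip(BOUNDARIES, answers)
        if answer != "No"
    }


print(options_for("More capacity in same workload boundary"))
print(options_for("Separate SSO boundary"))
```

When a need returns more than one boundary, that is your cue to have the blast radius conversation rather than to pick the cheapest option by default.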

Architecture Tradeoff Matrix

| Option | Governance isolation | Operational overhead | Scale ceiling | Typical use |
| --- | --- | --- | --- | --- |
| One private cloud, one fleet | Lowest | Lowest | Medium to high, depending on identity model | Standard enterprise starting point |
| One private cloud, multiple fleets | High | Higher | Higher overall, but duplicated services | Regulated zones, different change windows |
| Multiple private clouds | Highest | Highest | Highest | Mergers, hard org separation, distinct GRC boundaries |

Topology Posture

You can support all three topologies with the same mental model. What changes is how you set your fleet and instance boundaries.

Single site

This is the simplest operating posture:

Operational posture:

Two sites in one region

Challenge:

You want higher availability and operational continuity, but you do not want to turn every incident into a “distributed systems lesson”.

Solutions:

A) One fleet, one instance, stretched where justified

B) One fleet, two instances (one per site)

C) Two fleets (one per site)

Multi-region

Challenge:

Regions are real failure domains. Latency and inter-region dependency will punish “single control plane” assumptions.

Solutions:

A) One private cloud, one fleet, multiple instances (region aligned)

B) One private cloud, multiple fleets (region aligned or regulation aligned)

C) Multiple private clouds

Identity Boundaries and SSO Scope

In VCF 9.0, identity is not a footnote. It is a design-time decision that changes:

VCF Single Sign-On models you should reason about

Treat these as scope control knobs.

| Model | SSO scope | Availability posture | Operational overhead | When it fits |
| --- | --- | --- | --- | --- |
| Fleet-wide SSO | Large | Lower (single identity service per fleet) | Low | One fleet, tight governance, smaller instance count |
| Cross-instance SSO | Balanced | Balanced | Medium | Larger fleets, want to limit identity blast radius |
| Single-instance SSO | Small (per instance) | Higher per instance | Higher | Regulated isolation or region autonomy |
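
To reason about SSO scope, it can help to compute which instances share an identity broker under each model. The sketch below is a simplified model, not a description of how VCF Identity Broker is deployed or configured. The `sso_scopes` function, the broker names, and the instance names are all hypothetical.

```python
def sso_scopes(model, instances, assignment=None):
    """Map each (hypothetical) identity broker to the instances in its SSO scope.

    model: "fleet-wide", "cross-instance", or "single-instance"
    assignment: for "cross-instance" only, {instance name: broker name}
    """
    if model == "fleet-wide":
        # One broker serves every instance in the fleet.
        return {"broker-fleet": list(instances)}
    if model == "single-instance":
        # Each instance gets its own broker and its own SSO boundary.
        return {f"broker-{name}": [name] for name in instances}
    if model == "cross-instance":
        if assignment is None:
            raise ValueError("cross-instance needs an instance-to-broker assignment")
        scopes = {}
        for name in instances:
            scopes.setdefault(assignment[name], []).append(name)
        return scopes
    raise ValueError(f"unknown SSO model: {model}")


# Hypothetical fleet with three instances.
instances = ["site-1", "site-2", "site-3"]

print(sso_scopes("fleet-wide", instances))
print(sso_scopes("single-instance", instances))
print(sso_scopes(
    "cross-instance",
    instances,
    assignment={"site-1": "broker-a", "site-2": "broker-a", "site-3": "broker-b"},
))
```

The size of each list is the login blast radius if that broker goes down, which is the trade the table above describes: larger scope means less to operate and more to lose at once.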

Scale note you should plan for:

Separate IdP and separate SSO boundaries (do both)

You typically implement “separate identity boundaries” in one of two ways.

A) Separate fleets, separate IdPs

B) One fleet, multiple identity brokers (cross-instance model)

Design-time warning:

Failure Domain Analysis

This is the mental model that reduces panic during incidents.

Practical blast radius map

| Failure | What breaks first | What usually keeps running | Your first triage question |
| --- | --- | --- | --- |
| Fleet services outage (VCF Operations, VCF Automation) | Visibility, governance workflows, self-service provisioning, central policy operations | Existing workloads in domains, core hypervisor operations | Is this governance down or is core infrastructure down? |
| Identity broker outage (in-scope instances) | Logins and SSO flows for in-scope components | Existing workloads and dataplane continue | What is the SSO scope for this identity broker? |
| Instance management domain incident | Instance lifecycle workflows, management vCenter/NSX for that instance | Workloads can keep running, but operations become constrained | Can you still reach workload domain vCenter/NSX? |
| Workload domain incident | Domain-specific provisioning and lifecycle | Other domains and instances | Is isolation working the way you intended? |
| Cluster-level capacity failure | Placement, HA behavior, performance | Other clusters/domains | Did you design cluster boundaries around maintenance and failure? |
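
A blast radius map is most useful when on-call can pull it up without reading a wiki page. As a rough sketch, the map above can be stored as a lookup that prints a short triage card. The short failure keys and the `triage_card` helper are assumptions for this example, and the text in each entry is copied from the table.

```python
# failure -> (what breaks first, what usually keeps running, first triage question)
BLAST_RADIUS = {
    "fleet services": (
        "Visibility, governance workflows, self-service provisioning, central policy operations",
        "Existing workloads in domains, core hypervisor operations",
        "Is this governance down or is core infrastructure down?",
    ),
    "identity broker": (
        "Logins and SSO flows for in-scope components",
        "Existing workloads and dataplane continue",
        "What is the SSO scope for this identity broker?",
    ),
    "instance management domain": (
        "Instance lifecycle workflows, management vCenter/NSX for that instance",
        "Workloads can keep running, but operations become constrained",
        "Can you still reach workload domain vCenter/NSX?",
    ),
    "workload domain": (
        "Domain-specific provisioning and lifecycle",
        "Other domains and instances",
        "Is isolation working the way you intended?",
    ),
    "cluster": (
        "Placement, HA behavior, performance",
        "Other clusters/domains",
        "Did you design cluster boundaries around maintenance and failure?",
    ),
}


def triage_card(failure):
    """Format the blast radius entry for one failure type as a short card."""
    breaks, survives, question = BLAST_RADIUS[failure]
    return (
        f"[{failure}]\n"
        f"  breaks first:  {breaks}\n"
        f"  keeps running: {survives}\n"
        f"  ask first:     {question}"
    )


print(triage_card("identity broker"))
```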

Operational Runbook Snapshot

This is the minimum you want documented before you call the platform “ready”.

Fleet services runbook (platform team)

Real-world RTO/RPO examples you can start with

These are starting targets that many teams use to set expectations. Tune them to your recovery strategy and staffing model.

The key is consistency: define targets per boundary and test them.

Anti-patterns

These are the patterns that inflate toil and create “mystery outages”.

Troubleshooting workflow

When something breaks, the fastest teams classify the problem by boundary first.
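
Classifying by boundary can be made mechanical: take the objects that are failing, write each one as a path through the hierarchy, and find the narrowest boundary that contains all of them. Here is a minimal sketch of that idea. The path format, the `narrowest_boundary` function, and the sample names are hypothetical, and the function assumes each failing object is identified by a full cluster-level path.

```python
LEVELS = ("private cloud", "fleet", "instance", "domain", "cluster")


def narrowest_boundary(failing_paths):
    """Return the narrowest hierarchy level that contains every failing path.

    Each path is 'cloud/fleet/instance/domain/cluster'.
    """
    if not failing_paths:
        raise ValueError("need at least one failing path")
    split = [path.split("/") for path in failing_paths]
    common = 0
    # Count how many leading path segments all failing objects share.
    for parts in zip(*split):
        if len(set(parts)) != 1:
            break
        common += 1
    if common == 0:
        return "multiple private clouds"
    return LEVELS[min(common, len(LEVELS)) - 1]


# Hypothetical incidents.
print(narrowest_boundary([
    "corp/fleet-a/site-1/wld-01/cl01",
    "corp/fleet-a/site-1/wld-01/cl02",
]))
print(narrowest_boundary([
    "corp/fleet-a/site-1/wld-01/cl01",
    "corp/fleet-a/site-2/wld-07/cl01",
]))
```

The first incident is contained in one domain and the second spans instances within one fleet, so they go to different owners and different runbooks even though both started as "clusters are unhealthy".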

Conclusion

If you want VCF 9.0 to feel operable at scale, you need an ownership model that matches the platform hierarchy:

VCF private cloud -> VCF fleet -> VCF instance -> VCF domain -> vSphere clusters.

When those boundaries map cleanly to “who owns what”, day-2 operations become repeatable instead of heroic.

Sources

VMware Cloud Foundation 9.0 Documentation (Tech Docs): https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0.html
VMware Cloud Foundation 9.0 Release Notes: https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/vmware-cloud-foundation-90-release-notes.html
Design (VMware Cloud Foundation 9.0): https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0/design.html
Architectural Options in VMware Cloud Foundation: https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0/design/vmware-cloud-foundation-concepts.html
Fleet Management (VCF Operations): https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0/overview-of-vmware-cloud-foundation-9/what-is-vmware-cloud-foundation-and-vmware-vsphere-foundation/vcf-operations-overview/fleet-management.html
VCF Single Sign-On Architecture: https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0/fleet-management/what-is/sso-architecture.html
Identity Providers and Protocols Supported for VCF Single Sign-On: https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0/fleet-management/what-is/protocols-suported-for–sso.html
Linking vCenter instances in VCF Operations: https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0/fleet-management/linking-vcenter-systems-in-vmware-cloud-foundation-operations.html
