vVol migrations are easy to misread.
When a VM migration fails, the first instinct is usually to look at host load, vMotion networking, datastore latency, DRS behavior, or the backend array. Those checks still matter, but vVols introduce another dependency that can become the bottleneck before the data path is the real problem: the VASA provider control plane.
Broadcom KB 318662 documents a specific failure pattern where initiating many vMotion operations from a single host, either manually or through maintenance mode, can result in some vMotions failing with a generic system error. The same KB points administrators to vvold.log, where long swap vVol creation times and exhausted VASA Provider connections may appear.
That distinction matters operationally. A vVol datastore is not just another datastore with a different label. vSphere uses VASA for out-of-band control operations between vCenter Server, ESXi hosts, and the storage system, while the actual I/O path continues over supported storage protocols through Protocol Endpoints.
This runbook is about diagnosing that control-plane pressure before you chase the wrong layer.
Scenario
You are migrating, evacuating, or placing a host into maintenance mode in a vSphere or VCF environment where virtual machines reside on vVol datastores. A group of migrations begins successfully, but then some tasks fail with a generic error such as “A system error occurred.”
The common operational pattern looks like this:
- Multiple vMotion operations start from the same source host.
- The affected VMs are stored on vVol datastores.
- The failure is inconsistent: some migrations complete, others fail.
- Retrying everything immediately may make the problem worse.
- vvold.log shows VASA operations waiting, retrying, or failing because the provider has no free connections.
KB 318662 explicitly calls out migration failures when many vMotion operations are initiated from a single host and notes that vvold.log is located at /var/run/log/vvold.log.
This is not necessarily a “bad datastore” issue.
It may be a sign that the vVol control plane is under pressure.
Why This Matters Operationally
vVols change the storage management model. Instead of managing VM storage primarily through LUNs or NFS exports, vVols allow the VM and its disks to become the unit of storage management. vSphere uses Storage Policy-Based Management and VASA integration to align VM storage requirements with array capabilities.
That gives you powerful VM-granular storage behavior, but it also creates a dependency chain:
- vCenter and ESXi must communicate with the VASA provider.
- The VASA provider must translate vSphere requests into array-side operations.
- The array must respond quickly enough to control-plane operations such as create, bind, unbind, snapshot, clone, policy, or metadata actions.
- Protocol Endpoints must remain accessible for the data path.
vVols can support vMotion, Storage vMotion, snapshots, linked clones, and DRS, but those capabilities depend on a healthy vVol architecture, not just raw array performance.
The important mental model is this:
A VM can fail to migrate because the control plane cannot keep up, even when the underlying storage fabric appears healthy.
Symptoms and Risk Signals
The visible symptom in vCenter may not be very helpful. KB 318662 describes the user-facing task failure as a generic “A system error occurred.” The more useful evidence is usually on the ESXi host in vvold.log, where the VASA operation history can show long swap vVol creation or provider connection exhaustion.
Look for these signals:
| Signal | Where to Look | What It Usually Means |
|---|---|---|
| Generic migration task failure | vCenter Tasks / Events | vSphere knows the workflow failed, but not enough to identify the layer from the UI alone |
| Long createVirtualVolume time | /var/run/log/vvold.log | vVol control-plane operation is slow |
| No free connections to VP | /var/run/log/vvold.log | VASA provider connection pool is exhausted |
| PROVIDER_BUSY | /var/run/log/vvold.log | Provider is reporting or behaving as busy under the operation load |
| bindVirtualVolume retry or timeout | /var/run/log/vvold.log | ESXi is waiting on VASA bind activity |
| VASA provider Offline or syncError | vCenter Storage Providers or ESXi CLI | Provider registration, trust, certificate, or availability issue |
| Protocol Endpoint inaccessible | Host Protocol Endpoints or ESXi CLI | Data path mapping/presentation problem, not merely provider saturation |
| VASA certificate alert | VCF Operations | Certificate lifecycle risk that can break vCenter-to-provider communication |
Do not assume all of these point to the same root cause. Provider saturation, provider registration failure, certificate trust problems, and Protocol Endpoint presentation issues can all surface as vVol migration or accessibility problems.
vVol Control Plane at a Glance
The diagram below shows the dependency that matters during vVol migrations. The migration task is initiated in vCenter, executed by ESXi, and dependent on both VASA control-plane calls and Protocol Endpoint data-path access. The key point is that the VASA provider can become the limiting component even when the storage network and array I/O path are still functional.

What to notice: VASA is not the bulk data path, but it is required for important vVol lifecycle and binding operations. If those control-plane calls queue, time out, or exhaust provider connections during a migration storm, the migration can fail before the issue looks like traditional datastore latency.
Prerequisites and Safety Checks
Before you start changing anything, establish the operating boundary.
This runbook assumes a vSphere 7.x or 8.x environment, or a VCF 5.x-style operating model, where VMs are actively using vVol datastores backed by a storage partner’s VASA provider. KB 318662 lists ESXi 6.x, 7.x, and 8.x in scope for the issue.
Also note the version-sensitive roadmap context: Broadcom states that vVols are deprecated beginning with VCF 9.0 and VMware vSphere Foundation 9.0, and that vVols are fully discontinued in VCF/VVF 9.3.0. Broadcom also states that support for vVols, limited to critical bug fixes, continues for vSphere 8.x, VCF/VVF 5.x, and other older supported versions until those releases reach end of support.
Before remediation:
- Capture affected VM names, source host, destination host, datastore, task start time, and task error.
- Identify whether failures correlate to one source host, one vVol datastore, one VASA provider, or one storage array.
- Pause nonessential migration waves, especially automated maintenance-mode evacuation.
- Avoid unregistering or re-registering the VASA provider until you know whether the issue is saturation, certificate trust, registration, or Protocol Endpoint access.
- Confirm whether powered-on workloads are still serving I/O. Some vVol management-plane failures can affect provisioning or power-on operations while already powered-on VMs continue to run.
Broadcom documents cases where vVol datastores can appear inaccessible, powered-off VMs may fail to power on or become invalid, and already powered-on VMs may remain functional after VASA/provider-related management metadata issues.
Runbook Stage 1: Confirm the Failure Pattern
Start with the migration pattern, not the storage array.
Ask:
- Did this start when a host entered maintenance mode?
- Were many VMs migrated from the same source host at the same time?
- Are the failed VMs on vVol-backed storage?
- Are failures intermittent rather than universal?
- Do non-vVol migrations from the same host behave differently?
- Did the failure appear after provider upgrades, vCenter changes, certificate changes, or array maintenance?
If the failure started during a host drain, reduce the migration wave before doing deeper remediation. The immediate goal is to stop adding pressure while you collect evidence.
A useful first action is to retry one affected VM after a cooldown period, not the entire failed batch. If the single retry succeeds after the provider backlog clears, that strengthens the saturation theory. If the single retry still fails and provider status is offline or syncError, move toward registration, trust, or connectivity diagnosis.
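The correlation questions in this stage can be answered with a simple tally of failed tasks. The sketch below is a minimal illustration, not a vCenter integration: the record fields (`vm`, `source_host`, `datastore`), the sample values, and the 75% threshold are all assumptions you would replace with your own task export and judgment.

```python
from collections import Counter

# Hypothetical failed-task records. The field names are illustrative and
# would come from your own export of vCenter tasks and events.
failed_tasks = [
    {"vm": "app01", "source_host": "esx01", "datastore": "vvol-ds1"},
    {"vm": "app02", "source_host": "esx01", "datastore": "vvol-ds2"},
    {"vm": "db01", "source_host": "esx01", "datastore": "vvol-ds2"},
    {"vm": "web01", "source_host": "esx02", "datastore": "vvol-ds1"},
]


def correlate(tasks, keys=("source_host", "datastore")):
    """Count failures per value for each dimension of interest."""
    return {key: Counter(task[key] for task in tasks) for key in keys}


def dominant(counter, threshold=0.75):
    """Return the value behind at least `threshold` of failures, else None."""
    if not counter:
        return None
    value, count = counter.most_common(1)[0]
    return value if count / sum(counter.values()) >= threshold else None


summary = correlate(failed_tasks)
for key, counter in summary.items():
    print(key, dict(counter), "dominant:", dominant(counter))
```

If one source host dominates while datastores are evenly split, that is consistent with the single-host migration storm pattern rather than a datastore-specific fault.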
Runbook Stage 2: Inspect vvold.log for VASA Pressure
On the affected ESXi host, review vvold.log.
# Run on the affected ESXi host
grep -Ei "No free connections|PROVIDER_BUSY|bindVirtualVolume|createVirtualVolume|TimedOut|GetFreeConn" /var/run/log/vvold.log | tail -100
The strongest KB 318662 indicators are:
- Long elapsed time for createVirtualVolume
- No free connections to VP
- PROVIDER_BUSY
- bindVirtualVolume transient failure
- VASA operations waiting for a free connection
- Outstanding operations reaching the provider connection limit shown in the log
KB 318662 includes examples where VASA provider connections are exhausted, VasaSession::GetFreeConn reports no free connections, and bindVirtualVolume returns transient provider-busy behavior.
Interpretation:
| vvold.log Finding | Likely Meaning | Next Move |
|---|---|---|
| No free connections to VP | Provider connection pool is saturated | Stop batch migrations and allow operations to drain |
| PROVIDER_BUSY | Provider is unable to accept/complete current request load | Reduce concurrency and check provider appliance/array health |
| Long createVirtualVolume time | Control-plane operation latency is high | Check backend array management plane and VASA appliance performance |
| bindVirtualVolume retries | ESXi cannot complete binding quickly | Check provider pressure, provider version, and Protocol Endpoint state |
| Transport or connection refused errors | Provider service, firewall, or availability issue | Move to provider registration/connectivity checks |
Do not treat this as proof of array data-path latency by itself. vvold.log is showing control-plane behavior.
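When a copy of vvold.log has been pulled to a workstation, counting how often each signal appears gives a quicker picture than reading raw grep output. The following is a minimal sketch under stated assumptions: it reuses the substrings from the grep pattern in this stage, and the sample lines are synthetic placeholders, not real vvold.log entries.

```python
import re
from collections import Counter

# Substrings taken from the grep pattern used in this stage.
SIGNALS = [
    "No free connections",
    "PROVIDER_BUSY",
    "bindVirtualVolume",
    "createVirtualVolume",
    "TimedOut",
    "GetFreeConn",
]
PATTERN = re.compile("|".join(re.escape(s) for s in SIGNALS), re.IGNORECASE)


def count_signals(lines):
    """Count case-insensitive occurrences of each signal across log lines."""
    canonical = {s.lower(): s for s in SIGNALS}
    counts = Counter()
    for line in lines:
        for match in PATTERN.findall(line):
            counts[canonical[match.lower()]] += 1
    return counts


# Synthetic lines for illustration only; real vvold.log entries look different.
sample = [
    "example: VasaSession::GetFreeConn No free connections to VP",
    "example: bindVirtualVolume failed with PROVIDER_BUSY",
    "example: createVirtualVolume completed",
    "example: unrelated line",
]

counts = count_signals(sample)
for signal, count in counts.most_common():
    print(f"{count:4d}  {signal}")
```

To run it against a copied log, pass an open file instead of the sample list, for example `count_signals(open("vvold.log", errors="replace"))`. A high count of createVirtualVolume or bindVirtualVolume alone only shows activity; the connection-exhaustion and busy signals are what indicate pressure.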
Runbook Stage 3: Check VASA Provider Registration and Sync State
A saturated VASA provider and a broken VASA provider are different problems.
From an affected ESXi host:
esxcli storage vvol vasaprovider list
A healthy provider should be online. Broadcom’s vVol troubleshooting guidance shows esxcli storage vvol vasaprovider list as a validation step and notes that a properly functioning VASA provider should have an online status; a syncError state indicates the VASA provider is not functioning correctly.
From vCenter:
vCenter Server > Configure > Storage Providers
Check:
- Provider status
- Sync status
- Certificate warnings
- Duplicate provider entries
- Provider URL or FQDN mismatch
- Recent provider re-registration or vCenter rebuild
- Whether both controller/provider endpoints are registered if your storage platform exposes multiple VASA providers
For PowerCLI-based inventory, use Get-VasaProvider -Refresh as a quick provider discovery step. Broadcom’s PowerCLI reference states that Get-VasaProvider retrieves the VASA providers currently registered with Storage Manager and supports a -Refresh parameter to synchronize providers before retrieving data.
# Requires an authenticated PowerCLI session to vCenter
Connect-VIServer vcsa01.example.local

# Provider object properties can vary by version/provider.
# Start broad, then narrow the fields you want to report.
Get-VasaProvider -Refresh | Format-List *
For a cleaner operational report after you know the available fields in your environment:
Get-VasaProvider -Refresh |
Select-Object Name, Id, Url |
Format-Table -AutoSize
This does not replace ESXi-side vvold.log analysis. It gives you a vCenter-side inventory view of registered providers.
Runbook Stage 4: Separate Provider Pressure from Protocol Endpoint Problems
This is where vVol troubleshooting often goes sideways.
The VASA provider handles awareness, policy, and management/control operations. Protocol Endpoints provide the access point ESXi uses for the data path to vVols. Broadcom’s vVol overview describes Protocol Endpoints as logical I/O proxies that ESXi uses to communicate with vVols and establish the data path on demand.
Check Protocol Endpoints from the host:
esxcli storage vvol protocolendpoint list
Broadcom documents this command as an issue validation step when troubleshooting inaccessible vVol datastores.
Decision point:
| Finding | More Likely Class | Operational Interpretation |
|---|---|---|
| VASA provider online, PE accessible, No free connections in vvold.log | Provider saturation | Reduce migration concurrency and inspect provider/array management plane |
| VASA provider offline or syncError | Provider registration, trust, certificate, or service issue | Fix provider registration/trust before retrying migrations |
| VASA provider online, PE inaccessible or not configured | Storage presentation / host mapping issue | Work with storage team to map PE correctly and rescan |
| VASA certificate expired or near expiry | Certificate lifecycle issue | Renew, refresh, or reauthenticate according to ownership model |
| Provider and PE healthy, no VASA pressure | Look elsewhere | Check vMotion network, host load, DRS, array data path, or VM-specific constraints |
Broadcom documents a case where a newly added vVol datastore shows inaccessible while the VASA provider is online because the vVol datastore is not connected on the ESXi host and Protocol Endpoint LUN presentation is missing from the backend array.
That is a different issue from KB 318662 provider connection exhaustion.
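The decision table in this stage can be written down as a small triage function, which is useful when the same questions are asked during every incident. This is a sketch only: the boolean inputs are filled in by hand from esxcli output, vvold.log, and certificate alerts, and the order in which the checks are evaluated is an assumption of this example rather than Broadcom guidance.

```python
def classify(provider_online, pe_accessible, vasa_pressure, cert_problem=False):
    """Map observed findings to the problem classes in the decision table.

    The precedence (provider state, then certificate, then Protocol
    Endpoint, then pressure) is an assumption made for this sketch.
    """
    if not provider_online:
        return "provider registration, trust, certificate, or service issue"
    if cert_problem:
        return "certificate lifecycle issue"
    if not pe_accessible:
        return "storage presentation / host mapping issue"
    if vasa_pressure:
        return "provider saturation"
    return "look elsewhere"


# Example: provider online, PE accessible, "No free connections" seen in vvold.log.
print(classify(provider_online=True, pe_accessible=True, vasa_pressure=True))
```

Checking provider state first reflects the point made earlier in the runbook: a broken provider and a saturated provider are different problems, and pressure signals are hard to interpret while the provider is offline.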
Runbook Stage 5: Reduce the Migration Blast Radius
If the evidence points to VASA provider pressure, the immediate remediation is not to keep retrying the same large batch.
Do this instead:
- Stop or pause noncritical migration waves.
- Exit or pause maintenance-mode evacuation if it is driving too many simultaneous migrations.
- Migrate a small number of VMs at a time from the affected host.
- Avoid stacking other VASA-heavy operations during the same window, such as clone storms, snapshot-heavy workflows, large policy changes, or mass provisioning.
- Watch vvold.log while running a controlled test batch.
- Increase batch size slowly only after the provider remains stable.
The point is not to find a universal concurrency number. VASA provider behavior is storage-vendor and version dependent. Treat the safe batch size as an environment-specific operational limit.
KB 318662’s resolution is intentionally vendor-aware: Broadcom recommends asking the storage partner whether a newer VASA provider version would help and checking load/performance on the backend storage array.
That is the right escalation path. The VASA provider is usually delivered by the storage vendor, and provider-side limits, queues, appliance sizing, failover behavior, and software defects vary by platform.
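The controlled-batch approach in this stage can be sketched as a loop that grows the batch only after a clean run and stops at the first sign of pressure. Everything here is illustrative: `run_batch` is a hypothetical caller-supplied function that performs the migrations and reports whether vvold.log stayed clean, and the `start`, `step`, and `max_size` values are placeholders, not recommended limits.

```python
def drain_in_batches(vms, run_batch, start=1, step=1, max_size=4):
    """Migrate VMs in small batches, growing only after a clean batch.

    run_batch(batch) must return True when the batch finished without
    VASA pressure signals. Returns (completed, remaining, last_clean_size).
    """
    pending = list(vms)
    completed = []
    size = start
    last_clean = 0
    while pending:
        batch, pending = pending[:size], pending[size:]
        if not run_batch(batch):
            # Stop rather than retry: hand the batch back for investigation.
            return completed, batch + pending, last_clean
        completed.extend(batch)
        last_clean = max(last_clean, len(batch))
        size = min(size + step, max_size)
    return completed, pending, last_clean


# Simulated provider that shows pressure above two concurrent migrations.
def fake_run_batch(batch):
    return len(batch) <= 2


vms = [f"vm{i}" for i in range(1, 9)]
done, left, clean_size = drain_in_batches(vms, fake_run_batch)
print("completed:", done)
print("remaining:", left)
print("largest clean batch:", clean_size)
```

The sketch treats a failed batch as entirely unmigrated, which is a simplification; in practice some VMs in that batch may have moved, so reconcile against vCenter before resuming. The largest clean batch size it reports is the kind of environment-specific limit worth recording.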
Operational Monitoring Signals to Add
For vVol environments, monitor the control plane like it is part of the production storage path.
| Signal | Source | Why It Matters | Suggested Response |
|---|---|---|---|
| Failed migration task count | vCenter Tasks / Events | Shows user-visible impact | Correlate failures to host, datastore, provider, and time window |
| Migration concurrency per source host | vCenter / automation logs | Identifies maintenance-mode or script-driven migration storms | Batch migrations and avoid uncontrolled drains |
| PROVIDER_BUSY | ESXi vvold.log | Indicates provider pressure | Pause migrations and inspect provider/array health |
| No free connections to VP | ESXi vvold.log | Strong signal of VASA connection exhaustion | Reduce concurrency and escalate with log evidence |
| VASA provider Offline or syncError | vCenter Storage Providers / ESXi CLI | Registration, service, trust, or communication issue | Validate provider registration, certificate, and service health |
| Protocol Endpoint accessible/configured | vSphere Client / ESXi CLI | Confirms data-path presentation | Involve storage team if PE is missing or inaccessible |
| VASA certificate expiry | VCF Operations | Certificate expiry can break vCenter-to-provider communication | Renew/refresh/reauthenticate according to certificate ownership |
| Provider appliance CPU/memory/thread pool | Storage vendor tooling | Determines whether provider appliance is undersized or overloaded | Follow vendor sizing and upgrade guidance |
| Array management-plane latency | Storage vendor tooling | VASA may depend on array management APIs, not only data I/O | Check management-plane load during migrations |
VCF Operations can raise an alert when a vVol VASA provider certificate registered to vCenter is near expiration or expired. Broadcom states that if the certificate expires, communication between vCenter and the VASA Provider will fail, disrupting storage functionality and making vVol datastores unusable for provisioning operations.
That alert belongs on the same operational dashboard as migration failures and provider status.
What Not to Do First
When migrations are failing, pressure is high and the tempting fixes are often too broad.
Avoid these as first moves:
- Do not mass-retry every failed migration immediately.
- Do not unregister and re-register the VASA provider unless evidence points to registration or trust failure.
- Do not reboot ESXi hosts just because vVol operations are slow.
- Do not assume the array data path is the cause without checking VASA pressure.
- Do not enter another large maintenance-mode evacuation wave until the first failure pattern is understood.
- Do not ignore certificate warnings because “the datastore is still online.”
Provider re-registration may be appropriate for certain trust, FQDN, or certificate failures, but it is not the default fix for provider saturation. Broadcom documents cases where certificate or hostname trust issues require re-registration or re-authentication, but those cases have different evidence, such as provider offline status, hostname verification failures, or sync errors.
Validation After Remediation
After reducing concurrency, updating a provider, clearing provider backlog, or fixing provider registration, validate in layers.
First, validate provider health:
esxcli storage vvol vasaprovider list
Confirm the provider is online and no longer showing syncError.
Second, validate Protocol Endpoints:
esxcli storage vvol protocolendpoint list
Confirm the relevant Protocol Endpoints are accessible and configured.
Third, validate logs:
grep -Ei "No free connections|PROVIDER_BUSY|TimedOut|syncError" /var/run/log/vvold.log | tail -100
You want to see that new test migrations are no longer producing fresh provider-busy or connection-exhaustion entries.
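Because the grep above also returns entries from the original failure, it helps to separate old entries from fresh ones. One way, sketched here under assumptions, is to record the log's line count before the test migration and inspect only lines appended afterwards. The sample lines are synthetic, and the rotation handling is a simple heuristic chosen for this example.

```python
SIGNALS = ["No free connections", "PROVIDER_BUSY", "TimedOut", "syncError"]


def fresh_signal_lines(lines, baseline_count, signals=SIGNALS):
    """Return signal-bearing lines that were appended after the baseline."""
    # If the log is shorter than the baseline it was probably rotated,
    # so treat every line as new rather than silently skipping entries.
    start = baseline_count if len(lines) >= baseline_count else 0
    lowered = [s.lower() for s in signals]
    return [
        line.rstrip("\n")
        for line in lines[start:]
        if any(s in line.lower() for s in lowered)
    ]


# Synthetic log content for illustration only.
before = ["example: PROVIDER_BUSY during earlier storm\n"]
after_clean = before + ["example: test migration completed\n"]
after_dirty = before + ["example: No free connections to VP\n"]

baseline = len(before)
print(fresh_signal_lines(after_clean, baseline))
print(fresh_signal_lines(after_dirty, baseline))
```

An empty result after a test migration means no new pressure entries were written during that window; it does not prove the provider would stay healthy under a larger batch.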
Fourth, validate the workflow:
- Migrate one low-risk VM.
- Migrate a small batch.
- Monitor vvold.log during the batch.
- Confirm no new generic migration failures appear.
- Increase batch size gradually only if the provider remains stable.
Finally, document the operational limit you observed. If five concurrent migrations caused VASA pressure but two ran cleanly, capture that. Your future maintenance-mode process should reflect the tested limit until the VASA provider version, storage firmware, or architecture changes.
Rollback and Fallback Guidance
If the issue returns during validation, stop the batch and preserve evidence.
Recommended fallback actions:
- Keep unaffected powered-on VMs running where possible.
- Cancel or pause nonessential migration tasks.
- Remove the host from the maintenance workflow until a controlled drain plan is ready.
- Keep VMs on their current vVol datastore if the failure is migration-specific and production I/O is healthy.
- Escalate to the storage vendor with provider logs, ESXi vvold.log, vCenter task IDs, timestamps, and array/provider performance data.
- If provider registration or certificate trust is the issue, follow the vendor and Broadcom-supported reauthentication or re-registration process.
- If Protocol Endpoints are inaccessible, involve the storage team to validate host group mapping, array presentation, zoning, masking, and rescan requirements.
For escalation, include:
- vCenter task IDs and timestamps
- Source and destination host names
- VM names and datastores
- vvold.log excerpts around the failure
- esxcli storage vvol vasaprovider list output
- esxcli storage vvol protocolendpoint list output
- VASA provider version
- Storage array firmware/software version
- VASA provider appliance CPU, memory, service, and queue metrics if available
- Backend array performance at the same timestamps
That package shortens the support conversation because it separates control-plane saturation from registration, certificate, Protocol Endpoint, and traditional storage data-path problems.
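A short completeness check before opening the case can keep the first support exchange from being a request for missing data. The sketch below mirrors the escalation list in this section; the dictionary keys and the sample bundle are hypothetical labels invented for this example.

```python
# Evidence items taken from the escalation list in this section.
# The keys are hypothetical labels chosen for this sketch.
REQUIRED_EVIDENCE = {
    "task_ids": "vCenter task IDs and timestamps",
    "hosts": "Source and destination host names",
    "vms_datastores": "VM names and datastores",
    "vvold_excerpts": "vvold.log excerpts around the failure",
    "vasaprovider_list": "esxcli storage vvol vasaprovider list output",
    "protocolendpoint_list": "esxcli storage vvol protocolendpoint list output",
    "provider_version": "VASA provider version",
    "array_version": "Storage array firmware/software version",
}


def missing_evidence(bundle):
    """Return descriptions of required items that are absent or empty."""
    return [
        description
        for key, description in REQUIRED_EVIDENCE.items()
        if not bundle.get(key)
    ]


# Example bundle with placeholder values and two gaps.
bundle = {
    "task_ids": ["task-123 (placeholder)"],
    "hosts": ["esx01", "esx02"],
    "vms_datastores": ["app01 on vvol-ds1"],
    "vvold_excerpts": "vvold-excerpt.txt",
    "vasaprovider_list": "vasaprovider.txt",
    "protocolendpoint_list": "",
}

for item in missing_evidence(bundle):
    print("missing:", item)
```

Provider appliance metrics and backend array performance data are left out of the required set here because the runbook lists them as "if available"; add them if your vendor tooling exposes them.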
Architecture Caveats
There are three caveats worth making explicit.
First, not every vVol migration failure is KB 318662. The KB points to a specific pattern involving many vMotion operations, long swap vVol creation, and VASA provider connection exhaustion. Other failures can come from certificates, provider registration, Protocol Endpoint presentation, vCenter metadata loss, host connectivity, or array-side issues.
Second, do not generalize the provider connection count from one environment to all environments. KB examples show provider connections maxing out, but the practical limit depends on the storage vendor’s VASA implementation, provider version, array behavior, appliance sizing, and current load.
Third, VCF 9 planning changes the strategic conversation. If you are operating vVols on supported vSphere 8.x or VCF/VVF 5.x releases, this runbook is still useful. If you are planning VCF/VVF 9.x, treat vVols as a migration-away item, not a long-term design target, because Broadcom has announced deprecation beginning in VCF/VVF 9.0 and full discontinuation in VCF/VVF 9.3.0.
Conclusion
vVol migration failures are not always storage performance failures.
When many migrations are initiated from the same host, especially during maintenance-mode evacuation, the VASA provider can become the pressure point. The visible error may be generic, but the useful evidence is usually in vvold.log: long vVol creation times, provider-busy responses, bind retries, and no free VASA provider connections.
The operational response is straightforward:
- Stop increasing the migration storm.
- Confirm whether the issue is VASA pressure, provider registration, certificate trust, or Protocol Endpoint access.
- Validate with ESXi logs and provider status, not just vCenter task messages.
- Batch migrations conservatively.
- Work with the storage vendor on VASA provider version, sizing, and backend management-plane performance.
The deeper lesson is architectural: with vVols, the control plane is part of the storage service. If you do not monitor it, you will only see it when migrations fail.
External Sources
- Broadcom KB 318662 — Possible migration failures of virtual machines stored on vVols due to overloaded VASA providers: https://knowledge.broadcom.com/external/article/318662/possible-migration-failures-of-virtual-m.html
- Broadcom KB 323121 — Understanding Virtual Volumes vVols in VMware vSphere: https://knowledge.broadcom.com/external/article/323121/understanding-virtual-volumes-vvols-in-v.html
- Broadcom KB 401070 — Deprecation of VMware vSphere Virtual Volumes in VCF and VVF: https://knowledge.broadcom.com/external/article/401070/deprecation-of-vmware-vsphere-virtual-vo.html
- Broadcom KB 439686 — Alert: The certificate of the vVol VASA Provider registered to the vCenter Server is about to expire: https://knowledge.broadcom.com/external/article/439686/alert-the-certificate-of-the-vvol-vasa-p.html
- Broadcom KB 389601 — vVol datastore inaccessible after moving vCenter Server: https://knowledge.broadcom.com/external/article/389601/vvol-datastore-inaccessible-after-moving.html
- Broadcom KB 409865 — Newly added datastore shows inaccessible in vCenter: https://knowledge.broadcom.com/external/article/409865/newly-added-datastore-shows-inaccessible.html
- Broadcom KB 372508 — vVol datastore is inaccessible error message related to VASA provider trust or hostname verification: https://knowledge.broadcom.com/external/article/372508/vvol-datastore-is-inaccessible-error-me.html
- Broadcom PowerCLI Reference — Get-VasaProvider: https://developer.broadcom.com/powercli/latest/vmware.vimautomation.storage/commands/get-vasaprovider