It feels like I cannot go a week without being asked, “What is my real capacity on vSAN/VxRail?” That is usually followed by, “We just cannot get our capacity planning right, and we are trying to determine how much capacity to add to our cluster.” I thought I would write a back-to-basics vSAN math article to help everyone struggling with capacity questions. This article does not cover swap files; that topic is worthy of its own article.
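In a previous article I wrote about protection methods and their cost on storage. As a reminder, here is the capacity each one consumes relative to the size of the data being protected: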
Raid1 = 200%
Raid5 = 133%
Raid6 = 150%
Slack space is 30% of raw capacity
Let us now take the example of a 100GB VM where we want to determine what the total capacity will be after data protection.
100×2=200GB after Raid1
100×1.33=133GB after Raid5
100×1.5=150GB after Raid6
If you know you will be adding 50 more VMs of this size, and half of them require Raid1 and the other half require Raid5, then the math is easy: 25×200=5,000GB for Raid1, plus 25×133=3,325GB for Raid5, for a total of 5,000+3,325=8,325GB.
Your 50 VMs will consume about 8.33TB of additional capacity.
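If you would rather script the growth math than work it by hand, here is a minimal Python sketch. The names `OVERHEAD`, `protected_gb`, and `growth_gb` are made up for illustration, and the multipliers are simply the percentages listed above (Raid5 rounded to 1.33, as in the text).

```python
# Capacity multipliers from this article (Raid5 rounded to 1.33, as in the text).
OVERHEAD = {"raid1": 2.0, "raid5": 1.33, "raid6": 1.5}

def protected_gb(vm_size_gb, policy):
    """Capacity one VM consumes after data protection is applied."""
    return vm_size_gb * OVERHEAD[policy]

def growth_gb(vm_size_gb, vm_counts):
    """Total protected capacity for a mix of policies, e.g. {"raid1": 25}."""
    return sum(count * protected_gb(vm_size_gb, policy)
               for policy, count in vm_counts.items())

# 50 new 100GB VMs: 25 on Raid1 and 25 on Raid5.
total = growth_gb(100, {"raid1": 25, "raid5": 25})
print(f"{total:,.0f}GB ({total / 1000:.3f}TB)")
```

Change the VM size or the counts per policy to model your own growth plan.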
Now let us take an example where we have 10,000GB raw and want to determine the usable space for a net new cluster.
10,000×0.7=7,000GB (7TB) remaining after slack space
7,000/2=3.5TB if all Raid1
7,000/1.33=5.26TB if all Raid5
7,000/1.5=4.67TB if all Raid6
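The same raw-to-usable calculation can be sketched in a few lines of Python. This is only an illustration: `usable_gb` is a hypothetical helper, and the 30% slack and the multipliers are the figures used in this article, not values read from a cluster.

```python
# Multipliers and slack percentage are this article's figures.
OVERHEAD = {"raid1": 2.0, "raid5": 1.33, "raid6": 1.5}
SLACK = 0.30

def usable_gb(raw_gb, policy, slack=SLACK):
    """Usable GB: reserve slack space first, then divide by the protection cost."""
    after_slack = raw_gb * (1 - slack)
    return after_slack / OVERHEAD[policy]

# 10,000GB raw, evaluated under each policy.
for policy in OVERHEAD:
    print(f"{policy}: {usable_gb(10_000, policy) / 1000:.2f}TB")
```

Rounded to two decimals, this prints 3.50TB, 5.26TB, and 4.67TB for Raid1, Raid5, and Raid6.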
We want to honor slack space plus account for a host failure on a net new cluster.
10 hosts each with 1TB Raw capacity
10-1=9 hosts remaining
9×1,000=9,000GB (9TB) remaining raw
9,000×0.7=6,300GB remaining after slack space
6,300/2=3.15TB if using Raid1
6,300/1.33=4.74TB if using Raid5
6,300/1.5=4.2TB if using Raid6
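Here is a rough Python sketch of the host-failure version of the math, assuming every host contributes the same raw capacity. The function name and its parameters are hypothetical, chosen only to mirror the steps above: remove the failed host, reserve slack space, then apply the protection cost.

```python
# Multipliers are this article's figures (Raid5 rounded to 1.33).
OVERHEAD = {"raid1": 2.0, "raid5": 1.33, "raid6": 1.5}

def usable_with_host_failure_gb(hosts, raw_per_host_gb, policy,
                                host_failures=1, slack=0.30):
    """Usable GB after removing failed hosts, then slack, then protection."""
    surviving_raw = (hosts - host_failures) * raw_per_host_gb
    after_slack = surviving_raw * (1 - slack)
    return after_slack / OVERHEAD[policy]

# 10 hosts with 1,000GB raw each, planning for one host failure.
for policy in OVERHEAD:
    tb = usable_with_host_failure_gb(10, 1_000, policy) / 1000
    print(f"{policy}: {tb:.2f}TB")
```

Raising `host_failures` to 2 shows how quickly usable capacity drops on a small cluster.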
Dedupe + Compression
After posting this article I was flooded with “But what about DD&C?” questions. Here are my personal best practices.
**Dedupe & Compression have overhead and will impact performance
vSAN Dedupe/Compression works at the disk group level, not globally. As a result, do not expect 2:1, 4:1, or 30:1.
**As you add more disk groups, you will be spreading the workloads across more cache drives, lowering your ratio.
**There is a lot of documentation out there citing a 1.5:1 ratio
To be conservative, my personal best practice is to plan for a 1.2:1 ratio.
Take the earlier example of 10,000GB raw with a Raid1 policy
7,000/2=3.5TB usable with Raid1
3.5×1.2=4.2TB logical after DD&C savings
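To see how much the assumed ratio matters, here is a small Python sketch comparing my conservative 1.2:1 against the commonly cited 1.5:1. The helper `logical_gb` is hypothetical, and the ratios are planning assumptions, not measured savings.

```python
def logical_gb(usable_gb, ddc_ratio=1.2):
    """Logical capacity after an assumed dedupe and compression ratio."""
    return usable_gb * ddc_ratio

# 10,000GB raw, minus 30% slack space, with a Raid1 policy: 3,500GB usable.
usable = 10_000 * 0.7 / 2.0

for ratio in (1.2, 1.5):
    print(f"{ratio}:1 -> {logical_gb(usable, ratio) / 1000:.2f}TB logical")
```

The gap between 4.20TB and 5.25TB on the same hardware is the reason I plan with the lower number.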
The key thing to remember is that this is just simple percentage math. We always need to honor slack space, which means giving up 30% of our raw capacity at all times. From there, it is a matter of deciding which Raid level to apply and how much storage it will consume. I did not cover stretched clusters, since they add another level of calculations and I wanted to keep the math simple in this post.