I came into this session late, so apologies for the rather fragmented notes.
- When Normal Behavior isn’t good….
- Probabilistic vs. absolute health model
- VC Ops has a domain-specific health model for vSphere
- Use absolute health model until analytics’ probabilistic model is ready to generate DTs
- 2 Goals of capacity management
- Efficiency — optimization of capacity.
- Predictability — availability of capacity.
- How to think about capacity
- Look at usable capacity, not total capacity (account for HA failover, extra buffer, etc.).
- Think in terms of VMs…you deploy VMs — not CPU or memory.
- Deploy CPU/memory/disk/etc. in “VM-sized units”.
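
To make the "usable, not total" and "VM-sized units" ideas concrete, here is a rough sketch of the arithmetic. This is my own illustration, not something shown in the session: the `ClusterCapacity` fields, the single reserved HA host, and the buffer fraction are all assumed values, and only CPU and memory are considered.

```python
from dataclasses import dataclass


@dataclass
class ClusterCapacity:
    # All fields are hypothetical inputs for illustration only.
    hosts: int                      # number of hosts in the cluster
    cpu_ghz_per_host: float         # total CPU per host
    mem_gb_per_host: float          # total memory per host
    ha_failover_hosts: int = 1      # hosts held back for HA failover (assumed policy)
    buffer_fraction: float = 0.25   # extra headroom held back (assumed value)


def usable(total_per_host: float, c: ClusterCapacity) -> float:
    """Capacity left after removing HA failover hosts and the buffer."""
    surviving_hosts = max(c.hosts - c.ha_failover_hosts, 0)
    return surviving_hosts * total_per_host * (1.0 - c.buffer_fraction)


def vm_slots(c: ClusterCapacity, vm_cpu_ghz: float, vm_mem_gb: float) -> int:
    """How many 'VM-sized units' fit, limited by the scarcest resource."""
    by_cpu = usable(c.cpu_ghz_per_host, c) // vm_cpu_ghz
    by_mem = usable(c.mem_gb_per_host, c) // vm_mem_gb
    return int(min(by_cpu, by_mem))


cluster = ClusterCapacity(hosts=4, cpu_ghz_per_host=20.0, mem_gb_per_host=128.0)
print(vm_slots(cluster, vm_cpu_ghz=1.0, vm_mem_gb=4.0))
```

The point of taking the minimum across resources is that the answer comes out as "how many more VMs can I deploy", which is the question you actually ask, rather than "how many GHz are free".
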
- Understanding behavior
- Understand weekly patterns — business week, weekend, workload spike at 9 AM on Mondays.
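
One simple way to see a weekly pattern like the Monday 9 AM spike is to average usage per (weekday, hour) slot across several weeks. The sketch below is my own illustration, assuming samples arrive as (timestamp, usage percent) pairs; it is not how VC Ops does its analytics.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekly_profile(samples):
    """Average usage per (weekday, hour) slot.

    samples: iterable of (datetime, usage_percent) pairs (assumed input shape).
    Returns {(weekday, hour): mean usage}, where Monday is weekday 0.
    """
    buckets = defaultdict(list)
    for ts, usage in samples:
        buckets[(ts.weekday(), ts.hour)].append(usage)
    return {slot: mean(values) for slot, values in buckets.items()}


samples = [
    (datetime(2024, 1, 1, 9, 0), 80.0),   # a Monday, 9 AM
    (datetime(2024, 1, 8, 9, 30), 90.0),  # the next Monday, 9 AM hour
    (datetime(2024, 1, 6, 9, 0), 10.0),   # a Saturday, 9 AM
]
profile = weekly_profile(samples)
print(profile[(0, 9)], profile[(5, 9)])
```
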
- Optimizing capacity
- Multiple levels — VMs, hosts, clusters
- Powered off and idle VMs
- Right-sizing — yes, please
- How do we determine an idle VM?
- Simple…when VM usage is less than a configured threshold.
- But…when is it idle overall? Time-based: idle more than xx% of the time over xx time period.
- Two thresholds — VM idle or not, VM idle how much over last week/month/etc.
- Seriously down-in-the-weeds stuff (in a good way)…walking through when VMs are idle and/or overutilized vs. underutilized.
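
The two-threshold idea for idle VMs can be sketched roughly as follows. This is my own minimal reading of it, not the product's implementation: the function names and the example threshold values are assumptions, and the samples are simply a list of usage percentages collected over whatever period you care about.

```python
def idle_fraction(samples, usage_threshold_pct):
    """Fraction of samples where usage is below the first threshold."""
    samples = list(samples)
    if not samples:
        return 0.0  # no data: treat as "not known to be idle"
    idle = sum(1 for usage in samples if usage < usage_threshold_pct)
    return idle / len(samples)


def is_idle_vm(samples, usage_threshold_pct, idle_time_threshold):
    """Second threshold: idle overall if idle for enough of the period.

    idle_time_threshold is a fraction between 0 and 1 (hypothetical parameter).
    """
    return idle_fraction(samples, usage_threshold_pct) >= idle_time_threshold


week_of_cpu = [1.0, 2.0, 3.0, 50.0]  # made-up usage percentages
print(idle_fraction(week_of_cpu, usage_threshold_pct=5.0))
print(is_idle_vm(week_of_cpu, usage_threshold_pct=5.0, idle_time_threshold=0.7))
```

Keeping the two thresholds separate means a VM that is quiet most of the week but busy for one batch job is not written off just because of a low average.
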
- Summary
- Virtualization and cloud present new challenges.
- Fluid capacity, invisible walls, separation of provider and consumer.
- Must move beyond manual monitoring and troubleshooting.
- Most performance management solutions today require extensive manual effort.
- Doesn’t scale well with cloud environments.
- One more point….
- Virtualization and cloud present new challenges.
A lot of deep-down stuff that I didn’t blog, as it was difficult to turn into a good post….