Summary at the top = incredibly dense session on vSphere 5 performance….fantastic session.
Getting right into it…I love that in a session….
- Great performance maximums in vSphere 5 – 32 vCPUs, 1,000,000+ IOPs per host, etc.
- 92-97% of native performance up to 32 vCPUs
- CPU Scheduler Improvements — performance increases of as much as 30%
- vNUMA – allows NUMA-aware guest OSes to use underlying NUMA hardware more efficiently.
- vCenter — 75% increase in management ops/minute.
- HA cluster – configures 9x faster than 4.1 for a 32-host ESX cluster; 60% more VMs fail over in the same time.
- Network I/O Control (NIOC)
- Traffic management for Distributed vSwitch
- Enhancements: user-defined network resource pools, a new host-based replication traffic type, and QoS tagging.
- Example without NIOC showing how vMotion takes up bandwidth and impacts NFS, VM, & FT traffic.
- Example with NIOC showing vMotion having no impact — see the rough shares sketch after this list.
- Can add QoS tags (e.g. 802.1p) so priority carries end to end across the network environment.
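A quick aside from me, not from the session: a back-of-the-envelope sketch of how NIOC-style shares would carve up a saturated 10 GbE uplink. The share values below are made-up examples rather than VMware defaults, and shares only come into play when the link is actually congested.

```python
# Rough illustration of shares-proportional bandwidth allocation under congestion.
# Share values are invented for the example; they are not VMware defaults.
LINK_GBPS = 10.0

shares = {
    "vMotion": 50,
    "NFS": 100,
    "VM": 100,
    "FT": 50,
}

total = sum(shares.values())
allocation = {traffic: LINK_GBPS * s / total for traffic, s in shares.items()}

for traffic, gbps in allocation.items():
    # Each traffic type gets bandwidth in proportion to its shares.
    print(f"{traffic:8s} ~{gbps:.1f} Gbps when the uplink is saturated")
```

When the link isn't saturated, any traffic type can burst past its share (limits, if configured, still cap it).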
- SplitRXMode
- greatly reduces packet loss
- new way of doing network packet receive processing in the vmkernel
- splits the cost of receive packet processing across multiple contexts
- Before, with 24 VMs you could see up to 40% packet loss; with SplitRXMode it's now less than 10%.
- Enabled on a per-vNIC basis — see the config sketch below.
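For reference, a minimal sketch of flipping that advanced parameter programmatically with pyVmomi. The ethernetX.emuRxMode key is the per-vNIC setting VMware documents for SplitRx mode; the vCenter address, credentials, and VM name below are placeholders, and the VM generally needs a power cycle for the change to take effect, so treat this as illustrative rather than gospel.

```python
# Sketch: enable SplitRx mode on a VM's first vNIC (ethernet0) via pyVmomi.
# Hostname, credentials, and VM name are hypothetical placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator",
                  pwd="secret",
                  sslContext=ssl._create_unverified_context())  # lab use only
content = si.RetrieveContent()

# Look up the VM by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "web01")
view.Destroy()

# ethernet0.emuRxMode = "1" turns SplitRx mode on for the first virtual NIC.
spec = vim.vm.ConfigSpec(extraConfig=[
    vim.option.OptionValue(key="ethernet0.emuRxMode", value="1")
])
vm.ReconfigVM_Task(spec=spec)

Disconnect(si)
```

The same key can also be set by hand in the vSphere Client under the VM's advanced configuration parameters.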
- Multicast improvements for throughput and efficiency.
- TCP/IP Stack Improvements — higher throughput with small messages & better IOPs scaling for software iSCSI
- NetFlow — supported in vSphere 5.
- Performance – monitor application performance over time
- Capacity Planning
- Visibility into Virtual Infrastructure Traffic
- vMotion — 25-30% improvement over 4.1, another 50% better using 2 NICs (multi-NIC vMotion)
- Storage vMotion — Live Migration & I/O Mirroring
- Live Migration — uses I/O mirroring now, not dirty block tracking anymore.
- Zero downtime maintenance.
- Manual and automatic storage load balancing.
- Live storage migration
- Memory Management
- Last year, 4.1 added wide-NUMA support and memory compression.
- This year, 5.0 adds:
- vNUMA – lets the guest OS know about the underlying NUMA topology so it can use the fastest memory (local to the processor the VM is running on).
- On by default for larger VMs — (8) vCPUs or more.
- Not needed for smaller VMs as ESX scheduler already keeps smaller VMs in the same NUMA domain.
- Host Cache – use SSD storage as a cache location for VM swapped memory pages when under memory pressure.
- New hierarchy in VMware’s memory overcommit technology (a rough sketch of the order follows this list)
- Transparent Page Sharing
- Ballooning
- Memory Compression
- Host Cache (using SSDs)
- Roughly 30% improvement in performance when used with other vSphere memory technologies.
- VM’s vswap file (as a last resort)
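To make that ordering concrete, here's a small conceptual sketch of my own (not VMware's implementation) of which techniques a host leans on as free memory shrinks; the percentage thresholds are illustrative, not ESXi's exact state boundaries.

```python
# Conceptual only: the reclamation order the session describes, applied in
# stages as host free memory drops. Thresholds are illustrative.
RECLAIM_ORDER = [
    "transparent page sharing",  # always on; collapses identical pages
    "ballooning",                # asks the guest OS to hand memory back
    "memory compression",        # compresses pages rather than swapping them
    "host cache (SSD)",          # swaps pages to SSD-backed host cache
    "vswap file",                # last resort: swap to the VM's vswap on the datastore
]

def reclaim_techniques(free_pct: float) -> list[str]:
    """Which techniques (in order) kick in at a given level of free memory."""
    if free_pct > 6:
        return RECLAIM_ORDER[:1]   # plenty free: sharing only
    if free_pct > 4:
        return RECLAIM_ORDER[:2]   # add ballooning
    if free_pct > 2:
        return RECLAIM_ORDER[:4]   # add compression and host cache
    return RECLAIM_ORDER           # everything, including vswap

print(reclaim_techniques(free_pct=3.0))
```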
- Storage Improvements
- NFS support for Storage I/O Control
- Provides…..
- Increased Storage performance
- Limits performance fluctuations during periods of I/O congestion (see the throttling sketch after this list)
- Increases throughput & decreases storage latency.
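My own mental model of what Storage I/O Control is doing here, sketched out (this is not VMware's code): when observed datastore latency crosses a congestion threshold, each VM's device queue depth gets throttled in proportion to its disk shares. The 30 ms threshold, queue size, and share values are illustrative examples.

```python
# Conceptual sketch of SIOC-style throttling; all numbers are illustrative.
CONGESTION_THRESHOLD_MS = 30
TOTAL_QUEUE_SLOTS = 64                      # example aggregate queue depth

vm_shares = {"sql01": 2000, "web01": 1000, "test01": 500}   # hypothetical VMs

def queue_depths(observed_latency_ms: float) -> dict[str, int]:
    """Uncongested: everyone gets the full queue. Congested: slots are split
    proportionally to shares so one noisy VM can't starve the rest."""
    if observed_latency_ms <= CONGESTION_THRESHOLD_MS:
        return {vm: TOTAL_QUEUE_SLOTS for vm in vm_shares}
    total = sum(vm_shares.values())
    return {vm: max(1, TOTAL_QUEUE_SLOTS * s // total) for vm, s in vm_shares.items()}

print(queue_depths(observed_latency_ms=45))   # congested -> proportional split
```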
- Datastore clusters
- Groups multiple datastores together so they can be treated as a single unit.
- Can provision VMs into the datastore cluster.
- Storage DRS
- Requires datastore clusters; load balances across the datastores within a cluster based on capacity and I/O.
- What it does….
- Initial placement of VMs and VMDKs based on available space and I/O capacity.
- Load balancing between data stores in a datastore cluster via Storage vMotion based on storage space utilization.
- Load balancing via Storage vMotion based on I/O metrics, e.g. latency.
- Affinity/Anti-Affinity Rules for both VMs and VMDKs — see the toy placement sketch after this list.
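And a toy sketch of the initial-placement idea (definitely not VMware's actual algorithm): choose a datastore in the cluster with enough free space and the lowest latency, while honoring an anti-affinity rule. Datastore names and numbers are invented.

```python
# Toy Storage DRS-style placement: space + latency + a simple anti-affinity rule.
# All datastores, capacities, and latencies are invented for illustration.
datastores = [
    {"name": "ds01", "free_gb": 800, "latency_ms": 12, "vmdks": {"app1.vmdk"}},
    {"name": "ds02", "free_gb": 500, "latency_ms": 4,  "vmdks": set()},
    {"name": "ds03", "free_gb": 900, "latency_ms": 30, "vmdks": {"app2.vmdk"}},
]

def place(vmdk: str, needed_gb: int, anti_affinity: set[str]) -> str:
    """Pick a datastore with room for the new VMDK, skipping any that already
    hold a disk it must stay away from; prefer low latency, then free space."""
    candidates = [
        ds for ds in datastores
        if ds["free_gb"] >= needed_gb and not (ds["vmdks"] & anti_affinity)
    ]
    best = min(candidates, key=lambda ds: (ds["latency_ms"], -ds["free_gb"]))
    return best["name"]

# Place a 200 GB disk that must not share a datastore with app1.vmdk.
print(place("app1_1.vmdk", needed_gb=200, anti_affinity={"app1.vmdk"}))
```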
- VAAI — array integration
- New primitives = thin provisioning monitoring and dead space reclamation.
- The 3 primitives from last year are where the big performance gains are….but these new ones are still a really big deal for performance.
- Up to 95% time reduction! 😉
- Also saves CPU and memory resources on ESX host (so more ESX scalability).
- Dead Space Reclamation — when a VM is no longer using space (like after a Storage vMotion), VAAI gives it back to the array via a standard SCSI command (UNMAP).
- 1,000,000 IOPs from a single vSphere server against an 8 engine VMAX array with a boatload of drives.
- 10 GigE FCoE is now on par with 8 Gig FC
- (4) vSphere hosts with dual 10 GigE Intel adapters
- VNX Array
- 16 streams of FCoE traffic
- 100% sequential, 100% read, 1 MB IO size
- 10 Gigabytes/second.
- HPC Performance
- Needs high CPU counts, lots of memory, NUMA awareness, etc.