Summary at the top: it’s easy to forget how incredible vMotion is. A good review, then a deep dive on vMotion, recent improvements, and best practices.
Over 5x performance improvement in some areas in vSphere 5.
- So what is vMotion?
- Something that we love
- completely transparent to the guest
- invaluable tool for admins (avoid server downtime, allow troubleshooting)
- provide flexibility
- Key enabler of DRS, DPM, FT
- What needs to be Migrated?
- Moving entire VM state
- Uses “checkpoint” infrastructure
- Look at all the VM’s virtual devices and serialize their state into a blob, transfer it, and deserialize it at the destination.
- Serialization is around 8 MB
- But…not quite that simple, due to associations with physical resources.
- Devices — processor and device state (CPU, network, SVGA, etc.)
- Disk — shared storage required of course.
- Network — reverse ARP of course.
- Memory — pre-copy memory while the VM is running….memory is the coolest part here.
- Naive Memory Copy — just suspend the VM, move the memory, unsuspend
- Not good….a 64 GB VM requires 51 seconds on 10 GigE, and much longer on plain GigE (back-of-envelope math below).
- Not mentioned, but shades of Hyper-V’s original implementation.
- Instead the VM runs during the vast majority of the vMotion.
- Iterative memory “pre-copy” in theory continues until there is no outstanding memory left to send.
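Quick aside on that 51-second figure — it falls straight out of link-rate arithmetic. A minimal sketch, assuming wire-rate transfers and no protocol overhead (optimistic):

```python
# Back-of-envelope time to move a VM's entire memory image in one shot,
# assuming the link runs at wire rate with zero protocol overhead.
def naive_copy_seconds(memory_gb: float, link_gbps: float) -> float:
    bits_to_move = memory_gb * 1e9 * 8          # decimal GB -> bits
    return bits_to_move / (link_gbps * 1e9)     # seconds at wire rate

print(naive_copy_seconds(64, 10))   # ~51.2 s on 10 GigE
print(naive_copy_seconds(64, 1))    # ~512 s on plain GigE -- far worse
```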
- Memory Iterative Pre-copy (toy sketch of the idea after this list)
- First Phase, ‘Trace Phase/HEAT Phase’
- Send the VM’s ‘cold’ pages from source to destination.
- Trace all the VM’s memory….so we know when pages change later.
- Performance impact: a noticeable but brief drop in throughput due to trace installation, related to memory size.
- Subsequent Phases
- Keep passing over memory and tracing each page as transmitted.
- Performance impact: minimal on guest performance
- Switch-over phase
- Once pre-copy has converged, very few dirty pages remain.
- VM is momentarily quiesced for switch-over.
- Performance impact: an increase in latency while the guest is stopped; duration is less than a second.
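To make the convergence idea concrete, here is a toy simulation of iterative pre-copy — not VMware’s actual algorithm, just the shape of it. The page counts and dirty fraction are made up; the point is that each pass sends less than the one before, so the final stun only has to cover a small remainder:

```python
# Toy simulation of iterative memory pre-copy. The guest keeps dirtying memory
# while each pass is sent, but later passes are smaller (and so faster), so the
# amount left to send shrinks geometrically until switch-over. Numbers are
# illustrative only.
TOTAL_PAGES = 100_000            # pretend VM memory, in pages
DIRTY_FRACTION = 0.3             # pages dirtied during a pass, per page sent
SWITCHOVER_THRESHOLD = 1_000     # few enough pages to copy while briefly stunned

to_send = TOTAL_PAGES            # first pass: the whole 'cold' memory image
passes = 0
while to_send > SWITCHOVER_THRESHOLD:
    passes += 1
    dirtied_meanwhile = int(to_send * DIRTY_FRACTION)   # VM runs during the copy
    print(f"pass {passes}: sent {to_send} pages, {dirtied_meanwhile} dirtied meanwhile")
    to_send = dirtied_meanwhile

print(f"switch-over: quiesce the VM, send the final {to_send} pages, resume on the destination")
```

Note that if the dirty fraction were 1.0 or more, this loop would never terminate — exactly the pathological case the bullets that follow (and SDPS in vSphere 5) address.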
- vMotion in 4.1….it’s a bit more than that.
- What if VM is dirtying faster than can transfer memory?
- 4.1 did RDPI, i.e. quick resume: you’d switch over even though pre-copy hadn’t finished.
- Memory not yet transferred would be pulled remotely from the source host even though the VM was running on the destination host.
- Not the best approach for performance, so it was rewritten for ESX 5.
- vSphere 5 Performance Enhancements – read this section
- Memory pre-copy
- lower impact when installing memory traces
- optimized to handle 10 GigE, can now fully saturate 10 GigE
- Multi NIC enhancements to further reduce pre-copy time.
- New feature SDPS (stun during page-send) kicks in during pathological cases
- Better than RDPI — handles pre-copy convergence failures better than 4.1 did.
- Basically forces pre-copy to converge.
- Introduces microsecond delays into the vCPUs, just enough that the network transmit rate climbs above the VM’s rate of dirtying memory (sketched below).
- Much, much better than RDPI — yes, the guest slows down briefly, but it can guarantee higher levels of performance than running remotely over RDPI.
- Copy remainder of memory from source to destination
- Improvements to reduce duration and impact on guest during switch-over phase.
- RDPI is disabled entirely in favor of SDPS.
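Continuing the toy model above, here is a hedged sketch of the SDPS idea: when the guest dirties pages faster than the vMotion network can send them, tiny vCPU delays are injected until the effective dirty rate drops below the transmit rate, forcing the pre-copy to converge. The control loop, function name, and numbers are illustrative, not VMware’s implementation:

```python
# Toy model of SDPS (stun during page-send): if the guest's page-dirty rate is
# at or above the vMotion transmit rate, ramp up a microsecond-scale vCPU delay;
# once the dirty rate falls below the transmit rate, back the delay off again.
# All values are illustrative.
def sdps_delay_us(dirty_rate: float, tx_rate: float,
                  current_delay_us: float, step_us: float = 10.0) -> float:
    """Return the per-vCPU delay (microseconds) to apply for the next interval."""
    if dirty_rate >= tx_rate:
        return current_delay_us + step_us          # slow the guest a little more
    return max(0.0, current_delay_us - step_us)    # back off once converging

base_dirty_rate, tx_rate = 400_000.0, 250_000.0    # pages/sec, toy numbers
delay = 0.0
for interval in range(8):
    effective_rate = base_dirty_rate / (1.0 + delay / 100.0)  # delays throttle dirtying
    print(f"interval {interval}: delay={delay:.0f}us, dirty rate={effective_rate:,.0f} pages/s")
    delay = sdps_delay_us(effective_rate, tx_rate, current_delay_us=delay)
```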
- Test configuration with vSphere 5.
- 2 Nehalem hosts, 2 sockets, quad-core Xeon, 96 GB memory
- Three 10 GigE NICs: two for vMotion traffic, one for client traffic.
- How to measure vMotion performance
- Resource usage, Total Duration, Switch-over Time
- Performance impact on applications running inside the guest.
- App latency and throughput during vMotion
- Time to resume to a normal level of performance.
- Testing Workloads — everything pretty much.
- Web (SPECweb2005), Email (Exchange 2010), DB/OLTP (SQL Server 2010), VDI/Cloud-Oriented
- There’s a vMotion migration ID per VM — use it to search the vmkernel logs and see a ton of info (starting, progress through the pre-copy, cutting over, etc.); see the snippet below.
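A minimal sketch of digging those entries out of the log, assuming the vmkernel log lives at its usual ESXi path and you already have the migration ID (the ID and path below are placeholders):

```python
# Print every vmkernel log line that mentions a given vMotion migration ID.
# The path is the usual ESXi 5.x location and the ID is a placeholder.
MIGRATION_ID = "1318356792197500"      # placeholder -- substitute your own
LOG_PATH = "/var/log/vmkernel.log"

with open(LOG_PATH, errors="replace") as log:
    for line in log:
        if MIGRATION_ID in line:
            print(line.rstrip())       # start, pre-copy progress, switch-over, ...
```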
- Test Results
- 37% drop in vMotion duration in vSphere 5 for the web workload (30 seconds down to 18 seconds).
- All of this with 12,000 web sessions generating 6 Gbps web traffic.
- No network connections dropped during vMotion.
- Minimal performance impact during memory trace install of vMotion.
- vMotion performance on GigE vs. 10 GigE
- Almost a 10x improvement using 10 GigE vs. GigE
- Seriously considering switching to 10 GigE for the vMotion network.
- GigE vMotion on vSphere 4.1 could lead to network connection drops due to memory copy convergence issues.
- On vSphere 5 even a pathological workload does not cause network connection drops during vMotion.
- Database workloads
- 35% reduction in vMotion time
- 2.3x improvement on vSphere 5 when using multiple NICs.
- Similar performance improvement during memory trace installation.
- VDI Workload
- Time to evacuate 64 VMs dropped from 11 minutes to 2 minutes.
- More super graphs.
- Best Practices
- Switch to a 10 GigE vMotion Network
- Consider using multiple 10 GigE NICs for vMotion
- Configure them all under the same vSwitch.
- Configure each vmknic to use a separate vmnic as its active vmnic (rest marked as standby); a scripted sketch follows this list.
- vMotion will transparently fail over.
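If you’d rather script that teaming layout than click through the vSphere Client, here is a hedged sketch using pyVmomi (the vSphere Python SDK). The vCenter/host names, vSwitch and port-group names, and vmnic assignments are all placeholders — a sketch of the approach, not a drop-in script:

```python
# Sketch: pin each vMotion port group to its own active uplink, with the other
# NIC standby, via pyVmomi. All names below are placeholders for illustration.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",            # placeholder vCenter
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())

host = si.RetrieveContent().searchIndex.FindByDnsName(
    dnsName="esx01.example.com", vmSearch=False)          # placeholder ESXi host
net_sys = host.configManager.networkSystem

# Two vMotion port groups on the same vSwitch, each with a different active NIC.
layout = [("vMotion-1", "vmnic1", "vmnic2"),
          ("vMotion-2", "vmnic2", "vmnic1")]

for pg_name, active, standby in layout:
    spec = vim.host.PortGroup.Specification(
        name=pg_name,
        vlanId=0,
        vswitchName="vSwitch1",                            # placeholder vSwitch
        policy=vim.host.NetworkPolicy(
            nicTeaming=vim.host.NetworkPolicy.NicTeamingPolicy(
                nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
                    activeNic=[active], standbyNic=[standby]))))
    net_sys.UpdatePortGroup(pg_name, spec)

Disconnect(si)
```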
- If concerned about vMotion performance…..
- Consider placing VM swap files on shared storage (SAN or NAS).
- Using host-local swap or leveraging SSD for swap cache can impact vMotion performance (as it means there’s more to transfer).
- Use ESX clusters composed of matching NUMA architectures when using vNUMA features.
- The vNUMA topology of the VM is set during power-on based on the NUMA topology of the physical host.
- vMotion to a host with a different NUMA topology may result in reduced performance.
- When using CPU reservations, leave some slack….
- 30% of a CPU unreserved at host level.
- 10% of CPU capacity unreserved at cluster level.
- Conclusions