BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance

First session of the day Thursday — chose to go to this rather than “PAR3340 Selling VMware vCenter Site Recovery Manager 5 and VMware vSphere Replication” (I need to get comfy with the new SRM 5 licensing but figure I can do that from a PowerPoint).

I’ve double booked this with a session that starts in 30 minutes (Rethinking Storage for Virtual Desktops) so will see if I stay for the whole thing.

Summary at the top = very good deep-dive on HA

  • Speakers = Keith Farkas (Senior Staff Engineer, vSphere HA) & Jim Chow (Staff Engineer, vSphere FT)
  • HA provides rapid recovery from outages, FT provides continuous availability.
    • Minutes of downtime = Infrastructure HA, Guest Monitoring HA, App Monitoring APIs (Partner Solutions)
    • No downtime = Fault Tolerance
  • Session will cover…
    • Technical overview of vSphere HA 5
    • Technical preview of vSphere FT 5
  • HA rewritten in 5.0 to…
    • simplify setting up HA clusters and managing them
      • HA agents use only IP addresses — no more DNS
      • Agents now pushed out and configured in parallel (rather than serially) — takes about 2 minutes total as opposed to 1 minute per host previously.
    • enable more flexible and larger HA deployments
      • primary roles were previously a problem
      • primary/secondary roles now removed
      • still only support 32 hosts per cluster
    • make HA more robust/easier to troubleshoot
      • improved isolation response to lower the possibility of a total cluster isolation event
      • new inter-agent communication mechanism (storage heartbeat I think…he hasn’t said yet)
      • more fine-grained HA host state to help with troubleshooting
    • support network partitions
  • HA 5.0 architecture is fundamentally different — 3 major things.
    • New vSphere HA Agent — called the Fault Domain Manager (FDM)
    • has all HA code — no longer included in vpxa agent
    • still use vCenter Server to manage the cluster, and failover operations are still independent of VC
    • HA traffic goes over management network
  • Key Concept #1 – FDM Roles and Responsibilities
    • FDM Master
      • One FDM is chosen to be the master.
        • Normally one master per cluster
        • All others assume the role of FDM slave
      • Any FDM can be chosen as master
        • No longer a primary/secondary role concept.
        • Selection done using an election.
      • Master-Specific responsibilities
        • Some others.
        • Manages persisted state
    • FDM Slave
      • Slave-specific responsibilities
        • reports critical state changes to the master
        • restarts VMs when directed by master
        • if master fails, slaves elect new master
      • Other slave stuff
        • monitors health of VMs running on host
        • implement VM/App monitoring features
    • Master Election
      • Election held when…
        • vSphere HA is enabled
        • when master’s host becomes inactive (maintenance, standby, reboot)
        • HA reconfigured on master’s host
        • management network partition occurs
      • If multiple masters can communicate, all but one master will abdicate.
      • Master-election algorithm
        • 15-25 seconds (varies depending on reason for election)
        • Elects participating host with the greatest number of mounted data stores
        • if there is a tie, break it using the host IDs assigned by vCenter
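The election rule above can be sketched in a few lines. This is only an illustration of the ordering described in the session, not FDM's actual code: the `Host` class and `elect_master` function are hypothetical names, and the notes do not say which direction the host-ID tie-break goes, so the sketch assumes the higher ID wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    host_id: int             # hypothetical stand-in for the ID assigned by vCenter
    mounted_datastores: int  # number of datastores this host has mounted


def elect_master(hosts):
    """Pick the host with the most mounted datastores.

    Ties are broken by host ID. ASSUMPTION: the higher ID wins; the
    session notes only say that host IDs are used as the tie-breaker.
    """
    if not hosts:
        raise ValueError("no participating hosts")
    return max(hosts, key=lambda h: (h.mounted_datastores, h.host_id))


if __name__ == "__main__":
    cluster = [Host(1, 3), Host(2, 5), Host(3, 5)]
    print(elect_master(cluster))
```

Comparing on a `(datastores, id)` tuple keeps both criteria in one place, so the tie-break only matters when the datastore counts are equal.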
    • Agent Communication
      • FDMs communicate over the management network and data stores
      • data stores used when network is unavailable – hosts isolated or partitioned
      • Elections done via UDP and no broadcast
      • Master-slave communication is done via SSL-encrypted TCP
    • Questions answered by Datastore communication
      • Master
        • Is a slave partitioned or isolated?
        • Are its VMs running?
      • Slave
        • Is a master responsible for my VM?
      • Datastores used — selected by VC, called Heartbeat Datastores
    • Heartbeat Datastores
      • VC chooses (by default) 2 data stores per cluster.
      • Preference for VMFS over NFS.
      • Can override the selection or constrain it – “Edit Cluster” settings.
    • Responses to network or host failures
      • Two criteria for master to declare host dead
        • Master can’t ping or communicate via network.
        • no storage heartbeats
      • Result: HA attempts to restart all VMs that were running on that host
    • Host is network isolated when…
      • sees no vSphere HA traffic
      • can’t ping the isolation addresses
      • Results in…
        • Host invokes (improved) isolation response…
        • Checks first if a master “owns” a VM
        • Applied if VM is owned or datastore is inaccessible
        • Default is now Leave Powered On
      • Master
        • restarts those VMs that were powered off or that fail later
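The failure and isolation criteria above boil down to a handful of boolean checks. The sketch below is my own reading of the notes, with hypothetical function and enum names. The "partitioned or isolated" middle state is an inference from the datastore-communication questions earlier (network gone but storage heartbeats still present), and the isolation-response condition follows the bullet above literally.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    PARTITIONED_OR_ISOLATED = "partitioned or isolated"
    DEAD = "dead"


def master_view(network_reachable: bool, storage_heartbeat: bool) -> HostState:
    """How the master might classify a slave host (hypothetical helper)."""
    if network_reachable:
        return HostState.LIVE
    # ASSUMPTION: no network but a live storage heartbeat means the host
    # is still running, just partitioned or isolated.
    if storage_heartbeat:
        return HostState.PARTITIONED_OR_ISOLATED
    # Both criteria met: no network contact and no storage heartbeats.
    return HostState.DEAD


def host_is_isolated(sees_ha_traffic: bool, can_ping_isolation_address: bool) -> bool:
    """A host considers itself isolated only when both checks fail."""
    return not sees_ha_traffic and not can_ping_isolation_address


def apply_isolation_response(vm_owned_by_master: bool, datastore_accessible: bool) -> bool:
    """Isolation response applies if the VM is owned or the datastore is inaccessible."""
    return vm_owned_by_master or not datastore_accessible


if __name__ == "__main__":
    print(master_view(False, True))
    print(host_is_isolated(False, False))
```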
  • Key Concept #2 – HA Protection and failure-response guarantees
    • HA protects against 5 types of failures
      • Reset VM type failures – require tools installed
        • Guest OS hangs, crashes
        • App heartbeats stop
      • Attempt VM restart – responding master knows VMs are HA protected
        • Host fails
        • Host Isolation (VM powered off)
        • VM fails (e.g. VM crashes)
    • HA Protected Workflow
      1. User issues Power on for VM
      2. Host powers on VM
      3. VC learns that the VM is powered on
      4. VC tells master to protect the VM
      5. Master receives directive from VC
      6. Master writes fact to a file
      7. Write is done. If a failure occurs after this point, a restart will be attempted, both now and for failures in the future.
    • If a failure occurs during the earlier steps, HA may or may not try a restart (depends on failure type).
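The key point of the workflow is that a VM only counts as protected once the master has persisted that fact. Here is a minimal sketch of that idea, assuming a JSON file and a hypothetical `ProtectionList` class; the session did not describe the real on-disk format or where the file lives, so treat every name here as illustrative.

```python
import json
import os
import tempfile


class ProtectionList:
    """Hypothetical stand-in for the master's persisted list of protected VMs."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # No file yet means nothing has been protected.
        if not os.path.exists(self.path):
            return set()
        with open(self.path) as f:
            return set(json.load(f))

    def protect(self, vm_name):
        """Steps 5-7: record the VM, and only return once the write is durable."""
        vms = self.load()
        vms.add(vm_name)
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(sorted(vms), f)
            f.flush()
            os.fsync(f.fileno())
        # Swap the new file into place in one step so readers never
        # see a half-written list.
        os.replace(tmp_path, self.path)

    def is_protected(self, vm_name):
        return vm_name in self.load()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    plist = ProtectionList(os.path.join(workdir, "protected.json"))
    plist.protect("vm-a")
    print(plist.is_protected("vm-a"))
```

Reading the list back from disk on every check mirrors the guarantee in the notes: a newly elected master that only has the file to go on would still know which VMs to restart.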
  • Key Concept #3 — I must have missed this being called out…..covered above I think.
  • HA Wrapup — get the slides….more slides in the downloadable ones and also in speakers notes.
    • Get Duncan’s and Frank’s HA book.
  • vSphere Fault Tolerance SMP Tech Preview…talking about….
    • Why Fault Tolerance?
      • Continuous availability (zero downtime, zero data loss, no loss of TCP connections, completely transparent to guest OS software).
    • What’s new with SMP
  • SMP Timeline
    • 2009 – FT Release in vSphere 4
    • 2010 – Updates to FT in 4.1
    • 2011 – More updates to FT in 5.0
    • Problem
      • FT only for uni-processor VMs
      • Is FT possible for multi-processor VMs?
        • Well….it’s a really hard problem.
        • Concerted effort to find approach.
      • Reached a recent milestone.
  • Overview of SMP FT vs. Uniprocessor FT
    • Uniprocessor FT uses vLockstep between 2 single-proc VMs with shared storage.
    • Had to take clean slate approach to SMP FT
    • LOT more data when dealing with SMP FT
      • New requirement for 10 GigE FT Logging link
      • Probably won’t be until next VMworld that we can do a deep dive.
      • No more vLockstep — rewritten from scratch…just calling it “SMP Protocol”
  • Demo
    • FT logging NIC for a 4-vCPU VM carries 60 megabytes per second.
    • “Oracle spawned a terrifying number of processes”.
    • Heh…I started the clapping after the successful SMP FT demo.
  • SMP FT in action
    • Client oblivious to FT operation
      • SwingBench client
      • SSH client
    • No workload disruption
  • FT Performance Numbers
    • Various workloads — from 55% to 80% of non-FT performance.
    • Similar config to vSphere 4 FT Performance Whitepaper
  • vSphere HA & FT Technical Directions
    • More comprehensive coverage of failures for more apps
      • Multiple vCPUs, Protection against component host failures
    • Broader set of enablers for improving app availability
      • More API building blocks for partners
