EMC World – Session Notes – VPLEX: Advanced Technology Deep Dive for VMware HA & FT Over Distance

  • Oliver Shorey presenting – EMC CSE
  • Tech overview, best practices, etc.
  • Good session – went medium level technical.


  • Scenario overviews
    • vMotion for planned maintenance – we all understand this, right?
    • DRS for Load Balancing – we all understand this too, right?
    • HA if a host goes down – crash-consistent reboot
    • FT if a server goes down – no VM downtime
  • SRM for DR
    • easy push button
    • any topology and distance
    • great when need a run book and/or auditing
  • Core of VPLEX – multi-site data availability
  • What’s new with VPLEX
    • Free migrations for up to 180 days – don’t have to double license.
    • Frame-based VNX licensing – per VNX frame.
    • Reference architecture with VSPEX
    • Proven Success – 200 PB deployed with 13 million runtime hours
    • VAAI Support
    • EMC Storage Analytics (coming)
    • New Consulting Offering (CAAS engagement)
  • Why VPLEX vs DR
    • Save money $$
    • No need for DR tests, since workloads run (or at least can run) actively at the DR site
  • Evolution of Availability
  • VPLEX Local HA – standard diagram….from the bottom up…
    • Cluster or Virtual Host Layer
    • Physical Host Layer
    • Virtual Storage Layer (VPLEX)
    • Physical Storage Layer
  • Now stretching it across a WAN – and….it’s VPLEX Metro!
    • Go across IP or FC WAN
    • Present a distributed device – same LUN presented in both locations given low enough latency
    • It’s magic time – vMotion between physical sites.
    • VPLEX Witness is critical – prevents split brain. Independent voter to break deadlocks on who’s up or not.
    • Cross-connect – can give ESX hosts in Site A a primary path to Site A VPLEX and also a secondary path to Site B VPLEX – resiliency against just a local storage or VPLEX failure.
  • VPLEX does support FT, but FT requires latency of 1 ms or less between sites.
  • VMware – VPLEX Best Practices
    • Specific issues around APD & PDL – can be overcome by following best practices
      • APD = All Paths Down; PDL = Persistent Device Loss
    • All Paths Down – APD
      • happens when a datastore goes away without notifying the ESXi server
      • Before ESXi 5.1, this would cause the hostd process to hang on the ESXi server
        • VMs go into zombie state
        • ESXi server becomes slow and unresponsive
        • HA failover isn’t invoked
        • Only way to fix was to restore the path or reboot the server
      • Dual fabrics reduce the risk of this.
      • New feature in 5.1 – APD timeout
        • when APD is detected, a 140 second timer starts (tunable)
        • after 140 seconds, the I/O path is cut – none of the previous issues (i.e. the host reacts automatically rather than APD requiring manual intervention)
    • Persistent Device Loss – PDL
      • PDL won’t trigger an HA failover unless the relevant settings are put in place beforehand.
      • Post 5.0 Update 1 this is much better.
    • Site Affinity – want to keep VMs running at their preferred location and not have HA start them at the wrong site.
      • DRS sub-clusters basically.
      • HA works that way as well.
      • VM restart priority – important to prevent boot storm between sites.
    • FT over distance – like RAID1 but with datacenters
      • Have to watch the secondary VM placement as DRS is challenged there.
      • May need to vMotion to the right datacenter (that may be the same datacenter as the primary or a different datacenter than the primary).
    • Mixing HA & FT
      • If doing HA & FT, try to keep FT clusters to 2 nodes. HA clusters can go up to VMware limits.
      • If you have more than 2 nodes in a federated FT cluster, you have to manually police VM placement.
    • vCenter Placement Options
      • Within the stretched HA cluster
      • Use vCenter Heartbeat
      • At a third site with the VPLEX Witness
    • Stretched Layer 2 Network – many technologies out there
      • Cisco OTV, LISP, Brocade VPLS
  • Adding DR to the mix – into 3rd site territory
    • Using VPLEX at 100 mile distance + RecoverPoint (RP) CRR for a 3rd site.
    • RecoverPoint integration makes it array agnostic
    • SRM for DR site
  • VPLEX replication immediately replicates corrupt data just as it does valid data
    • Makes a good case for either RP CDP at one site (journal rollback) or RP CRR (remote site journal)
  • Use cases by distance
    • Sync – 1 ms – Metro
    • Sync – 5 ms – Metro
    • Sync – 10 ms – Metro
    • Lots of VMware applicability
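
To make the Witness point from the Metro notes above a bit more concrete, here is a rough sketch of the arbitration idea: on a site partition only the preferred site keeps serving I/O, and a non-preferred survivor only continues if the Witness can confirm the other site is really down. This is my own simplified model, not EMC's actual algorithm. The function name `surviving_sites`, the site labels "A" and "B", and the `preferred` parameter are all made up for illustration.

```python
def surviving_sites(site_a_up, site_b_up, link_up,
                    preferred="A", witness_reachable=True):
    """Return the set of sites that keep serving I/O for a distributed device.

    Simplified, hypothetical model of witness-style arbitration;
    it is not taken from VPLEX documentation.
    """
    if site_a_up and site_b_up:
        if link_up:
            # Healthy: both sites serve the same distributed device.
            return {"A", "B"}
        # Partition: only the preferred site continues, the other suspends,
        # so the two copies can never diverge (no split brain).
        return {preferred}

    if not site_a_up and not site_b_up:
        return set()

    survivor = "A" if site_a_up else "B"
    if survivor == preferred:
        return {survivor}

    # The non-preferred site cannot tell "peer failed" from "link failed"
    # on its own, so it needs the independent voter to break the tie.
    return {survivor} if witness_reachable else set()
```

With the Witness reachable, losing the preferred site leaves the other site running; without it, the non-preferred site suspends and someone has to resume it by hand, which is why the session called the Witness critical.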
Not as deep as I’d hoped but good core overview…one I’ve done with customers myself actually. 
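
As a footnote to the APD notes above, the 5.1 timeout behaviour can be sketched as a small state machine: the timer starts when APD is first detected, I/O is retried while the timer runs, and once the timeout passes I/O is failed quickly instead of hanging the host. This is only an illustration of the idea as described in the session. The class `DeviceState`, its method names, and the "issue" / "retry" / "fail-fast" labels are hypothetical and are not ESXi internals.

```python
from dataclasses import dataclass
from typing import Optional

# Default from the session notes; the presenter noted it is tunable.
APD_TIMEOUT_SECONDS = 140


@dataclass
class DeviceState:
    """Hypothetical model of one storage device as seen by a host."""
    apd_started_at: Optional[float] = None  # None while paths are healthy

    def paths_lost(self, now: float) -> None:
        # Start the timer only on first detection, not on every failed I/O.
        if self.apd_started_at is None:
            self.apd_started_at = now

    def paths_restored(self) -> None:
        # Paths are back: clear the APD condition and the timer.
        self.apd_started_at = None

    def io_action(self, now: float,
                  timeout: float = APD_TIMEOUT_SECONDS) -> str:
        if self.apd_started_at is None:
            return "issue"       # normal I/O
        if now - self.apd_started_at < timeout:
            return "retry"       # still inside the APD window
        return "fail-fast"       # timeout expired: stop queuing I/O
```

For example, with paths lost at t=10, an I/O at t=100 is still retried, while one at t=150 is failed straight away; restoring the paths returns the device to normal.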
