EMC World – Session Notes – VPLEX: Advanced Technology Deep Dive for VMware HA & FT Over Distance

Oliver Shorey presenting – EMC CSE
Tech overview, best practices, etc.
Good session – went medium level technical.

Scenario overviews..
- vMotion for planned maintenance – we all understand this, right?
- DRS for Load Balancing – we all understand this too, right?
- HA if a host goes down – crash-consistent reboot
- FT if a server goes down – no VM downtime
SRM for DR
- easy push button
- any topology and distance
- great when you need a run book and/or auditing
Core of VPLEX – multi-site data availability
What’s new with VPLEX
- Free migrations for up to 180 days – don’t have to double license.
- Frame-based VNX licensing – per VNX frame.
- Reference architecture with VSPEX
- Proven Success – 200 PB deployed with 13 million runtime hours
- VAAI Support
- EMC Storage Analytics (coming)
- New Consulting Offering (CAAS engagement)
Why VPLEX vs DR
- Save money $$
- No need for separate DR tests, since workloads are (or at least can be) running actively at the DR site
Evolution of Availability
VPLEX Local HA – standard diagram….from the bottom up…
- Cluster or Virtual Host Layer
- Physical Host Layer
- Virtual Storage Layer (VPLEX)
- Physical Storage Layer
Now stretching it across a WAN – and….it’s VPLEX Metro!
- Go across IP or FC WAN
- Present a distributed device – the same LUN is presented in both locations, provided latency is low enough
- It’s magic time – vMotion between physical sites.
- VPLEX Witness is critical – prevents split brain. Independent voter to break deadlocks on who’s up or not.
- Cross-connect – can give ESX hosts in Site A a primary path to Site A VPLEX and also a secondary path to Site B VPLEX – resiliency against just a local storage or VPLEX failure.
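
To make the Witness point concrete, here is a minimal sketch of the arbitration idea in Python. It is a simplified illustrative model, not VPLEX's actual algorithm: the function name `sites_serving_io`, the site labels "A" and "B", and the notion of a single "preferred" site per distributed device are assumptions made for the example.

```python
from typing import Set


def sites_serving_io(a_up: bool, b_up: bool, link_up: bool,
                     preferred: str, witness: bool) -> Set[str]:
    """Return the sites ("A", "B") that keep serving a distributed device.

    Toy model only: one preferred site, one inter-site link, one witness.
    """
    if preferred not in ("A", "B"):
        raise ValueError("preferred must be 'A' or 'B'")
    alive = {site for site, up in (("A", a_up), ("B", b_up)) if up}
    if len(alive) == 2:
        # Both sites are alive. If the inter-site link is cut, only the
        # preferred site continues, so the two copies cannot diverge.
        return alive if link_up else {preferred}
    if alive == {preferred}:
        # The preferred site always continues on its own.
        return alive
    # Only the non-preferred site is left (or nothing is). Without a witness
    # it cannot tell a site failure from a link failure, so it suspends.
    return alive if witness else set()
```

In this model the Witness only matters in one case: the preferred site is lost and the surviving site needs an independent voter to confirm it is safe to carry on.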
VPLEX does support FT, but it requires latency of 1 ms or less between sites.
VMware – VPLEX Best Practices
- Specific issues around APD & PDL – can overcome by using best practices
  - APD = All Paths Down; PDL = Persistent Device Loss
- All Paths Down – APD
  - happens when datastore goes away without notifying ESXi server
  - Pre ESXi 5.1 this would cause HostD process to hang on ESXi server
    - VMs go into zombie state
    - ESXi server becomes slow and unresponsive
    - HA failover isn’t invoked
    - Only way to fix was to restore the path or reboot the server
  - Dual fabrics reduce the risk of this.
  - New feature in 5.1 – APD timeout
    - when APD is detected, 140 second timer starts (tuneable)
    - after 140 seconds, I/O path is cut – none of previous issues (i.e. something happens automatically rather than APD requiring manual intervention)
- Persistent Device Loss – PDL
  - PDL will not trigger an HA failover unless the required settings are put in place beforehand.
  - Post 5.0 Update 1 this is much better.
- Site Affinity – want to keep VMs running at preferred location and not have HA start them at wrong site.
  - DRS sub-clusters basically.
  - HA works that way as well.
  - VM restart priority – important to prevent boot storm between sites.
- FT over distance – like RAID1 but with datacenters
  - Have to watch the secondary VM placement as DRS is challenged there.
  - May need to vMotion to the right datacenter (that may be the same datacenter as the primary or a different datacenter than the primary).
- Mixing HA & FT
  - If doing HA & FT, try to keep FT clusters to 2 nodes. HA clusters can go up to VMware limits.
  - If you have more than 2 nodes in a federated FT cluster, you have to manually police VM placement.
- vCenter Placement Options
  - Within the stretched HA cluster
  - Use vCenter Heartbeat
  - At a third site with the VPLEX Witness
- Stretched Layer 2 Network – many technologies out there
  - Cisco OTV, LISP, Brocade VPLS
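
The APD timeout behaviour from the best practices above can be sketched as a small state tracker. This is a toy Python model of the idea only (timer starts on detection, I/O is cut once it expires), not ESXi code: the class `ApdTracker`, its method names, and the state strings are all hypothetical, and the 140 second default is simply the figure quoted in the session.

```python
from typing import Optional

APD_TIMEOUT_SECONDS = 140.0  # default quoted in the session; tuneable


class ApdTracker:
    """Tracks one device through an All Paths Down event (illustrative only)."""

    def __init__(self, timeout: float = APD_TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self.apd_started_at: Optional[float] = None

    def paths_lost(self, now: float) -> None:
        # Start the timer on first detection; later reports do not reset it.
        if self.apd_started_at is None:
            self.apd_started_at = now

    def path_restored(self) -> None:
        # Any path coming back ends the APD condition.
        self.apd_started_at = None

    def state(self, now: float) -> str:
        if self.apd_started_at is None:
            return "ok"
        if now - self.apd_started_at >= self.timeout:
            # Timer expired: stop retrying and cut I/O to the device.
            return "timed_out"
        return "apd_retrying"
```

The point of the timer is that the host reaches a defined state on its own after a bounded wait, instead of retrying indefinitely until someone restores the path or reboots the server.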
Adding DR to the mix – into 3rd site territory
- Using VPLEX at 100 mile distance + RP CRR for a 3rd site.
- RecoverPoint integration makes it array agnostic
- SRM for DR site
VPLEX replication immediately replicates valid data and corrupt data
- Makes a good case for either RP CDP at one site (journal rollback) or RP CRR (remote site journal)
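
A toy Python sketch of why a journal helps where a synchronous mirror does not. Everything here is hypothetical and illustrative (the classes `MirroredVolume` and `JournaledCopy`, integer timestamps, dictionary-backed blocks); it is not how VPLEX or RecoverPoint are implemented, and it assumes journal entries are recorded in time order.

```python
from typing import Dict, List, Tuple


class MirroredVolume:
    """Two copies kept in lockstep, like a synchronously mirrored device."""

    def __init__(self) -> None:
        self.site_a: Dict[int, bytes] = {}
        self.site_b: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        # Every write lands on both sites, good or bad.
        self.site_a[block] = data
        self.site_b[block] = data


class JournaledCopy:
    """Keeps every write with its timestamp so older images can be rebuilt."""

    def __init__(self) -> None:
        self.journal: List[Tuple[int, int, bytes]] = []  # (time, block, data)

    def record(self, time: int, block: int, data: bytes) -> None:
        self.journal.append((time, block, data))

    def image_at(self, time: int) -> Dict[int, bytes]:
        # Replay the journal up to and including the requested point in time.
        image: Dict[int, bytes] = {}
        for t, block, data in self.journal:
            if t <= time:
                image[block] = data
        return image


mirror = MirroredVolume()
copy = JournaledCopy()
for t, block, data in [(1, 0, b"good"), (2, 0, b"corrupt")]:
    mirror.write(block, data)
    copy.record(t, block, data)

print(mirror.site_b[0])      # the remote mirror holds the corrupt block too
print(copy.image_at(1)[0])   # the journal can still produce the earlier image
```

The mirror protects against losing a site, while the journal protects against bad writes, which is why the two are pitched as complementary.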
Use cases by distance
- Sync – 1 ms – Metro
- Sync – 5 ms – Metro
- Sync – 10 ms – Metro
- Lots of VMware applicability

Not as deep as I’d hoped but good core overview…one I’ve done with customers myself actually.
