- Oliver Shorey presenting – EMC CSE
- Tech overview, best practices, etc.
- Good session – went medium-level technical.
More after the jump…
- Scenario overviews…
-
- vMotion for planned maintenance – we all understand this, right?
- DRS for Load Balancing – we all understand this too, right?
- HA if a host goes down – crash-consistent reboot
- FT if a server goes down – no VM downtime
- SRM for DR
-
- easy push button
- any topology and distance
- great when you need a run book and/or auditing
- Core of VPLEX – multi-site data availability
- What’s new with VPLEX
-
- Free migrations for up to 180 days – don’t have to double license.
- Frame-based VNX licensing – per VNX frame.
- Reference architecture with VSPEX
- Proven Success – 200 PB deployed with 13 million runtime hours
- VAAI Support
- EMC Storage Analytics (coming)
- New Consulting Offering (CAAS engagement)
- Why VPLEX vs DR
-
- Save money $$
- No need for DR tests, since you’re running actively at the DR site (or at least can be)
- Evolution of Availability
- VPLEX Local HA – standard diagram…from the bottom up…
-
- Cluster or Virtual Host Layer
- Physical Host Layer
- Virtual Storage Layer (VPLEX)
- Physical Storage Layer
- Now stretching it across a WAN – and….it’s VPLEX Metro!
-
- Go across IP or FC WAN
- Present a distributed device – same LUN presented in both locations given low enough latency
- It’s magic time – vMotion between physical sites.
- VPLEX Witness is critical – prevents split brain. Independent voter to break deadlocks on who’s up or not.
- Cross-connect – can give ESX hosts in Site A a primary path to Site A VPLEX and also a secondary path to Site B VPLEX – resiliency against just a local storage or VPLEX failure.
- VPLEX does support FT but requires 1 ms latency between sites.
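For the cross-connect setup above, one hedged sketch of how you might pin a host's I/O to its local VPLEX cluster using the native multipathing CLI on ESXi 5.x. The device NAA ID and path name below are placeholders, and your environment (e.g. PowerPath/VE, or EMC's VPLEX host connectivity guide) may call for a different path policy entirely:

```shell
# Placeholder device ID for a VPLEX distributed device – substitute your own.
# List the paths ESXi sees for it (local Site A plus cross-connected Site B):
esxcli storage nmp path list -d naa.6000144000000010

# Use the Fixed path selection policy so a preferred path can be designated
esxcli storage nmp device set -d naa.6000144000000010 --psp VMW_PSP_FIXED

# Prefer a path through the local (Site A) VPLEX front-end port,
# leaving the Site B paths as standby for local-failure resiliency
esxcli storage nmp psp fixed deviceconfig set -d naa.6000144000000010 \
  --preferred-path vmhba2:C0:T0:L0
```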
- VMware – VPLEX Best Practices
-
- Specific issues around APD & PDL – can be overcome by following best practices
-
- All Paths Down (APD) & Permanent Device Loss (PDL)
- All Paths Down – APD
-
- happens when a datastore goes away without notifying the ESXi server
- Pre-ESXi 5.1 this would cause the hostd process to hang on the ESXi server
-
- VMs go into zombie state
- ESXi server becomes slow and unresponsive
- HA failover isn’t invoked
- Only way to fix it was to restore the path or reboot the server
- Dual fabrics reduce the risk of this.
- New feature in 5.1 – APD timeout
-
- when APD is detected, a 140-second timer starts (tunable)
- after 140 seconds, the I/O path is cut – none of the previous issues (i.e. something happens automatically rather than APD requiring manual intervention)
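The 5.1 APD behavior above maps to a pair of host advanced settings. A sketch of how they'd be set via esxcli (the values shown are the defaults; verify against current VMware guidance before tuning):

```shell
# Enable fast-fail of non-VM I/O once a device enters APD state
esxcli system settings advanced set -o /Misc/APDHandlingEnable -i 1

# The APD timeout in seconds – the 140-second timer described above
esxcli system settings advanced set -o /Misc/APDTimeout -i 140

# Confirm the current values
esxcli system settings advanced list -o /Misc/APDTimeout
```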
- Permanent Device Loss – PDL
-
- PDL won’t trigger an HA failover unless the right settings are put in place beforehand.
- Post 5.0 Update 1 this is much better.
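The post-5.0 U1 improvement relies on two settings from VMware's stretched-cluster guidance – a sketch, to be verified against the current VMware/EMC best-practice docs for your version:

```shell
# Host-wide: terminate VMs whose storage goes PDL so HA can restart them
# elsewhere (appends to the host's global VM config file)
echo 'disk.terminateVMOnPDLDefault = "True"' >> /etc/vmware/settings

# Cluster-level HA advanced option (set in vCenter, shown here for reference):
#   das.maskCleanShutdownEnabled = True
# Tells HA the killed VMs were not cleanly shut down, so it restarts them.
```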
- Site Affinity – want to keep VMs running at their preferred location and not have HA start them at the wrong site.
-
- DRS sub-clusters basically.
- HA works that way as well.
- VM restart priority – important to prevent boot storm between sites.
- FT over distance – like RAID1 but with datacenters
-
- Have to watch the secondary VM placement as DRS is challenged there.
- May need to vMotion the secondary to the right datacenter (which may be the same datacenter as the primary or a different one).
- Mixing HA & FT
-
- If doing HA & FT, try to keep FT clusters to 2 nodes; HA clusters can go up to VMware limits.
- If you have more than 2 nodes in a federated FT cluster, you have to manually police VM placement.
- vCenter Placement Options
-
- Within the stretched HA cluster
- Use vCenter Heartbeat
- At a third site with the VPLEX Witness
- Stretched Layer 2 Network – many technologies out there
-
- Cisco OTV, LISP, Brocade VPLS
- Adding DR to the mix – into 3rd site territory
-
- Using VPLEX at 100 mile distance + RP CRR for a 3rd site.
- RecoverPoint integration makes it array agnostic
- SRM for DR site
- VPLEX replicates corrupt data just as immediately as valid data
-
- Makes a good case for either RP CDP at one site (journal rollback) or RP CRR (remote site journal)
- Use cases by distance
-
- Sync – 1 ms – Metro
- Sync – 5 ms – Metro
- Sync – 10 ms – Metro
- Lots of VMware applicability
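As a back-of-envelope check on these latency tiers (my own sketch, not from the session): light in fiber propagates at roughly 200,000 km/s, so each millisecond of round-trip budget buys about 100 km of one-way fiber path, before any switch/WAN equipment latency:

```shell
# Rough one-way fiber distance for a given round-trip latency budget (ms).
# Assumes ~200,000 km/s propagation in fiber and zero equipment delay.
rtt_to_km() {
  echo $(( $1 * 100 ))   # 1 ms RTT ~= 100 km one-way
}

rtt_to_km 1    # 100  (~62 miles)
rtt_to_km 5    # 500  (~310 miles)
rtt_to_km 10   # 1000
```

Real deployments come in well under these ceilings once equipment latency is counted – consistent with the "100 mile distance" Metro figure mentioned above.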
Not as deep as I’d hoped but good core overview…one I’ve done with customers myself actually.