EMC Bootcamp Notes – Stretched Cluster Discussion by Scott Lowe – VMware Partner Exchange

Summary = high-energy walkthrough by Scott Lowe around the promise of stretched clusters, the high level details (part 1 below), and the low level details (part 2). Fantastic run-through of a fascinating subject (fascinating to me at least as I’m actively having SRM + VPLEX discussions with customers).

Scott Lowe – Stretched Cluster Discussion

Part 1: Stretched Cluster or SRM?
- vMSC = vSphere Metro Stretched Cluster
- Introduces some new terms:
  - Uniform access = “stretched SAN” – one storage array basically with access back across to another side.
  - Non-uniform access = “distributed virtual storage” – VPLEX basically….no one else doing this.
    - EMC worked with VMware to create this category…it’s the reason VPLEX is the first and still only on this list.
- Provides boundaries
- RTO vs. RPO — critical to determining solution.
  - RPO of near zero = need some kind of synchronous solution
  - RPO of minutes to hours = async stuff.
- DR versus DA
  - DA = Disaster Avoidance
    - Seeks to protect apps before a disaster occurs.
    - How often do you know before a disaster is going to occur?
    - Similar to vMotion – have to have both ESX hosts (aka both sites) up for a DA solution.
  - DR = Disaster Recovery
    - Seeks to recover applications and data after a disaster occurs
  - Think of DA as vMotion and DR as vSphere HA
- SRM Details
  - Some form of storage replication
  - Layer 3 Connectivity
  - No minimum bandwidth requirements – purely driven by SLA/RPO/RTO
  - No max latency between sites – purely driven by SLA/RPO/RTO
  - At least (2) vCenter Server instances
- Requirements for vMSC
  - Some form of supported sync active/active storage architecture.
    - Must be read/write on both ends – traditional replication is read/write at the source and read-only at the destination.
  - Stretched Layer 2 Connectivity – since a VM keeps the same IP address at the destination as at the source after a vMotion.
  - 622 Mbps bandwidth (minimum) between sites
  - Less than 5 ms latency between sites (10 ms with vSphere 5 Enterprise Plus/Metro vMotion)
    - This is roundtrip time without factoring in replication traffic.
  - A single vCenter Server instance (prob want to protect vCenter though with vCenter Heartbeat)
    - This is b/c we can’t vMotion between vCenter instances.
- Advantages for SRM
  - Defined startup orders (with prerequisites) – db server first, then web server, then app server, etc. etc.
  - No need for stretched Layer 2 connectivity (but supported)
  - The ability to simulate workload mobility without affecting production
  - Supports multiple vCenter Server instances (including in Linked Mode)
- Advantages of vMSC
  - Possibility of non-disruptive workload migration (disaster avoidance)
    - Lots of gating factors though.
  - No need to deal with issues around IP address changes.
  - Potential for running active/active data centers and more easily balancing workloads between them
  - Typically a near-zero RPO with RTO of minutes
    - Lots and lots of caveats
  - Requires only a single vCenter server instance.
- Disadvantages of SRM
  - Typically higher RPO/RTO than vMSC
  - Workload mobility is always disruptive (requires reboot)
  - Requires at least (2) vCenter Server instances
  - Operational overhead from managing protection groups and recovery plans.
    - Have to place VM’s on data stores, etc. based on dependencies, etc.
- Disadvantages of vMSC
  - Greater physical networking complexity due to stretched Layer 2 connectivity requirement
  - Greater cost resulting from higher-end networking equipment, more bandwidth, active/active storage solution.
  - No ability to test workload mobility – this matters – can’t test a vMotion…just try it and see what happens
  - Operational overhead from management of DRS host affinity groups.
  - Supports only a single vCenter server instance.
- What about a mixed architecture?
  - It can be done, but it has its own design considerations.
  - For any given workload, it’s an “either/or” situation.
  - Varrow has done this — not sure how many other partners actually have.
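
The vMSC requirements listed in Part 1 can be turned into a quick pre-check. This is only a minimal sketch of the numbers quoted in these notes (622 Mbps minimum, under 5 ms round trip, 10 ms with Metro vMotion); the names `InterSiteLink` and `vmsc_blockers` are hypothetical, and treating the 10 ms figure as a strict upper bound is an assumption, not something stated in the session.

```python
from dataclasses import dataclass

# Thresholds as quoted in the notes above.
VMSC_MIN_BANDWIDTH_MBPS = 622
VMSC_MAX_RTT_MS = 5.0
# Assumption: 10 ms is treated as a strict upper bound, like the 5 ms figure.
VMSC_MAX_RTT_MS_METRO_VMOTION = 10.0


@dataclass
class InterSiteLink:
    """Hypothetical description of the connection between two sites."""
    bandwidth_mbps: float
    rtt_ms: float  # round-trip time, not counting replication traffic
    stretched_layer2: bool
    sync_active_active_storage: bool


def vmsc_blockers(link, metro_vmotion=False):
    """Return the list of vMSC requirements from the notes that the link misses."""
    rtt_limit = VMSC_MAX_RTT_MS_METRO_VMOTION if metro_vmotion else VMSC_MAX_RTT_MS
    problems = []
    if link.bandwidth_mbps < VMSC_MIN_BANDWIDTH_MBPS:
        problems.append("bandwidth below 622 Mbps")
    if link.rtt_ms >= rtt_limit:
        problems.append(f"round-trip latency not under {rtt_limit:g} ms")
    if not link.stretched_layer2:
        problems.append("no stretched Layer 2 connectivity")
    if not link.sync_active_active_storage:
        problems.append("no synchronous active/active storage")
    return problems


if __name__ == "__main__":
    link = InterSiteLink(bandwidth_mbps=1000, rtt_ms=8.0,
                         stretched_layer2=True, sync_active_active_storage=True)
    print(vmsc_blockers(link))                      # latency blocks plain vMSC
    print(vmsc_blockers(link, metro_vmotion=True))  # passes with Metro vMotion
```

An empty list only means the link clears the minimums from the talk; anything that fails here is a candidate for SRM instead, where bandwidth and latency are driven purely by SLA/RPO/RTO.
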
Part 2: Building Stretched Clusters – First time ever presented.
- vSphere Recommendations
  - Use vSphere 5 – eliminates some HA limitations (eliminates primaries and secondaries), introduces the vMSC HCL category.
  - Use vSphere DRS host affinity groups – can mimic site awareness, use PowerCLI to address manageability concerns, use “should” rules rather than “must” rules.
  - Using PowerCLI with host affinity groups – add a unique property to “group” VMs, use this “grouping” to automate VM placement into groups, run the PowerCLI script regularly to ensure correct group assignment.
    - Have to turn on admission control to always keep half the hosts reserved.
- Storage Recommendations
  - Use storage from vMSC category – only VPLEX right now….what a bummer.
  - Be aware of storage performance considerations – know how Reads and Writes will be impacted.
  - Account for storage availability.
  - Plan Storage DRS carefully.
  - Use profile-driven storage – VASA.
  - Example – MetroCluster – vMotion to 2nd site and reads/writes go back across WAN until Storage vMotion
  - Example – VPLEX in non-uniform mode – reads are always serviced locally although writes may still go across WAN.
  - Example – VPLEX in Metro
  - Storage Availability
    - Know how things are impacted during any failure.
    - Consider cross-connect topology.
    - Ensure multiple storage controllers at each site for availability.
    - Provide redundant and independent inter-site storage connections.
    - With VPLEX, use the third-site cluster witness (needs to be in separate failure domain).
  - Storage DRS Cautions
    - Align datastore boundaries to site/array boundaries.
    - Don’t combine stretched/non-stretched datastores
    - Understand impact of SDRS on overall storage solution.
  - Use Profile-driven storage — this is VASA
    - Keep things profile-driven, can help avoid operational concerns with VM placement.
- Networking Recommendations
  - Plan for different traffic patterns – we’re talking trombones here.
    - Look at OTV, LISP if you haven’t already.
  - Where possible, separate management traffic onto a vSwitch.
  - Incorporate redundant and independent inter-site network connections.
  - Minimize latency as MUCH as possible.
- Operational Recommendations
  - Account for backup/restore in your design — many people overlook.
    - Where do tapes sit? Running Avamar?
    - If you don’t duplicate backup topologies, you really want to look at client-side dedup to reduce WAN traffic.
    - Mechanism to reduce restore traffic would be nice as well.
    - Might be able to use storage solution itself for restores – restore to local side, allow storage to replicate to remote side.
  - Handle inter-site vMotion carefully – it’s new coolness but introduces operational concerns
    - Will impact DRS host affinity rules.
    - Could require storage config updates
      - Reconcile DRS host affinity rules and VM locations
      - Reconcile storage availability and VM locations
      - Impact on other operational areas.
    - Do we need to notify other people in the org as VM’s move between data centers?
    - Look at monitoring, backups, IT staff, etc.
  - Don’t split multi-tier apps.
- From audience – UCS Express is great use case for VPLEX witness.
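
The "tag VMs with a site property, then re-run a script to keep DRS groups correct" idea from the vSphere recommendations can be sketched as a simple reconcile step. The session suggested PowerCLI for this; the Python below is only an illustration of the logic, with no vSphere calls, and every name in it (`desired_groups`, `group_changes`, the site tags) is hypothetical.

```python
def desired_groups(vm_sites):
    """Build the wanted DRS VM group membership from each VM's site tag.

    vm_sites maps VM name -> site tag (the "unique property" on the VM).
    """
    groups = {}
    for vm, site in vm_sites.items():
        groups.setdefault(site, []).append(vm)
    return {site: sorted(vms) for site, vms in groups.items()}


def group_changes(vm_sites, current_groups):
    """List (vm, current_group, wanted_group) for every VM in the wrong group.

    current_groups maps group/site name -> list of VM names as they are today.
    A current_group of None means the VM is not in any group yet.
    """
    current_site = {vm: site
                    for site, vms in current_groups.items()
                    for vm in vms}
    changes = []
    for vm in sorted(vm_sites):
        wanted = vm_sites[vm]
        have = current_site.get(vm)
        if have != wanted:
            changes.append((vm, have, wanted))
    return changes


if __name__ == "__main__":
    tags = {"web01": "siteA", "db01": "siteA", "web02": "siteB", "app03": "siteB"}
    today = {"siteA": ["web01"], "siteB": ["db01", "web02"]}
    print(desired_groups(tags))
    for vm, have, wanted in group_changes(tags, today):
        print(f"{vm}: {have} -> {wanted}")
```

Run on a schedule, the change list is what a real script would apply to the VM groups behind the “should” rules, and it is also a reasonable place to hook in the notifications mentioned under inter-site vMotion.
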
