EMC Bootcamp Notes – Stretched Cluster Discussion by Scott Lowe – VMware Partner Exchange

Summary = high-energy walkthrough by Scott Lowe around the promise of stretched clusters, the high level details (part 1 below), and the low level details (part 2). Fantastic run-through of a fascinating subject (fascinating to me at least as I’m actively having SRM + VPLEX discussions with customers).

Scott Lowe – Stretched Cluster Discussion

  • Part 1: Stretched Cluster or SRM?
    • vMSC = vSphere Metro Stretched Cluster
    • Introduces some new terms:
      • Uniform access = “stretched SAN” – basically one storage array, with hosts on the other side reaching back across to it.
      • Non-uniform access = “distributed virtual storage” – basically VPLEX; no one else is doing this.
        • EMC worked with VMware to create this category…it’s the reason VPLEX is the first and still only on this list.
    • Provides boundaries
    • RTO vs. RPO — critical to determining solution.
      • RPO of near zero = need some kind of synchronous solution
      • RPO of minutes to hours = async stuff.
    • DR versus DA
      • DA = Disaster Avoidance
        • Seeks to protect apps before a disaster occurs.
        • How often do you know before a disaster is going to occur?
        • Similar to vMotion – have to have both ESX hosts (aka both sites) up for a DA solution.
      • DR = Disaster Recovery
        • Seeks to recover applications and data after a disaster occurs
      • Think of DA as vMotion and DR as vSphere HA
    • SRM Details
      • Some form of storage replication
      • Layer 3 Connectivity
      • No minimum bandwidth requirements – purely driven by SLA/RPO/RTO
      • No max latency between sites – purely driven by SLA/RPO/RTO
      • At least (2) vCenter Server instances
    • Requirements for vMSC
      • Some form of supported sync active/active storage architecture.
        • Must be read/write on both ends – traditional replication is read/write at the source and read-only at the destination.
      • Stretched Layer 2 Connectivity – a VM keeps the same IP address at the destination as at the source after a vMotion.
      • 622 Mbps bandwidth (minimum) between sites
      • Less than 5 ms latency between sites (10 ms with vSphere 5 Enterprise Plus/Metro vMotion)
        • This is roundtrip time without factoring in replication traffic.
      • A single vCenter Server instance (prob want to protect vCenter though with vCenter Heartbeat)
        • This is b/c we can’t vMotion between vCenter instances.
    • Advantages for SRM
      • Defined startup orders (with prerequisites) – db server first, then web server, then app server, etc. etc.
      • No need for stretched Layer 2 connectivity (but supported)
      • The ability to simulate workload mobility without affecting production
      • Supports multiple vCenter Server instances (including in Linked Mode)
    • Advantages of vMSC
      • Possibility of non-disruptive workload migration (disaster avoidance)
        • Lots of gating factors though.
      • No need to deal with issues around IP address changes.
      • Potential for running active/active data centers and more easily balancing workloads between them
      • Typically a near-zero RPO with RTO of minutes
        • Lots and lots of caveats
      • Requires only a single vCenter server instance.
    • Disadvantages of SRM
      • Typically higher RPO/RTO than vMSC
      • Workload mobility is always disruptive (requires reboot)
      • Requires at least (2) vCenter Server instances
      • Operational overhead from managing protection groups and recovery plans.
        • Have to place VMs on datastores, etc. based on dependencies, etc.
    • Disadvantages of vMSC
      • Greater physical networking complexity due to stretched Layer 2 connectivity requirement
      • Greater cost resulting from higher-end networking equipment, more bandwidth, active/active storage solution.
      • No ability to test workload mobility – this matters – can’t test a vMotion…just try it and see what happens
      • Operational overhead from management of DRS host affinity groups.
      • Supports only a single vCenter server instance.
    • What about a mixed architecture?
      • It can be done, but it has its own design considerations.
      • For any given workload, it’s an “either/or” situation.
      • Varrow has done this — not sure how many other partners actually have.
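The vMSC requirements from Part 1 lend themselves to a quick pre-check. Below is a minimal sketch in Python, assuming the thresholds exactly as captured in these notes (622 Mbps minimum, under 5 ms round trip, or under 10 ms with Metro vMotion). The names `SiteLink` and `vmsc_blockers` are hypothetical and made up for illustration; this is not a VMware or EMC tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteLink:
    """Hypothetical description of two sites being considered for vMSC."""
    bandwidth_mbps: float             # inter-site bandwidth
    rtt_ms: float                     # round-trip latency, excluding replication traffic
    stretched_l2: bool                # same Layer 2 network at both sites
    sync_active_active_storage: bool  # read/write at both ends
    single_vcenter: bool              # one vCenter Server instance manages both sites
    metro_vmotion: bool = False       # vSphere 5 Enterprise Plus / Metro vMotion


def vmsc_blockers(link: SiteLink) -> List[str]:
    """Return the vMSC requirements (per the notes) that this link fails."""
    max_rtt_ms = 10.0 if link.metro_vmotion else 5.0
    blockers = []
    if not link.sync_active_active_storage:
        blockers.append("storage is not synchronous active/active")
    if not link.stretched_l2:
        blockers.append("no stretched Layer 2 connectivity")
    if link.bandwidth_mbps < 622:
        blockers.append("less than 622 Mbps between sites")
    if link.rtt_ms >= max_rtt_ms:
        blockers.append(f"round-trip latency is not under {max_rtt_ms:g} ms")
    if not link.single_vcenter:
        blockers.append("sites are not under a single vCenter Server instance")
    return blockers


if __name__ == "__main__":
    candidate = SiteLink(bandwidth_mbps=500, rtt_ms=7.0, stretched_l2=False,
                         sync_active_active_storage=True, single_vcenter=True)
    for reason in vmsc_blockers(candidate):
        print("vMSC blocker:", reason)
```

An empty list only means the basic gating factors are met; SRM may still be the better fit when defined startup orders or non-disruptive testing matter more than a near-zero RPO.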
  • Part 2: Building Stretched Clusters – First time ever presented.
    • vSphere Recommendations
      • Use vSphere 5 – eliminates some HA limitations (eliminates primaries and secondaries), introduces the vMSC HCL category.
      • Use vSphere DRS host affinity groups – can mimic site awareness, use PowerCLI to address manageability concerns, use “should” rules rather than “must” rules.
      • Using PowerCLI with host affinity groups – add a unique property to “group” VMs, use this “grouping” to automate VM placement into groups, run the PowerCLI script regularly to ensure correct group assignment.
        • Have to turn on admission control to always keep half the hosts reserved.
    • Storage Recommendations
      • Use storage from vMSC category – only VPLEX right now….what a bummer.
      • Be aware of storage performance considerations – know how Reads and Writes will be impacted.
      • Account for storage availability.
      • Plan Storage DRS carefully.
      • Use profile-driven storage – VASA.
      • Example – MetroCluster – vMotion to 2nd site and reads/writes go back across WAN until Storage vMotion
      • Example – VPLEX in non-uniform mode – reads are always serviced locally although writes may still go across WAN.
      • Example – VPLEX in Metro
      • Storage Availability
        • Know how things are impacted during any failure.
        • Consider cross-connect topology.
        • Ensure multiple storage controllers at each site for availability.
        • Provide redundant and independent inter-site storage connections.
        • With VPLEX, use the third-site cluster witness (needs to be in separate failure domain).
      • Storage DRS Cautions
        • Align datastore boundaries to site/array boundaries.
        • Don’t combine stretched/non-stretched datastores
        • Understand impact of SDRS on overall storage solution.
      • Use Profile-driven storage — this is VASA
        • Keep things profile-driven, can help avoid operational concerns with VM placement.
    • Networking Recommendations
      • Plan for different traffic patterns – we’re talking trombones here.
        • Look at OTV, LISP if you haven’t already.
      • Where possible, separate management traffic onto its own vSwitch.
      • Incorporate redundant and independent inter-site network connections.
      • Minimize latency as MUCH as possible.
    • Operational Recommendations
      • Account for backup/restore in your design — many people overlook.
        • Where do tapes sit? Running Avamar?
        • If you don’t duplicate backup topologies, you really want to look at client-side dedup to reduce WAN traffic.
        • Mechanism to reduce restore traffic would be nice as well.
        • Might be able to use storage solution itself for restores – restore to local side, allow storage to replicate to remote side.
      • Handle inter-site vMotion carefully – it’s new coolness but introduces operational concerns
        • Will impact DRS host affinity rules.
        • Could require storage config updates
          • Reconcile DRS host affinity rules and VM locations
          • Reconcile storage availability and VM locations
          • Impact on other operational areas.
        • Do we need to notify other people in the org as VMs move between data centers?
        • Look at monitoring, backups, IT staff, etc.
      • Don’t split multi-tier apps.
    • From audience – UCS Express is great use case for VPLEX witness.
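The DRS host affinity advice in Part 2 (tag each VM with a site property, derive group membership from it, and regularly reconcile rules against where VMs actually run) can be sketched as plain logic. The session recommended PowerCLI for this; the Python below is only a model of the bookkeeping and does not talk to vCenter. The `site_tag` property, the host-to-site map, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VM:
    name: str
    site_tag: str      # hypothetical custom property used to "group" VMs
    current_host: str  # host the VM is running on right now


def plan_groups(vms: List[VM]) -> Dict[str, List[str]]:
    """Map each site tag to the sorted VM names that belong in that site's VM group."""
    groups: Dict[str, List[str]] = {}
    for vm in vms:
        groups.setdefault(vm.site_tag, []).append(vm.name)
    return {site: sorted(names) for site, names in groups.items()}


def misplaced(vms: List[VM], host_site: Dict[str, str]) -> List[str]:
    """List VMs running on a host outside their tagged site.

    With "should" rules this can legitimately happen, for example after
    an HA restart or an inter-site vMotion, so it needs reconciling.
    """
    return sorted(vm.name for vm in vms
                  if host_site.get(vm.current_host) != vm.site_tag)


if __name__ == "__main__":
    host_site = {"esx-a1": "site-a", "esx-b1": "site-b"}
    vms = [
        VM("web01", "site-a", "esx-a1"),
        VM("db01", "site-a", "esx-b1"),   # moved across sites
        VM("app01", "site-b", "esx-b1"),
    ]
    print(plan_groups(vms))
    print("needs reconciling:", misplaced(vms, host_site))
```

Run on a schedule, the output of `plan_groups` would feed whatever tool updates the DRS groups, and `misplaced` gives the list to review after any inter-site vMotion.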
