CIM4280 VMware vCenter Operations in the Real World

(4) Speakers each talking for 15 minutes about vCenter Operations

Customers speaking from Stanford School of Medicine, Kaiser Permanente, and Maximus.

Initial Review of vCenter Operations Editions — Standard, Advanced, Enterprise.

Fletcher Cocquyt – Principal Engineer at Stanford University
- Environment
  - (310) VMs on (21) ESXi hosts.
  - 20 TB of NFS datastores replicated between campus and DR site.
  - Networking – 10 GigE upgrade for ESXi hosts is 75% complete.
  - Monitoring – 25,830 metrics checked against static thresholds using Cacti, Zabbix, and Big Brother.
- Problems with static thresholds
  - Requires lots of manual configuration
  - Can cause lots of false positives due to bad thresholds/templates
  - Tuning requires lots of specific domain knowledge and time.
  - Tried to write his own system to figure out what mattered.
- Dynamic thresholds
- Conclusion = he really likes vCenter Operations Standard
  - I kind of spaced out here a bit as he was reading his presentation.
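
To make the static-versus-dynamic threshold point above a bit more concrete, here is a minimal Python sketch. It is not how vCenter Operations computes its dynamic thresholds (the session notes don't cover the algorithm); a rolling mean and standard deviation is used purely as a stand-in, and the sample data, the `window` size, and the `k` multiplier are all made-up assumptions. The idea is just that a fixed limit keeps firing on a host that is normally busy, while a band learned from recent history only flags the unusual reading.

```python
from statistics import mean, stdev

# Hypothetical CPU % readings for a host that normally runs hot (~85%),
# with one abnormal drop at index 9. Illustrative data, not from the session.
samples = [84, 86, 85, 87, 85, 86, 85, 84, 86, 30, 85, 86]


def static_alerts(values, limit):
    """Indexes of readings above a fixed, hand-configured limit."""
    return [i for i, v in enumerate(values) if v > limit]


def dynamic_alerts(values, window=6, k=3.0):
    """Indexes of readings outside mean +/- k * stdev of the previous `window` readings.

    Assumption: a simple rolling band stands in for whatever analytics a real
    product uses. The 1.0 floor keeps the band from collapsing on flat data.
    """
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        band = k * max(stdev(history), 1.0)
        if abs(values[i] - mean(history)) > band:
            alerts.append(i)
    return alerts


print(static_alerts(samples, 80))  # fires on every "normal" busy reading
print(dynamic_alerts(samples))     # fires only on the abnormal reading
```

With this data the static 80% limit alerts on eleven of the twelve readings (everything except the drop), which is the false-positive problem described above, while the rolling band alerts only on index 9.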
Maximus Presenter
- (15) ESX servers running (300) VMs
- 25 TB of FC storage in a 3 tier environment
- Previous monitoring tools = Uptime software, vFoglight (liked the dashboard but didn’t have enough depth), vKernel, CapacityIQ
- What they needed….
  - Improved communication and reporting perf stats to internal/external clients.
  - Improved visibility into the virtual environment for internal engineers
  - Better problem solving tools.
- Loves the dashboard….at a glance health, capacity, and load of the environment.
- Example of how it helped solve an application development issue (pointed to sockets left open), also alerted on a storage issue faster than the storage tools.
- CapacityIQ – better utilization reporting in CapIQ
  - Better Root Cause Analysis for clients
- Where are they now?
  - Running for 250 VMs
  - Has become the main dashboard for monitoring the prod and dev virtual environments.
  - Cost Savings
    - No other licenses required for other previously used tools
    - Better resource management
    - Less time and expense required for troubleshooting client problems
    - Shorter turn-around time for new VM deployment
Ian Dodd, Kaiser Permanente
- Huge Environment
- 8 million members, 15,000 servers, 20 PB storage, virtualize first policy, large mainframe environments
- Many enterprise vendors – HP, IBM, VMware, Oracle, Quest, MSFT, In-House Developed
- Challenges — lots. Multiple tools, consoles, approaches to performance management, inconsistent approach to threshold management, high ticket volume due to threshold breaches, proactive approach is very challenging, automation not fully exploited, workload is ticket driven.
- 5 stage approach
  - Acquisition — apply best practices for metric gathering, changed to defining what metrics mattered to gather.
  - Availability — create single pane of glass for availability monitoring on automated alerting
  - Performance — established a specific 50 person Critical Application Support Team
  - Automation —
  - Visualization —
- This is mind-blowing stuff….mammoth environment and how to guarantee performance in it.
- Integrations into Tivoli environment, NetCore, BMC Remedy, many more.
- 3 Year Program….so far….
  - Performance and Availability Consoles created
  - Some vCOPs in production
  - LOB owner said “this is most important tool in my tool belt”.
  - Alerts also contain likely resolution.
  - Alerts are routed to Level 3 people directly.
