If you work your way through the vCSHB Reference Guide you’ll have covered every objective in the VCAP-DCA blueprint, so that’s where I’d recommend you start. If you have time, view the VMworld sessions for a bit of background and reinforcement. I went into a bit more detail on this objective as it’s something I wanted to evaluate for my company, so there are some ‘real world’ issues covered which I doubt you’ll need for the exam.
- Identify the five protection levels for vCenter Server Heartbeat
- Identify the three server protection options for vCenter Server Heartbeat
- Identify supported cloning options
Skills and Abilities
- Install and configure vCenter Server Heartbeat
- Determine use cases for and execute a manual switchover
- Recover from a failover
- Monitor vCenter Server Heartbeat and communication status
- Configure heartbeat settings
- Configure shutdown options
- Configure application protection
- Add/Edit Services
- Add/Edit Tasks
- Edit/Test Rules
- Install/Edit Plug‐ins
- Add/Remove Inclusion/Exclusion Filters
- Perform Full System and Full Registry checks
- Configure/Test Alerts
- Troubleshoot common vCenter Server Heartbeat error conditions
Tools & learning resources
It was a typical Friday. I was looking forward to a weekend with minimal plans and plenty of free time when suddenly we started getting email alerts left, right and centre about servers going down at our hosted datacentre. First one server, then eight, then fans, power supplies and environmental alerts went ballistic. There goes the weekend I thought…
It turned out that heavy rains had caused a leak in the roof at our datacentre (bad hosting company, go stand in the corner), resulting in water falling onto one of our production (isn’t it always?) HP bladecentres. Electronics and water obviously don’t mix well but the HP hardware managed surprisingly well. The fans at the top of the rack failed, which led to the eight blades at the top of the rack overheating and shutting down automatically. That probably saved the data and the blade hardware.
So where does Oracle licensing fit into this? Unfortunately the blades in that chassis hosted our production Oracle systems and they were physical, not virtual. This was largely due to Oracle’s infamous support stance on VMware as we run most other systems virtually. So because of Oracle’s desire for stack dominance I lost another night of my life to IT support.
Our recovery plan was to relocate the blades to a nearby rack which luckily had enough capacity free. Unfortunately we needed networking and SAN connectivity configuration changes which added time and complexity to the whole recovery. Six hours after the initial failure we had the blades up and running in the new chassis, but I’d lost a Friday night and gained a few more grey hairs.
How simple could this have been? In contrast we already had a VMware ESX cluster spanning the affected chassis and the recovery chassis. Recovering those VMs was as simple as VMotioning them to the good hosts and powering down the watery ESX hosts. About ten minutes would have done it. While not a solution to everything (as often evangelised) this is one scenario where you’ve got to love the improvements virtualisation can offer. Simples!
- Shared storage
- Common networks
- Ideally similar (or identical) hardware for each host
A good way to check that all hosts have access to the same networks and datastores is to use the ‘Maps’ feature. Select your cluster then deselect every option except ‘Host to Network’ or ‘Host to Datastore’.
Maps help determine cluster validity
As you can see in this diagram the ’15 VLAN’ portgroup is not presented to every host (it’s slightly removed from the circle) and at least one VM in the cluster has a network assigned (in the top right) which isn’t available in this cluster at all.
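The check that the Maps view does visually can also be sketched in a few lines of Python. This is only an illustration: the host names, portgroup names and the `find_missing` helper are all made up, and in a real environment the inventory would be pulled from vCenter rather than typed in by hand.

```python
def find_missing(inventory):
    """Return {host: sorted list of items other hosts have but this one lacks}."""
    # Union of everything presented to any host in the cluster
    everything = set().union(*inventory.values())
    missing = {}
    for host, items in inventory.items():
        gap = everything - set(items)
        if gap:
            missing[host] = sorted(gap)
    return missing


# Hypothetical cluster: portgroups visible to each host
host_networks = {
    "esx01": {"VM Network", "15 VLAN", "20 VLAN"},
    "esx02": {"VM Network", "15 VLAN", "20 VLAN"},
    "esx03": {"VM Network", "20 VLAN"},  # '15 VLAN' is not presented here
}

for host, gap in find_missing(host_networks).items():
    print(f"{host} is missing: {', '.join(gap)}")
```

The same helper works for datastores; just pass in a dictionary of host to datastore names instead.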
Clusters consist of up to 32 hosts. The first five hosts in a cluster will be primaries, the rest secondaries. You can’t set a host to primary or secondary using the VI client, but you can using the AAM CLI (not supported, see how in this Yellow Bricks article). One of the primaries will be the ‘active primary’ which collates resource information and places VMs after a failover event.
Heartbeat options and dependencies
Heartbeats are used to determine whether a host is still operational
Heartbeats use the service console networks by default, or the management network for ESXi hosts.
They’re sent every second by default. This can be amended using das.failuredetectioninterval.
Primaries send heartbeats to both other primaries and secondaries, secondaries only send to primaries.
After no heartbeats have been received for 13 seconds the host will ping its isolation address.
HA operates even when vCenter is down (the AAM agent talks directly from host to host), although vCenter is required when first enabling HA on a cluster.
Diagnosing issues with heartbeats – see VMware KB1010991
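To make the primary/secondary heartbeat rules above a bit more concrete, here’s a toy Python model. It is not how the AAM agent is actually implemented, just a sketch of the rules as described in these notes: the first five hosts become primaries, primaries send heartbeats to everyone else, and secondaries only send to primaries. The host names and function names are invented for the example.

```python
MAX_PRIMARIES = 5  # per the notes above: the first five hosts become primaries


def assign_roles(hosts):
    """Split hosts (in the order they joined) into primaries and secondaries."""
    return hosts[:MAX_PRIMARIES], hosts[MAX_PRIMARIES:]


def heartbeat_targets(host, primaries, secondaries):
    """Hosts that `host` sends heartbeats to under the rules above."""
    if host in primaries:
        # Primaries heartbeat to the other primaries and to all secondaries
        return [h for h in primaries + secondaries if h != host]
    # Secondaries heartbeat to primaries only
    return list(primaries)


hosts = [f"esx{n:02d}" for n in range(1, 8)]  # hypothetical 7-host cluster
primaries, secondaries = assign_roles(hosts)
print("Primaries:  ", primaries)
print("Secondaries:", secondaries)
print("esx07 sends to:", heartbeat_targets("esx07", primaries, secondaries))
```

Note that the model ignores what happens when a primary fails or is removed from the cluster; it only shows who talks to whom in a steady state.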