I’d like to warn you about a not-so-obviously documented quirk in the combination of an HP LeftHand (P4000) Multi-Site Cluster and VMware vSphere hosts on multiple sites.

The situation

HP LeftHand (P4000) Multi-Site Cluster

HP LeftHand Multi-Site Cluster requires the storage nodes to be divided into two subnets, one per site. Each subnet (or VLAN) requires a Virtual IP in that subnet.

Example

For instance: VLAN 1 with Virtual IP 10.10.1.100 (/24) for site 1, which holds storage node 1 and storage node 3, and VLAN 2 with Virtual IP 10.10.2.100 (/24) for site 2, which holds storage node 2 and storage node 4.
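
To make the layout concrete, here is a small Python sketch that models the example above and checks that each Virtual IP sits inside its own site's subnet and that the subnets don't overlap. The dictionary keys and the check_layout helper are my own illustration, not anything from the LeftHand tooling.

```python
import ipaddress

# Illustrative model of the example layout; key names are placeholders.
SITES = {
    "site 1": {"vlan": 1, "subnet": "10.10.1.0/24", "virtual_ip": "10.10.1.100",
               "storage_nodes": ["storage node 1", "storage node 3"]},
    "site 2": {"vlan": 2, "subnet": "10.10.2.0/24", "virtual_ip": "10.10.2.100",
               "storage_nodes": ["storage node 2", "storage node 4"]},
}

def check_layout(sites):
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    networks = {}
    for name, site in sites.items():
        net = ipaddress.ip_network(site["subnet"])
        # Each site's Virtual IP must live in that site's own subnet.
        if ipaddress.ip_address(site["virtual_ip"]) not in net:
            problems.append(f"{name}: Virtual IP is outside {net}")
        networks[name] = net
    # The per-site subnets must be distinct, non-overlapping ranges.
    names = list(networks)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if networks[a].overlaps(networks[b]):
                problems.append(f"{a} and {b}: subnets overlap")
    return problems

print(check_layout(SITES))
```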

LUN Ownership

Whenever a LUN (or volume, in LeftHand’s CMC) on the LeftHand cluster is accessed by an iSCSI Initiator, that volume is bound to the Virtual IP (and thus the subnet / VLAN) through which the LUN is accessed.

Example

For instance: when an ESX-host on Site 1 rescans its vmhba38, the volumes presented to that ESX-server get bound to this site’s Virtual IP, which is in VLAN 1.

VMware ESX software iSCSI Initiator and multiple subnets

The VMware software iSCSI Initiator (and, by extension, the dependent hardware iSCSI Initiator) does not support accessing an iSCSI Target (or LeftHand Virtual IP) outside its own subnet.

Example

For instance: the ESX-hosts on Site 2, which have their iSCSI adapter in VLAN 2, cannot traverse the network to access a LUN that is bound to VLAN 1. They do a discovery against the Virtual IP in VLAN 2, but get redirected to the Virtual IP in VLAN 1, because that’s where the LUNs are bound. You’ll see errors in /var/log/messages about iscsid not being able to connect to the Virtual IP in VLAN 1.
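
The failure boils down to a simple subnet membership test. The following Python sketch shows it with the addresses from the example; the VMkernel address 10.10.2.21 and the function name are made up for illustration, and the check only mimics the "same subnet only" restriction described above, it is not how the initiator itself is implemented.

```python
import ipaddress

def reachable_without_routing(vmk_interface, target_ip):
    """True if target_ip lies in the subnet of the given VMkernel interface."""
    iface = ipaddress.ip_interface(vmk_interface)
    return ipaddress.ip_address(target_ip) in iface.network

# Hypothetical VMkernel address of an ESX-host on site 2 (VLAN 2).
vmk_site2 = "10.10.2.21/24"

discovery_vip = "10.10.2.100"  # Virtual IP the host sends its discovery to
bound_vip = "10.10.1.100"      # Virtual IP the volume is bound to (VLAN 1)

print(reachable_without_routing(vmk_site2, discovery_vip))  # True
print(reachable_without_routing(vmk_site2, bound_vip))      # False
```

The discovery succeeds because it stays inside VLAN 2; the redirect to the Virtual IP in VLAN 1 is what the host cannot follow.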

The solution

Multiple VMkernel ports for iSCSI in multiple VLANs

The only way to maintain synchronous replication (‘Network RAID 10’ in LeftHand naming convention) is to maintain the HP LeftHand Multi-Site Cluster. This means maintaining the multiple VLANs for iSCSI. To enable hosts on site 2 to access the LUNs, you’ll need to configure additional VMkernel ports for iSCSI in the other VLAN. So in addition to the two VMkernel ports in VLAN 2 on an ESX-host in site 2, you’ll add two VMkernel ports in VLAN 1 on that ESX-host. On the ESX-hosts in site 1, you’ll add two VMkernel ports in VLAN 2.
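
As a rough planning aid, this Python sketch lists the VMkernel ports a host ends up with under this design: two per VLAN, with the ones outside the host's own VLAN marked as new. The port group naming scheme is purely hypothetical; use whatever convention you already have.

```python
def vmkernel_port_plan(local_vlan, all_vlans=(1, 2), ports_per_vlan=2):
    """List the iSCSI VMkernel ports a host needs, flagging the ones it already has."""
    plan = []
    for vlan in sorted(all_vlans):
        for n in range(1, ports_per_vlan + 1):
            plan.append({
                "portgroup": f"iSCSI-VLAN{vlan}-{n}",  # hypothetical naming scheme
                "vlan": vlan,
                # Ports in the host's own VLAN are assumed to exist already.
                "existing": vlan == local_vlan,
            })
    return plan

# An ESX-host on site 2 has its original iSCSI ports in VLAN 2.
for port in vmkernel_port_plan(local_vlan=2):
    status = "already present" if port["existing"] else "to be added"
    print(port["portgroup"], status)
```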

The impact of this change is additional iSCSI traffic over the intersite link: besides the replication traffic between sites, iSCSI traffic from the ESX-hosts in site 2 now also crosses the intersite link to reach the storage nodes in site 1.

Another change in this environment is the added complexity to complete a site failover. You’ll need to rescan all software iSCSI adapters after the primary site has failed, because the LUNs are still attached to the (now failed) Virtual IP of the primary VLAN. By rescanning, the LUNs are attached to the Virtual IP of the other VLAN. If site 2 fails, only the replication is lost, as site 2 doesn’t do any iSCSI traffic with any of the ESX-hosts but merely does Network RAID 10 replication.
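
To have the rescan ready when you need it, you could prepare the commands up front. This Python sketch only builds the command strings; it does not connect to anything. It assumes the classic ESX service console, where esxcfg-rescan takes the adapter name, and it assumes every host uses vmhba38 as in the example above. The host names are made up; check the adapter name on each of your hosts.

```python
def rescan_commands(hosts, adapter="vmhba38"):
    """Map each host name to the rescan command to run on that host."""
    # Assumption: the software iSCSI adapter has the same name on every host.
    return {host: f"esxcfg-rescan {adapter}" for host in hosts}

# Hypothetical names of the ESX-hosts on the surviving site.
surviving_hosts = ["esx-site2-01", "esx-site2-02"]

for host, command in rescan_commands(surviving_hosts).items():
    print(f"{host}: {command}")
```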

If you dare create a single vSphere Cluster, spanning HA/DRS across sites, VMware HA can take care of VM failover. If the hosts on site 1 fail but the storage is still alive, a rescan on the hosts on site 2 is needed before HA can restart the VMs. You’d better be quick with that rescan! If the hosts on site 2 fail, no rescan or other action is needed, as HA will restart the VMs on the ESX-hosts in site 1 (because the ESX-hosts in site 1 have an active session with the LUNs).