I have been testing Infinio in my lab extensively. Read on for my notes on installation, availability and complexity of the Infinio Accelerator v1.1; there’s a lot to uncover!

What is Infinio?

Let’s start by looking at what the Infinio solution is, exactly; Infinio’s marketing intro video covers that part.

Good, now that you’re done watching the marketing intro, let’s move on to what my co-bloggers wrote:

Each appliance creates a bridge network, builds a vSwitch, and leverages proxy ARP. The storage array ends up thinking that it is talking to the ESXi host, and the host ends up thinking it is talking to the storage array.

Today, the solution is focused on write-through performance acceleration, which ultimately offloads reads from the array; this both boosts read performance and reduces read latency.

The deployment approach is also really interesting: only one management IP is needed for the entire solution. Each VA (one per host) uses an auto-generated IP (in the APIPA range) on a dedicated VLAN; this network is basically built on top of the vMotion network, so there is an isolated, suitable network connecting all the hosts. To configure the VA as a real “NFS gateway” without interruption, the VMkernel NFS interface is moved to a new VLAN (usually the last one available, so 4094), where the VA receives the requests and then passes them on to the real NFS target.
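
A quick aside on that APIPA remark: Infinio doesn’t document how the per-VA link-local addresses are generated, so the sketch below is purely my illustration of deterministic addressing in the 169.254.0.0/16 range (RFC 3927 reserves the first and last /24). Hashing a host identifier is my assumption, not Infinio’s method.

    import hashlib
    import ipaddress

    def apipa_address(host_id):
        """Derive a stable link-local (APIPA) address from a host identifier.
        Stays inside 169.254.1.0 - 169.254.254.255 as RFC 3927 requires."""
        digest = hashlib.sha256(host_id.encode()).digest()
        third = 1 + digest[0] % 254          # 1..254, avoids the reserved .0.x and .255.x blocks
        fourth = digest[1]                   # 0..255
        return ipaddress.IPv4Address(f"169.254.{third}.{fourth}")

    if __name__ == "__main__":
        for host in ("esx01", "esx02", "esx03"):   # hypothetical host names
            print(host, apipa_address(host))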

I only tested Infinio in my home lab and only used a single host for the test. I accelerated an NFS datastore on an old Iomega IX2 NAS device. The IX2 is a two-drive unit with SATA drives, so its performance is pretty crappy. The results were pretty impressive: Infinio helped read performance a great deal, which also allowed write performance to improve marginally.

My good friend Arjan has posted an excellent product installation walk-through in two parts.

My lab setup

The lab I used for setting up and testing Infinio is completely virtualized using VMware Workstation. Three linked clones run ESXi 5.1, each with 12GB of RAM and a 40GB disk added for local storage. Each has two VMnics: one bridged adapter to my physical network (10.10.10.0/24) for vSwitch0, and one host-only adapter to a virtual network (10.10.20.0/24) for vSwitch1. This is reflected in the vSwitch setup: vSwitch0 is kept default and is used for Management and VM traffic; vSwitch1 is used for vMotion and NFS traffic. vMotion is configured on a separate VMkernel adapter on vSwitch1 in a separate, non-routable subnet (10.10.30.0/24). None of the three networks use a VLAN ID.
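
For easy reference, the topology boils down to this (a purely descriptive sketch; the host names are mine):

    # Plain data description of the lab; nothing consumes this, it just summarizes the setup above.
    LAB = {
        "hosts": ["esx01", "esx02", "esx03"],          # nested ESXi 5.1, 12GB RAM, 40GB local disk each
        "vSwitch0": {"uplink": "bridged",   "traffic": ["Management", "VM"],
                     "subnet": "10.10.10.0/24", "vlan": None},
        "vSwitch1": {"uplink": "host-only", "traffic": ["NFS", "vMotion"],
                     "subnets": {"NFS": "10.10.20.0/24", "vMotion": "10.10.30.0/24"},
                     "vlan": None},
        "nfs_server": {"vm": "FreeNAS", "datastores": ["ds1", "ds2"]},
    }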

A FreeNAS VM (connected to the host-only network) provides NFS storage (ds1 and ds2 datastores).
Finally, I have a virtualized vCenter Server for central management. I’ve set up an HA- and DRS-enabled cluster and remediated all warnings and errors the cluster threw at me (non-redundant management network, insufficient heartbeat datastores, system log persistency, etc.). During testing, I changed various parameters (enabled/disabled vMotion, changed VLAN IDs, removed hosts from the cluster, etc.).

Infinio installation requirements

With this very simple lab setup, the Infinio installation requirements for VMkernel networking are met (other requirements via the link):

  • Must be a standard switch. Distributed virtual switches (vDS) are not supported.
  • Must be dedicated to storage traffic. It must not be used by vCenter for management connectivity to the ESXi host (the Management option is not checked in the VMkernel port configuration).
  • Must not have port binding enabled.
  • Must be assigned a static IP address.
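
To make these checks concrete, here is a minimal pre-flight sketch in plain Python. The dict layout is my own invention for illustration; it is not the vSphere API or Infinio’s installer code.

    def check_storage_vmk(vmk):
        """Return a list of violations of the VMkernel networking requirements above.
        `vmk` is an illustrative dict, e.g.
        {"switch_type": "standard", "services": ["nfs"], "port_binding": False, "ip_static": True}."""
        problems = []
        if vmk["switch_type"] != "standard":
            problems.append("must live on a standard vSwitch, not a vDS")
        if "management" in vmk["services"]:
            problems.append("must not carry Management traffic")
        if vmk["port_binding"]:
            problems.append("must not have port binding enabled")
        if not vmk["ip_static"]:
            problems.append("must use a static IP address")
        return problems

    print(check_storage_vmk({"switch_type": "standard", "services": ["nfs", "management"],
                             "port_binding": False, "ip_static": True}))
    # -> ['must not carry Management traffic']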

Even more requirements?

— Update Jan 1st, 2014
The portal I mention below isn’t partner-only, but is accessible for all customers. My access was provisioned in a different way, which is why I had mistaken the portal for partner-only access. Thanks to @VMCarrie and @MJBrender for pointing this out!
— Update

These requirements are not all, though. A private (partner-only) document paints a different picture and gives a bit more detail on the full list of requirements. I’d like to see Infinio publish these documents publicly. If you do have access, be sure to read up on the partner portal before installing Infinio, to get up to speed on the more detailed information on requirements, constraints and the impact on your environment.

  1. Infinio Accelerator Requirements
  2. Infinio Accelerator Impact
  3. Installation and Deployment Troubleshooting
  4. Release Notes for all versions, including the v1.1 release
  5. Manually Uninstalling Accelerator VMs; contains good info on what happens under the covers

I’ve included some interesting quotes to clarify:

vMotion network requirements

For multi-host ESXi/ESX clusters, Infinio evaluates your network configuration, searching for a VMkernel port that is enabled for vMotion. When it finds one, Infinio Accelerator creates a VM port group that mirrors the VLAN ID from the selected VMkernel port. This change allows Infinio to communicate with peers using the same VLAN ID as vMotion. Peers communicate using link local IP addressing. The following graphic demonstrates typical peer-to-peer communications.
[Figure: typical peer-to-peer communication between Infinio Accelerator VMs over the mirrored vMotion VLAN]

Each host in a cluster is required to have vMotion enabled on a (separate) VMkernel adapter. In my testing, a VLAN ID was not required, even though the documentation clearly states one should be there; I didn’t use a VLAN ID at all for vMotion. For the fun of it, I disabled vMotion completely on all hosts in the cluster and got an error message during the pre-flight check: “Unfortunately, the hosts within this cluster have a network topology that is incompatible with Infinio. Each host is required to share a VLAN ID across vMotion Port Groups.” This confirms that all hosts in a cluster are required to have vMotion enabled!

In addition, it is good to note that although Infinio actually requires vCenter, hosts do not need to be in an HA/DRS cluster. For these standalone hosts, the vMotion requirement is dropped and the Port Group for Peer-to-Peer communication is mirrored from the Management Traffic VMkernel port.
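
Pieced together from the documentation quotes and the pre-flight behaviour, the selection logic appears to be roughly the following. This is my reconstruction over illustrative dicts, not Infinio’s actual code:

    def peer_portgroup_vlan(host, clustered=True):
        """Pick the VLAN ID Infinio would mirror for its peer-to-peer VM port group.
        `host["vmkernel_ports"]` is a list of dicts such as
        {"name": "vmk1", "services": ["vmotion"], "vlan": 0}."""
        wanted = "vmotion" if clustered else "management"   # standalone hosts fall back to Management
        for port in host["vmkernel_ports"]:
            if wanted in port["services"]:
                return port["vlan"]
        if clustered:
            raise RuntimeError("no vMotion-enabled VMkernel port: topology incompatible with Infinio")
        raise RuntimeError("no Management-enabled VMkernel port found")

    esx01 = {"vmkernel_ports": [{"name": "vmk0", "services": ["management"], "vlan": 0},
                                {"name": "vmk1", "services": ["vmotion"],    "vlan": 0}]}
    print(peer_portgroup_vlan(esx01))          # -> 0 (untagged, as in my lab)

In my lab this resolves to VLAN 0 (untagged), which matches what I saw: the VLAN ID is mirrored, but an actual tag is not strictly required.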

Each host that is accelerated is required to share a VLAN ID across vMotion Port Groups within that cluster

Using the vMotion network for peer-to-peer communication follows VMware best practices for traffic separation and bandwidth requirements, and it is Infinio’s best bet for a back-end network for Infinio-internal communication. As a support engineer put it:

As to vMotion and NFS being separate… besides being a VMware best practice, it really is a practical move.  If you vMotion a hundred vm’s on your storage network, your performance will be greatly impacted, maybe even signaling a storage offline event.  In a lab with no load, you don’t see these issues.  We communicate over the vMotion network because it is latent bandwidth and almost every production environment has this set up to take advantage of using virtual technology.  During a vMotion, Infinio doesn’t use this network, but otherwise, we use it to interleave all the cache nodes so they are de-duplicated against each other.

VLAN 4095

The storage network needs to be isolated on a separate VMkernel port (or multiple ports) and use a separate IP subnet. A VLAN ID for the storage network is not required, but if you do use one, VLAN ID 4095 cannot be used, since the installation flow requires that VLAN ID. Why this VLAN ID cannot be used on the original VMkernel port, I’m not sure; according to the support engineer, there shouldn’t be any reason why not. I’ve tested this against v1.0.1, v1.0.2 and v1.1 and found that the installation consistently fails with ERROR: FAILURE_ADDING_CACHE_VMK (Invalid vlan id 4095).

For each vSwitch with an accelerated VMkernel interface, Accelerator adds a pair of port groups – a public and a private port group. The public port receives the trunk VLAN ID (4095). The private port group receives a unique, unused VLAN ID and cannot send or receive network traffic from outside the host or over the physical network.

I contacted a support manager, and he clarified:

Ports 4095 and 4094 are the default ports that we try to grab for use in setting up the intercepting network (one for the private side and one for the public). If however that is in use on your system, then we should be working backwards to find a port that is not in use.
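
The “work backwards” behaviour described in that answer can be pictured as something like this. Again, this is my reconstruction for illustration, not Infinio’s code; note that it also suggests why putting 4095 on your own VMkernel port collides with the installer:

    def pick_infinio_vlans(vlans_in_use):
        """Pick the VLAN IDs for the intercepting port groups:
        the public side gets 4095 (trunk all VLANs on a standard vSwitch),
        the private side gets the highest unused ID, starting at 4094 and walking down."""
        public = 4095                                  # if 4095 is already taken, the install fails in practice
        for candidate in range(4094, 0, -1):
            if candidate not in vlans_in_use:
                return public, candidate
        raise RuntimeError("no free VLAN ID left for the private port group")

    print(pick_infinio_vlans({4094, 100, 200}))        # -> (4095, 4093)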

Separation of VMkernel Traffic

The Infinio installer can get a bit confused if you combine multiple traffic types on a single VMkernel port. Separation of Management traffic is actively checked during installation (and is required explicitly; see the quote below), but I found that combining vMotion and NFS traffic on the same VMkernel port is allowed by the pre-flight check. Even though the check passed and the installation and configuration of the accelerator VMs succeeded, I strongly recommend that you separate these two traffic types onto different VMkernel ports (each on its own IP subnet and preferably on a different VLAN). The reason I tested a VMkernel adapter with vMotion and NFS traffic combined is this quote:

Must be dedicated to storage traffic. It must not be used by vCenter for management connectivity to the ESXi/ESX host (the Management option is not checked on VMkernel port configuration). Must not have port binding enabled.

If you read carefully, it does rule out all other traffic types on the same VMkernel port, but it only explicitly mentions Management traffic and iSCSI port binding, not vMotion traffic. I think this requirement should be worded more explicitly to also exclude vMotion and FT traffic, if there’s a reason to do so, even if it’s only to adhere to VMware best practices.
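
If I were writing that pre-flight check, I would be stricter and flag any non-storage service on the storage VMkernel port. A small sketch, reusing the illustrative dict layout from the earlier example:

    ALLOWED_ON_STORAGE_VMK = {"nfs"}       # nothing else: no management, vmotion, ft or replication traffic

    def strict_storage_vmk_check(vmk):
        """Warn if the storage VMkernel port carries anything besides NFS traffic."""
        extra = set(vmk["services"]) - ALLOWED_ON_STORAGE_VMK
        if extra:
            return f"{vmk['name']}: also carries {', '.join(sorted(extra))}; move these to their own VMkernel port"
        return f"{vmk['name']}: dedicated to storage traffic, OK"

    print(strict_storage_vmk_check({"name": "vmk2", "services": ["nfs", "vmotion"]}))
    # -> vmk2: also carries vmotion; move these to their own VMkernel port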

DNS requirements

Please make sure that the client you’re using to deploy Infinio can resolve DNS records for, and reach, all components in your infrastructure, such as the ESXi hosts and the vCenter Server. This is more of a vSphere requirement than an Infinio requirement, but it bit me a couple of times during installation. A hosts-file configuration definitely won’t cut it; please set up central DNS records for this.
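
A quick way to verify this up front from the machine you will run the installer on (standard-library Python only; the host names are placeholders for your own vCenter and ESXi FQDNs):

    import socket

    INFRA = ["vcenter.lab.local", "esx01.lab.local", "esx02.lab.local", "esx03.lab.local"]

    for name in INFRA:
        try:
            ip = socket.gethostbyname(name)                        # forward DNS lookup
            with socket.create_connection((ip, 443), timeout=3):   # vCenter and ESXi both listen on 443
                print(f"{name} -> {ip}: resolvable and reachable on 443")
        except OSError as err:
            print(f"{name}: FAILED ({err}); fix DNS/connectivity before installing")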

Random Thoughts

Availability of the Accelerator VM affects datastore and VM availability

The Accelerator VM provides its data services only to the host it’s running on. This means that if the host goes down, the Accelerator VM and all other VMs (running on top of Infinio) will, too. No problem there, business as usual and VMware HA picks up the broken pieces.

If an Accelerator VM on a single host fails, that host loses access to all NFS datastores that were in the storage path, even non-accelerated datastores. Adding new datastores to the host when the Accelerator VM is down is not possible either.

Without the Accelerator VM, there’s nothing to route traffic between the Private and Public VM Port Groups created by Infinio, which means NFS traffic is interrupted and the datastores go down. This is a huge single point of failure, and the risk cannot be mitigated by using VMware HA or FT, since the Accelerator VM runs from the local datastore. Moreover, VMware does not offer any mechanism to automatically fail virtual machines that have lost access to their datastores over to a different host, even if there are other hosts in the cluster that still have access to those datastores.

The only way to access the datastores again is to remove the Accelerator VM on the affected host via the Infinio Console, which restores the original network settings. The Infinio Console does signal this situation automatically and offers to remove the Accelerator VM, but this action requires manual intervention, which increases the time VMs are affected. Without any external notification such as e-mail or SNMP, you really need to have the Infinio Console open at all times to spot this notification and act quickly. The lack of good alerting and notifications makes this part of the Infinio solution very weak, and it is something Infinio needs to improve for me to consider the solution in any production environment.
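
Until that alerting exists, you could poll for the condition yourself and send an e-mail. A crude sketch follows; the SMTP host, addresses and the datastore probe are placeholders you would have to fill in yourself (for example with a pyVmomi query of each datastore’s summary.accessible flag):

    import smtplib
    import time
    from email.message import EmailMessage

    SMTP_HOST = "smtp.lab.local"                               # placeholder
    ALERT_FROM, ALERT_TO = "infinio-watch@lab.local", "admin@lab.local"

    def inaccessible_datastores():
        """Placeholder: return the names of datastores the hosts report as inaccessible."""
        return []

    def send_alert(names):
        msg = EmailMessage()
        msg["Subject"] = "NFS datastores inaccessible: " + ", ".join(names)
        msg["From"], msg["To"] = ALERT_FROM, ALERT_TO
        msg.set_content("Check the Infinio Console; the Accelerator VM on this host may be down.")
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        while True:
            down = inaccessible_datastores()
            if down:
                send_alert(down)
            time.sleep(60)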

Important to note: the affected virtual machines are restarted when the Accelerator VM is removed, so there is additional downtime and a risk of losing virtual machine data.

— Update Jan 1st, 2014
Please check Matthew’s comment below for a possible explanation of why VMs were restarted in this scenario. I will have to analyze the root cause to determine what really happened here.
— Update

The Infinio Accelerator VM accelerates each datastore by capturing traffic coming from the NFS storage VMK. The Accelerator VM also captures traffic to other datastores on the same VMK. The Accelerator VM simply forwards these datastores’ IO directly to the backend storage device. The performance impact to these datastores is negligible.

The complexity is hidden, but it’s still there

Although Infinio has done a heck of a job on the installer with its pre-flight install and host checks, there’s still a lot of complexity to deal with. This means that Infinio will only fit a subset of all vSphere environments and has strict requirements that need to be followed to the letter. Imagine troubleshooting a storage or network problem with all of this complexity in place!

Hacking networking to fix storage is fundamentally wrong

Now maybe it’s just me, but bridging networks, adding Port Groups, trunking VLANs and fiddling with proxy ARP to reroute traffic seems like a very dirty way to fix storage performance, especially because all storage traffic that originally went through the VMkernel adapter is rerouted, adding latency to all datastores. There’s no way to select only the traffic to a specific datastore and leave the rest untouched.

[Insert Kernel Module vs. Virtual Appliance discussion here]

Finally, all this network magic is needed to run traffic through a virtual appliance, which is essentially a globally deduplicated read cache. I still do not like the thought of storage traffic going back and forth between hypervisor and virtual machine any more than it has to. I take consolation in the fact that it’s only a read (write-through) cache, so nothing should be lost when you lose the Accelerator. I do hope Infinio figures out a way to move all of the Accelerator VM bits into the hypervisor itself. For education’s sake, I would love to see which software components the Linux-based Accelerator VMs are made of.
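
To make “globally deduplicated read cache” a bit more concrete, the core idea can be sketched as a content-addressed block store shared across hosts. This is a toy illustration of the concept only, not Infinio’s implementation:

    import hashlib

    class DedupReadCache:
        """Toy content-addressed, write-through read cache: identical blocks are
        stored once, keyed by their fingerprint, no matter which datastore or VM reads them."""

        def __init__(self):
            self.blocks = {}    # fingerprint -> block data (the deduplicated store)
            self.index = {}     # (datastore, offset) -> fingerprint

        def read(self, datastore, offset, fetch_from_array):
            key = (datastore, offset)
            if key in self.index:                        # cache hit: served from RAM
                return self.blocks[self.index[key]]
            data = fetch_from_array(datastore, offset)   # cache miss: go to the NFS array
            fingerprint = hashlib.sha1(data).hexdigest()
            self.blocks.setdefault(fingerprint, data)    # dedup: each unique block stored once
            self.index[key] = fingerprint
            return data

        def write(self, datastore, offset, data, write_to_array):
            write_to_array(datastore, offset, data)      # write-through: the array is always updated first
            fingerprint = hashlib.sha1(data).hexdigest()
            self.blocks.setdefault(fingerprint, data)
            self.index[(datastore, offset)] = fingerprint

Losing a cache like this loses nothing but cached reads, which is exactly why an Accelerator failure is an availability problem rather than a data-loss problem.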

Concluding

Let’s zoom out a couple of miles and forget about the network-and-appliance approach, the very specific list of requirements and the availability considerations.

Infinio has done a stellar job on the installation and configuration workflow. The installer is beautiful, easy to work with, and gives just enough feedback that I know what’s going on (or what needs to be changed or fixed in order to continue). The level of seamlessness is through the roof: even removal of the product is completely automated and goes without a single hitch. This feeling is strengthened by the lack of additional hardware requirements: each host already has a bunch of RAM, and with hosts running 128GB or more, this resource is becoming less of a bottleneck.

It’s a very cost-effective way to accelerate workloads with as few resources as possible. Think about the use cases: generic Wintel, VDI, database or groupware clusters; it can even be used to extend the lifetime of existing NAS solutions that don’t perform as well as expected. It certainly confirms the trend of separating storage capacity from storage performance by moving performance into the server.

Given the availability considerations and the considerable amount of (hidden) complexity, I wouldn’t recommend this product blindly. I would really like to see Infinio add automatic remediation of the broken network configuration in case of an Accelerator failure, so that loss of access to datastores is minimized (at least for non-accelerated datastores) and workloads can continue running on the affected host with minimal (or no) downtime. In addition, external notification using SMTP and/or SNMP for discovered issues (such as a problem with an Accelerator VM) really is a prerequisite for any serious IT infrastructure.

Also, better documentation with more in-depth information about how Infinio uses and depends on the vMotion network, VLAN 4095 and the separation of VMkernel traffic would be greatly appreciated; I’ve had my share of problems in these areas due to the lack of documentation. Thankfully, I was able to discover these bits quickly enough to work with and/or around them.

Finally, Infinio’s virtual appliance-based approach has some serious drawbacks: the storage network is massively altered, and a huge SPOF is introduced into the storage path. I would like to see them go the VMkernel-based route, like PernixData has done, which would also reduce the amount of network trickery required.

I will keep my eye on Infinio, as I expect them to release new and exciting versions regularly. Maybe we will even see a hypervisor-agnostic version and write-back caching?