In a previous post called Running a Dell PERC with high latency? Check the LSI SMI-S vib!, I mentioned that having the LSI SMI-S provider installed on a host can cause significant latency issues on an ESXi host.

I’ve been having a lot of latency issues lately with two Dell PowerEdge R310s. These 1U boxes have a low-end controller, a PERC H200, and I’ve been seeing latency spikes in the range of 500-600 ms, which is high enough for the Linux VMs to continuously remount their filesystems read-only. This basically happens any time one of the VMs does moderate I/O (say, 25+ IOPS): the controller locks up and takes down multiple other VMs along the way. It also happens during any operation on the controller itself, such as formatting a disk with VMFS, creating a snapshot, consolidating a disk or removing a snapshot.
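If you want to see where the latency sits, esxtop on the host is the quickest way to check; this is just a sketch of the counters I’d look at, and the grep on the Linux side assumes the guests log the usual remount message:

    # On the ESXi host: start esxtop, then press 'u' for the disk device view.
    # DAVG/cmd is device latency, KAVG/cmd is kernel latency,
    # GAVG/cmd (DAVG + KAVG) is roughly what the guest sees.
    esxtop

    # Inside an affected Linux VM: check whether the kernel has
    # remounted a filesystem read-only after I/O errors.
    dmesg | grep -i "read-only"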

The work-around I discovered was simple:

I removed the vib (‘lsiprovider’) and rebooted the host. And hey presto, I could easily push the SSD and H200 controller north of 4,000 IOPS with sub-10 ms latency without any issue, which is pretty good in my view, and it certainly is a substantial improvement over the latency spikes and horribly low IOPS before. After a couple of hours of testing and monitoring, the previously mentioned issues seem to have completely disappeared after removing the SMI-S provider.
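For reference, the removal boils down to something like the commands below. The exact vib name can differ per LSI release, so check the list output first, and decide for yourself whether you want the host in maintenance mode before the reboot:

    # Check whether the LSI SMI-S provider is installed and note the exact vib name
    esxcli software vib list | grep -i lsiprovider

    # Remove it (on my hosts the vib was simply called 'lsiprovider') and reboot
    esxcli software vib remove -n lsiprovider
    reboot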

I have asked LSI and some guys inside VMware if they have any more information on this, but it’s hard to uncover much. LSI Support did get back to me, stating:

According to LSI Engineering department, this latency is caused by a bug in the hypervisor. The bug should be fixed in vSphere 5.1 Update 3 and 5.5 Update 2.

It seems this issue will be fixed in an upcoming release of vSphere, so I guess we need to use the work-around until then and hope the fix actually makes it into the 5.5 Update 2 release. I’m wondering if this issue is LSI-specific, or a bug more widely affecting other SMI-S providers, too.
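If you want to check whether your own hosts are running any vendor CIM/SMI-S providers, something along these lines should surface them (the grep pattern is just a guess at common vib naming, so adjust as needed):

    # List installed vibs and filter for the usual CIM/SMI-S provider naming
    esxcli software vib list | grep -iE "smis|provider|cim"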