In a previous post, Running a Dell PERC with high latency? Check the LSI SMI-S vib!, I mentioned that having the LSI SMI-S provider installed on an ESXi host can cause significant storage latency issues.
I’ve been having a lot of latency issues lately with two Dell PowerEdge R310s. These 1U boxes have a low-end controller, a PERC H200. I’ve been seeing latency spikes in the range of 500-600ms, which is high enough to have the Linux VMs continuously remount their filesystems in read-only mode. This basically happens any time one of the VMs does a moderate amount of IOPS (say, 25+), causing the controller to lock up and take down multiple other VMs along the way. It also happens during any operation on the controller itself, like formatting a disk with VMFS, creating a snapshot, consolidating a disk or removing a snapshot.
The work-around I discovered was simple:
I removed the vib (‘lsiprovider’) and rebooted the host. And hey presto, I could easily push the SSD and H200 controller north of 4,000 IOPS with sub-10ms latency without any issue, which is pretty good in my view, and certainly a substantial improvement over the latency spikes and horribly low IOPS before. After a couple of hours of testing and monitoring, removing the SMI-S provider appears to have made the previously mentioned issues disappear completely.
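For reference, the removal boils down to a few commands from the ESXi shell. The vib name ‘lsiprovider’ is what the LSI SMI-S package registered as on my hosts; verify the exact name with the list command before removing anything:

# Find the LSI SMI-S provider among the installed VIBs
esxcli software vib list | grep -i lsi

# Put the host into maintenance mode before making changes
esxcli system maintenanceMode set --enable true

# Remove the provider VIB (use the name shown in the list output)
esxcli software vib remove --vibname=lsiprovider

# A reboot is required for the removal to take effect
reboot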
I have asked LSI and some guys inside VMware if they have any more information on this, but it’s hard to uncover much more. LSI Support did get back to me, stating:
According to LSI Engineering department, this latency is caused by a bug in the hypervisor. The bug should be fixed in vSphere 5.1 Update 3 and 5.5 Update 2.
It seems this issue will be fixed in an upcoming release of vSphere, so I guess we need to use the work-around until then and hope the fix actually makes it into the 5.5 Update 2 release. I’m wondering if this issue is LSI-specific, or a broader bug affecting other SMI-S providers, too.
I’m having exactly the same issues. My ESXi hosts are updated to Update 2 according to my Update Manager, but when I install the latest LSI provider I still get terrible disk performance. Do you know if there is any update on this issue?
This should be fixed in ESXi 5.5 Update 2:
“ESXi host might experience high I/O latency
When large CIM requests are sent to the LSI SMI-S provider on an ESXi host, high I/O latency might occur on the ESXi host due to poor storage performance.”
Not sure why you are still experiencing these issues. Are you running the latest version of the LSI SMI-S provider?
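If you want to double-check, something along these lines from the ESXi shell should show both the host build and the installed provider version (assuming the vib is still registered as ‘lsiprovider’):

# Confirm the ESXi version and build number
esxcli system version get

# Show details of the installed LSI SMI-S provider VIB, including its version
esxcli software vib get --vibname=lsiprovider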
Stumbled across this post by accident while looking for something very different.
But as we use some vSphere 5.5 hosts with LSI’s “IR” adapters, and I know that they perform very poorly IO-wise, I gave it a shot.
ESXi 5.5 build 2302651 (most recent as of today) is now 5-8x faster after removing the LSI provider (the installed provider was the most recent version available).
Hey Chris,
So basically, this bug persists even in the most recent build of ESXi? Have you tried updating the LSI provider to see if that helps at all?
I found a similar issue this week. I’m running ESXi 6.0 Update 1. Performance was “as expected” for the poor H200: between 2-12 ms of latency on the datastore during OS boots and such, but the system is otherwise lightly used. I wanted storage health to show up in ESXi, so I installed the lsiprovider 500.04.V0.58-0006. I started a VM and thought an update had gone bad: after the loading screen, the system sat on a black screen for 5-7 minutes before showing the login screen. Found your post. Checked latency: averaging 150-200 ms. The hard drive lights had a very distinct “interrupted” pattern. Shut down the VMs, removed the SMI-S provider, back to normal.
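For anyone wanting to confirm the same pattern on their own host, a quick way to watch device latency from the ESXi shell is esxtop; the interval and sample count below are just examples:

# Interactive: run esxtop and press ‘u’ for the disk device view;
# DAVG/cmd (device latency) and GAVG/cmd (guest latency) will show the spikes
esxtop

# Or capture samples in batch mode for later analysis (5-second interval, 60 samples)
esxtop -b -d 5 -n 60 > /tmp/esxtop.csv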