I like Nimble’s philosophy on support. They basically took the management and monitoring paradigm from corporate server infrastructures and applied it to storage: they connected all arrays with a phone-home type of functionality and developed a mature set of tools for data analytics. They took people out of the equation where possible and put them where humans are needed: customer contact.

Humans are no longer needed for gathering and sending logs, detailing configuration to support engineers and other cumbersome tasks that need to be done for each and every support case. Humans are left to do the tricky part: communicating with other humans.

InfoSight is the resulting product of this philosophy. It’s a cloud-based monitoring application that tracks about 20,000 sensors (or metrics) in any given array, covering basic health of software processes, hardware (temperature etc.), controller high availability and replication (incl. any errors). Heartbeats containing basic metrics are sent back every 5 minutes, and more comprehensive telemetry data is rolled up and sent back daily, resulting in about 30 million data points per array per day.

These data points are fed into the data analytics engine on steroids for analysis and pro-active wellness monitoring. Customers can tap into this data using the InfoSight web portal, although direct access to the database is not provided.

Funny thing about InfoSight: you’d expect enterprises to be very careful about enabling this kind of massive ‘phone-home’ functionality. Nimble has actually seen an InfoSight enablement rate of 92% of the total install base, with 82% enabling it within 60 days after shipment. To work with the 8% that doesn’t enable InfoSight, Nimble is looking at an on-premises InfoSight appliance to address the security concerns.
To add another little tidbit that amazed me: about 61% of the customer arrays upgrade the array’s software during the day. It’s harder to tell how fast after the update release these arrays are being upgraded, because Nimble always takes a ‘slow release’ approach.

The nuts ‘n’ bolts of InfoSight

Let’s dive into InfoSight a bit before we steam on to the vOpenData bit. There are three core parts of InfoSight:

  1. The Engine
  2. The Portal
  3. Proactive Wellness

The Engine

The InfoSight Engine collects and analyzes data through powerful statistical analytics, system modeling capabilities, and predictive algorithms.

  • Systems Modeling: Workload data feeds detailed systems models to generate recommendations on how to improve performance;
  • Predictive Algorithms: Historical trends are analyzed intelligently to extract organic data growth rates and predict capacity needs;
  • Statistical Analysis: Continuous pattern matching and event correlation find issues proactively, and trigger alerts to maintain storage health.
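
To make the "Predictive Algorithms" bullet a bit more tangible, here is a minimal sketch of trend-based capacity forecasting. It is not Nimble's actual algorithm (that isn't public as far as I know); it simply assumes growth is roughly linear and fits a least-squares line through made-up usage samples. The function names and the sample numbers are hypothetical.

```python
from statistics import mean

def fit_linear_trend(days, used_tib):
    """Ordinary least-squares fit of used capacity (TiB) against time (days)."""
    x_bar, y_bar = mean(days), mean(used_tib)
    sxx = sum((x - x_bar) ** 2 for x in days)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, used_tib))
    slope = sxy / sxx                  # growth rate in TiB per day
    intercept = y_bar - slope * x_bar  # fitted usage at day 0
    return slope, intercept

def days_until_full(days, used_tib, capacity_tib):
    """Days after the last sample until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking (no projected fill date).
    """
    slope, intercept = fit_linear_trend(days, used_tib)
    if slope <= 0:
        return None
    full_day = (capacity_tib - intercept) / slope
    return max(0.0, full_day - days[-1])

# Made-up samples: one measurement every 30 days, growing 1 TiB per month.
days = [0, 30, 60, 90]
used = [10.0, 11.0, 12.0, 13.0]

slope, intercept = fit_linear_trend(days, used)
remaining = days_until_full(days, used, capacity_tib=20.0)
print(f"growth: {slope * 30:.2f} TiB per 30 days, about {remaining:.0f} days until full")
```

A real engine would presumably separate organic growth from one-off events (migrations, snapshot clean-ups) before fitting anything; the straight line is just the simplest possible stand-in.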

There are plenty of data points to go around, with more than 30 million collected from each array each day. With over 4000 arrays deployed, that’s a lot of data! Actually, about 1.21 Giga Watts, er, 120 Giga (120 × 10^9) data points per day are stored in the Vertica database (which, of course, runs on Nimble Storage arrays) and supposedly this dataset is about 250 TiB in size before compression. Wow!
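
For those who want to check the math, here is a quick back-of-the-envelope calculation using only the round figures quoted in this post (30 million data points per array per day, roughly 4000 arrays, a heartbeat every 5 minutes). The variable names are mine, and the inputs are lower bounds rather than exact counts.

```python
# Round figures quoted in this post; treat them as approximations.
POINTS_PER_ARRAY_PER_DAY = 30_000_000
ARRAYS_DEPLOYED = 4_000
HEARTBEAT_INTERVAL_MIN = 5

# Total data points arriving from the whole install base each day.
fleet_points_per_day = POINTS_PER_ARRAY_PER_DAY * ARRAYS_DEPLOYED

# Number of 5-minute heartbeats a single array sends in 24 hours.
heartbeats_per_array_per_day = 24 * 60 // HEARTBEAT_INTERVAL_MIN

print(f"{fleet_points_per_day:,} data points per day")        # 120,000,000,000
print(f"{heartbeats_per_array_per_day} heartbeats per array per day")  # 288
```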

The Portal

The portal brings the different worlds of data together into a single view for storage administrators. It’s where the monitoring data from your systems trickles in, and where you’d do your reporting, forecasting and planning (do I need to scale up in a specific dimension or scale out?). It’s also a single point of contact to get in touch with Nimble’s support team.

Proactive Wellness

The Engine crunches the numbers and produces a ton of actionable information called Proactive Wellness in order to do advanced stuff like:

  • configuration management (could the configuration be optimized?);
  • Break/Fix optimization (should a change or software upgrade be implemented?);
  • resource planning (should any hardware component be upgraded based on the actual usage of that component?);
  • maintaining optimal storage performance (including recommendations to optimize configuration for changing workloads);
  • projecting storage capacity needs now and in the future (with predictive analysis and forecasting);
  • proactively monitoring storage health (space reclamation, volume protection status, RPO compliance, MPIO misconfiguration warnings and multi-initiator access to a non-clustered volume).

A lot of these situations will automatically create a case with the Nimble support team and inform the customer about the steps that need to be taken (by referring to a KB-article, for instance). After InfoSight has seen the change, the case is automatically closed.

Basically, it’s where storage-related problems come to die and where the support team really stands out. From what I understand, this is the most human part of Nimble’s support operation, with a real team doing very specific and constant analysis of the database and contacting customers that stand out in any way possible, good or bad.

How does InfoSight add real value for the customer?

During their Storage Field Day 4 presentation, they presented an example of their Proactive Wellness approach with a very specific memory leak at one customer, which the support team correlated to a specific process in the array. Engineering found and fixed the bug, and the support team alerted other customers that were likely to run into this same problem. Watch this case here (from 8:45 to about 12:00):

How does InfoSight compare to vOpenData?

vOpenData is an anonymous and open community database that captures virtual infrastructure configurations and tracks real-world usage metrics. In other words, it is the answer to the question

What is the average VMDK size for deployed virtual machines?

The install base is actually pretty big: 1,500 clusters with over 13,000 hosts in total, giving a pretty good insight into the average VMware environment. I often use this incredible tool to do fictitious sizings in tenders or other projects when no other usage data is available.

The dashboard displays metrics like average number of datastores per cluster, average number of hosts per cluster, etc. The storage stuff is really interesting to me, like the average size of a VMDK or number of datastores per host.

Data sciences

If you have half an hour to spare, watch this video; Larry Lancaster of Nimble Storage, who has the coolest job title by the way, dives deep into how the InfoSight engine works and how they do data analysis:

What struck me while listening to Larry was the immense amount of storage metadata they’re sitting on, and how they might have the most reliable insight into real-world storage usage yet. It’s a shame that it’s largely untapped, at least if you’re not a Nimble customer or partner.

For instance: do you know what the actual working set of your VDI farm is? Larry and his analytics team figured this out, and were able to do so very easily thanks to the InfoSight database.

Turns out, the average working set size in a typical VDI workload is less than 6% (about 60 GiB) for every compressed TiB (which amounts to 1.5 to 3 TiB of user-visible data) stored on a Nimble Storage array. This number goes down to less than 3% in larger deployments.
This same trend can be seen for the typical SQL Server deployment: for every compressed TiB used, less than 9% is actively used. In larger deployments, this number goes down to less than 3%.
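
To show how I’d use numbers like these in a sizing exercise, here is a small sketch that turns the percentages above into an upper bound on working set size. The lookup table simply encodes the four figures quoted in this post; the function name, the "typical"/"large" labels and the example input are my own assumptions, and the result is a rough ceiling, not a prediction.

```python
GIB_PER_TIB = 1024

# Upper bounds on the active fraction of compressed data, as quoted in this post.
WORKING_SET_UPPER_BOUND = {
    ("vdi", "typical"): 0.06,
    ("vdi", "large"): 0.03,
    ("sql", "typical"): 0.09,
    ("sql", "large"): 0.03,
}

def working_set_upper_bound_gib(compressed_tib, workload, size="typical"):
    """Rough ceiling (in GiB) on the working set for a given compressed footprint."""
    fraction = WORKING_SET_UPPER_BOUND[(workload, size)]
    return compressed_tib * GIB_PER_TIB * fraction

# Hypothetical example: 4 TiB of compressed VDI data in a typical deployment.
vdi_gib = working_set_upper_bound_gib(4, "vdi")
print(f"VDI working set: at most about {vdi_gib:.0f} GiB")
```

Strictly speaking, 6% of one TiB is a little over 61 GiB, which is where the rounded 60 GiB figure comes from.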

Open it up!

Now, I would like to be able to access those kinds of insights for my own projects. I think Nimble should release more of these factoids on an open part of the InfoSight Portal for everyone to read, and offer a kind of registered (and/or paid) access for developers to tap into the raw data itself. This is a true gold mine of information, best served to the community at large. Isn’t the customer base, a community in itself, responsible for generating all that data in the first place?

I’m curious whether Nimble has ever contemplated opening up InfoSight or doing something along the lines of vOpenData. I encourage Nimble Storage employees or anyone who knows to post in the comments below!

Open-sourcing the factoids (and doing structural analysis to uncover them) isn’t going to get in the way of the commercial success of InfoSight: Proactive Wellness still relies on data analytics and the human interaction part, and InfoSight as a whole is still an incredibly strong selling point in my book. Trusting your support approach while opening up the data underneath might be one of the strongest moves Nimble can make with the direction of InfoSight!

Concluding

I think InfoSight is one of the most innovative tools we’ve seen coming out of the storage industry in years. Nimble’s ability to build their successful Proactive Wellness support approach on top of InfoSight really sets them apart from other vendors.

Nimble is sitting on a giant amount of raw storage metadata, and they should open it up to the virtualization and storage community as a free service for everyone to enjoy.