Remember how I promised to get back to you about a certain Super Secret Awesomesauce Company?

In particular, the Super Secret Awesomesauce Company has me so excited that I want to shout it off the rooftops, but I’m afraid I can’t, at least not just yet. Check back on the 19th of August, and you might be in for a very cool surprise. Joep Piscaer, ‘Tech Field Day Extra (#EVMWU14) at VMworld 2014: check!’

Well, here goes.

Introduction to DataGravity’s X-ray vision

This week, DataGravity unveiled their product offering after more than two years of development. I was one of the bloggers to get a preview of the solution earlier this week, in preparation for the Tech Field Day Extra at VMworld 2014 event hosted by Stephen Foskett et al. Instead of writing the introduction to DataGravity’s Discovery product all over again, I’ll quote Stephen:

(…) DataGravity just introduced a pretty ordinary storage array aimed at the fat middle of the datacenter market. Sure, they call it “state-of-the-art primary storage” and it ticks the current checkboxes (unified access, flash-optimized, hybrid architecture, in-line data reduction) (…) Stephen Foskett, ‘Why is DataGravity Such a Big Deal?’

So basically, it’s an active/passive array, where the passive node performs all kinds of analytics magic on the data that is stored there for redundancy reasons anyway. What kind of analytics magic, you ask? Well, simply put, the array knows everything about the data stored in it. In other words, these guys truly have the first enterprise storage array with X-ray vision.

DataGravity is violating the traditional enterprise storage firewall and actually looking into the data being stored. The array actively monitors reads and writes, maintaining continual snapshots of the system over time. Stephen Foskett, ‘Why is DataGravity Such a Big Deal?’

Star Trek’s Data. No, this is not Stephen Foskett.

It knows all file attributes, such as date modified, date last accessed, date created, owner and size. It even knows file types (currently about 400 different ones), and because of that, it can actually look inside the files and interact with that data. It can also detect specific data patterns, such as credit card data. Yes, you read that correctly: the storage array has the ability to look inside files stored on the array and recognize all kinds of data patterns. In the demo, the functionality was rather limited, but it included the ability to search for specific file types, specific data patterns and keywords, and to filter results by different file attributes.
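DataGravity didn’t share how their pattern detection works under the hood, but to make the idea concrete, here is roughly what recognizing credit card data in a stream of file content could look like: find digit sequences of plausible length and validate them with a Luhn checksum. This is purely my own illustrative sketch, not their implementation, and all the names in it are mine:

```python
import re

# Candidate card numbers: 13-16 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum: weeds out most random digit strings."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list:
    """Return digit strings in 'text' that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 16 and luhn_ok(digits):
            hits.append(digits)
    return hits

if __name__ == "__main__":
    sample = "Order notes: card 4111 1111 1111 1111, phone 0612345678."
    print(find_card_numbers(sample))   # ['4111111111111111'] -- the well-known Visa test number
```

A real data-aware array obviously has to do this at scale and across 400-odd file formats, but the principle of pattern matching plus validation is the same.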

The DataGravity system will identify LUNs, crack open logical volumes, access filesystems, and even read file content. Stephen Foskett, ‘Why is DataGravity Such a Big Deal?’

The best part, however, is that they are able to do so regardless of the underlying storage protocol. In other words, they can extract all this file system metadata even from block-based storage protocols such as iSCSI. This is their core enabling technology and most valuable asset right now, and it is what makes them pretty special in the mid-market enterprise storage market today.
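The briefing didn’t cover how exactly the array peers through an iSCSI LUN, but conceptually the first step is straightforward: treat the LUN as a raw byte stream and look for well-known on-disk signatures before digging into the actual filesystem metadata. A minimal, simplified sketch of that first step (the NTFS and ext offsets are the documented ones; everything else about a real implementation would be far more involved):

```python
def sniff_filesystem(device_path: str) -> str:
    """Very rough filesystem sniffing on a raw block device or disk image.

    Real data-aware storage would go much further (partition tables, LVM,
    the actual filesystem metadata and files); this only checks two
    well-known magic values at the start of a volume.
    """
    with open(device_path, "rb") as dev:
        header = dev.read(4096)

    # NTFS: the boot sector carries the OEM ID "NTFS    " at byte offset 3.
    if header[3:11] == b"NTFS    ":
        return "ntfs"

    # ext2/3/4: the superblock starts at offset 1024; its magic 0xEF53 sits at
    # offset 56 within the superblock (little-endian), i.e. byte 1080 of the volume.
    if header[1080:1082] == b"\x53\xef":
        return "ext2/3/4"

    return "unknown"

if __name__ == "__main__":
    # e.g. a disk image or a block device you have read access to (path is hypothetical)
    print(sniff_filesystem("/tmp/volume.img"))
```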

Search and Discovery

X-ray vision, or not?

Their current go-to-market play with this technology is in the eDiscovery field, by creating a full-text search index of all the content stored on the array. Administrators and business users alike (well, mostly HR and legal, anyway) can do some pretty cool and creative stuff with this wealth of both metadata (file attributes plus a full-text index, maintained on the passive array node) and the data itself (i.e. the file contents). This creates a very broad spectrum of use cases:

- Security: who had access to which files and folders, and who actually read or wrote information there? This includes real-time, file-level user activity tracking.
- Governance: was anyone who wasn’t cleared to access credit card information actually able to? Add to that the recognition of sensitive content patterns.
- Business intelligence: search and discovery for unstructured data, correlating people, time, activities and content.
- IT Operations: restoring specific versions of lost or corrupted data inside file systems and inside virtual machines.

The possibilities are endless, especially with the product evolving to support custom file types and other customer-adjustable knobs to turn.
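I have no idea what DataGravity’s index looks like internally, but the demo experience of full-text search plus attribute filters maps onto a classic inverted index combined with per-file metadata. A toy Python sketch, just to make the idea concrete (file paths, owners and text are made up by me):

```python
from collections import defaultdict

# Toy corpus: file path -> (owner, extension, extracted text).
FILES = {
    "/finance/q2-report.docx": ("alice", "docx", "quarterly revenue and credit card exposure"),
    "/hr/offer-letter.pdf":    ("bob",   "pdf",  "salary offer for new hire"),
    "/eng/design-notes.txt":   ("alice", "txt",  "storage array design notes"),
}

# Build the inverted index: term -> set of paths containing it.
index = defaultdict(set)
for path, (_owner, _ext, text) in FILES.items():
    for term in text.lower().split():
        index[term].add(path)

def search(term, owner=None, ext=None):
    """Full-text lookup, optionally filtered by file attributes."""
    hits = index.get(term.lower(), set())
    return sorted(
        p for p in hits
        if (owner is None or FILES[p][0] == owner)
        and (ext is None or FILES[p][1] == ext)
    )

print(search("credit"))                 # ['/finance/q2-report.docx']
print(search("notes", owner="alice"))   # ['/eng/design-notes.txt']
```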

My take on their core technology

The last example on IT Operations got me thinking, though. If the array is aware of everything that is going on inside it, why not leverage that the other way around? Use the fact that it understands virtual machines, file systems and file types to inject the right data, on demand and intelligently. In my view, the ability to actually put stuff (back) into the file system could become their biggest asset.

Funnily enough, this had me thinking back to a (sadly very embarrassing) presentation on Dinamiqs Virtual Storm during one of the Dutch VMUG meetings back in 2009, which used application layering and data injection (by leveraging an early version of Symantec Workspace Virtualization) to do some pretty wild stuff for desktop virtualization. I won’t go into details as to why the presentation was very embarrassing indeed, but will save that story for a good beer or two at VMworld or VeeamON ;-). Fact is, that product had some very interesting layering and injection concepts, and those concepts came back to me after seeing the DataGravity briefing.

Let me give you an example of why I think it is very, very important for DataGravity to have the ability to intelligently and dynamically put stuff back into the file system without moving actual data around. Now, the following is completely fictional: I have no insight into the product roadmap, and was not briefed in any way about this. This is a product of my imagination alone, and might never see the light of day in the actual product.

A thought experiment

Imagine how this would work for application and desktop delivery in an SBC or VDI environment. These types of environments have large amounts of fairly static data: virtualized application packages and their local cache, VDI master images, SBC-based Citrix servers, mandatory profiles, standard applications installed inside the virtual desktops (like Microsoft Office), and the list goes on.

Instead of relying on application-level optimizations to deal with all the clones of these data instances, let the array handle it. There would be no need for Citrix provisioning tools like Provisioning Services or Machine Creation Services, VMware tooling like VCAI, cache disks for application packages, and much more, because the array would recognize, for example, an App-V package and automatically add it to an ‘Application Package Cache’ container, which would be presented to all VDI desktops. Another example is a VDI master image: the array would understand this file type and present it to designated virtual machines, which would boot from it without the need for complex provisioning services like Citrix PVS or MCS.

The array would obviously have to integrate with the popular products in this space (Citrix XenDesktop, VMware View and ThinApp, Microsoft App-V) to be able to recognize file types and read data inside the files, so the array knows what’s what. With this integration, the array could provision the right resource (desktop, application, etc.) to the user (and to intermediary IT infrastructure components) without any significant impact on storage performance, and with very little capacity overhead. You’d still need those applications for orchestration of events, but the array would do all the heavy lifting. Think of it as an application-level, VAAI-like framework.

It’s like having a storage container for every App-V / ThinApp application package and for every VDI master image. The container would be transparently mounted to many end-points (virtual desktops and Citrix servers) at the same time, at the VM or guest OS filesystem level. The VM or guest OS would never be aware of this magic underneath.
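To make this thought experiment a bit more tangible, here is what the core of such a feature could boil down to: a policy that maps recognized file types to shared, read-only containers. This is entirely hypothetical; the rules, names and extensions below are mine, not DataGravity’s:

```python
# Entirely hypothetical: a placement policy the array could apply once it has
# recognized a file type. Rules and container names are illustrative only.
PLACEMENT_RULES = {
    ".appv": "application-package-cache",   # App-V packages, shared read-only
    ".dat":  "application-package-cache",   # ThinApp container files
    ".vmdk": "vdi-master-images",           # golden images, boot source for clones
}

def place(file_name, default_container="general-purpose"):
    """Pick the shared container a newly written file should be surfaced from."""
    for suffix, container in PLACEMENT_RULES.items():
        if file_name.lower().endswith(suffix):
            return container
    return default_container

for f in ["Office2013.appv", "win7-golden.vmdk", "notes.txt"]:
    print(f, "->", place(f))
```

The point of the sketch is the shape of the idea: the array classifies data as it lands and then decides where (and to whom) it gets surfaced, instead of IT building that plumbing in the provisioning layer.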

I haven’t thought out this concept entirely, so it’s probably full of loopholes, faulty assumptions and glaring mistakes; but again, this is just an experiment to see how far DataGravity’s technology could go!

Conclusion

The point is, I think this technology is very powerful and can have many use cases besides the current Search and Discovery feature set. What I described is just one of the potential uses I see, and I believe this technology has a lot more potential. DataGravity is surely only scratching the surface with their first release, and I’m sure Paula Long and her team have some pretty neat tricks up their sleeve; I, for one, am very curious what they will come up with next. I can’t imagine they won’t develop this X-ray vision to integrate further and further with other parts of the IT infrastructure, especially components that traditionally require a lot of storage capacity and can be heavily optimized for capacity usage, such as applications, user data and VDI environments.

I’m excited to see what they will present at the upcoming Tech Field Day Extra at VMworld 2014. Unfortunately, I won’t be able to attend their presentation, but I will save the videos to watch on my long flight home after the event. I highly recommend you tune in to the live stream of DataGravity’s and the other presenting companies’ sessions.

Come chat with us on IRC

If you want to have a more in-depth discussion, everyone is welcome to join in on IRC (irc://irc.klauwd.com/evmwu14), where Robbert and I, and maybe others, will hang around during the event.