Datrium Experience (pt3)

Part 3 - Horsepower and Torque

The way in which Datrium disaggregated the solution is what made it so appealing. The software runs on the host and the storage is completely separate. Horsepower is applied closest to the VMs, where you want it, and the torque is across the wire, where the heavy lifting happens. Top it off with a little software-defined ignition system that can go from fast to insane and you have a solution that was right up our alley.

Hyperdrive...r

The storage controller software is broken up into two parts: frontend and backend. The frontend lives on the host in the form of a VIB that acts like a driver. This driver serves several functions.

The first is to connect the host to the backend and present the storage to the host. This is in the form of NFS. It's a local share, presumably to make use of the local flash.

The connection to the backend is some proprietary concoction that runs over Ethernet, 25Gb in our case. Another critical function is storage efficiency. Before data is written across the wire, it's compressed, deduplicated, optionally encrypted and written to the flash.

Supposedly, another key function is that hosts can share their local flash with fellow hosts in the event that a host has an issue with its own flash. This has been a serious point of failure in our environment, with no clear reason or solution in sight.

This is not the point where I slam their support. Those guys know their product and are quick to respond. The team is good. This is the point where I slam the engineering team for making a system that does not fail over properly to neighboring host flash when local flash fails. More on this later.

You have the ability to alter performance when resources get tight. There is a nifty little switch on the host DVX console. One side says Fast, the other Insane. This switch lets you select how much of the host's compute resources you allow the hyperdriver to consume. Even on Insane we have not noticed any negative impact on host resource utilization.

The last feature I'm going to mention is replication. When the system is configured for snapshots and replication, the host portion of the control software handles the replication of data. According to the sales guys, all the data transmitted is encrypted and all the storage efficiencies are maintained.

Payload

When we think of torque we are usually thinking in terms of how much can I pull. The DVX storage nodes definitely pull their weight. The second half of the controller software has a bunch of little goodies.

Along with collecting and distributing all the data written from the hosts to all the DVX nodes in the cluster, it maintains storage efficiencies and then doubles down on them. When a host writes something, it's compressed and deduped on the host, then passed to the DVX node. There it is written alongside the data from all the other hosts in the cluster. There is potentially a lot of the same stuff that can be compressed and deduped all over again to save even more capacity.
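To make the two passes a little more concrete, here is a rough Python sketch of the idea. It is not Datrium's code. The chunk size, the SHA-256 fingerprints, zlib and the function names host_reduce and node_store are all my own assumptions, picked only to show a host-side pass followed by a cluster-wide pass.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk size, not a Datrium value


def host_reduce(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Host-side pass: split into chunks, drop duplicates, compress the rest."""
    unique = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in unique:
            unique[fingerprint] = zlib.compress(chunk)
    return unique


def node_store(pool: dict, host_chunks: dict) -> int:
    """Storage-node pass: keep only chunks no other host has already sent.

    Returns the number of chunks that were new to the shared pool.
    """
    added = 0
    for fingerprint, compressed in host_chunks.items():
        if fingerprint not in pool:
            pool[fingerprint] = compressed
            added += 1
    return added


# Two hosts that share the same "OS" blocks but each have one unique block.
pool = {}
os_image = b"A" * CHUNK_SIZE * 3
host1 = host_reduce(os_image + b"B" * CHUNK_SIZE)
host2 = host_reduce(os_image + b"C" * CHUNK_SIZE)
new_from_host1 = node_store(pool, host1)
new_from_host2 = node_store(pool, host2)
```

Each host boils its four chunks down to two on its own, and the second host only adds one new chunk to the pool because the shared blocks are already there. That second saving is the "doubling down" part.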

The storage recovery process, or SR, runs every 4 hours. At least, that is what I was told. In actuality, on larger capacity systems it runs until it finishes and then starts again on the next 4 hour rotation.
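The behavior I observed can be sketched in a few lines of Python. This is only my reading of it, not anything from Datrium. The function name next_start is made up, and I am assuming the rotation slots sit at multiples of 4 hours and that a run finishing exactly on a slot boundary can start again in that slot.

```python
import math

INTERVAL_HOURS = 4  # rotation length as I was told it


def next_start(previous_start: float, run_hours: float,
               interval: float = INTERVAL_HOURS) -> float:
    """Return the hour at which the next run begins.

    Assumes previous_start is aligned to a rotation slot. A short run waits
    for the next slot; a long run skips slots until it has finished.
    """
    finished = previous_start + run_hours
    first_slot_after_finish = math.ceil(finished / interval) * interval
    return max(previous_start + interval, first_slot_after_finish)
```

With this model a 2 hour run that starts at hour 0 kicks off again at hour 4, while an 18 hour run does not start again until hour 20.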

We have seen this process take over a day and a half to finish on a data set that normally takes less than 18 hours. Not the end of the world if you have enough spare capacity to allow the process to complete. Our systems can write over 1TB an hour in new data that eventually becomes little more than 10-50GB of actual used capacity after this process runs.
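For a feel of what those numbers mean, here is some back-of-the-napkin math in Python. The assumptions are mine: decimal units (1TB = 1000GB), the 10-50GB figure applies to each hour of writes, and none of the extra space is freed until the run completes. Treat the backlog as a rough upper bound, not a measured value.

```python
GB_PER_TB = 1000  # decimal units assumed


def reduction_ratio(written_gb: float, retained_gb: float) -> float:
    """How many GB were written for every GB that stays on disk."""
    return written_gb / retained_gb


def backlog_gb(write_rate_gb_per_hour: float, run_hours: float,
               retained_gb_per_hour: float) -> float:
    """Extra capacity held while one run is still in progress.

    Assumes nothing is reclaimed until the run finishes (worst case).
    """
    return (write_rate_gb_per_hour - retained_gb_per_hour) * run_hours


best_case = reduction_ratio(1 * GB_PER_TB, 10)    # 100.0
worst_case = reduction_ratio(1 * GB_PER_TB, 50)   # 20.0
normal_run = backlog_gb(1 * GB_PER_TB, 18, 50)    # 17100 GB
slow_run = backlog_gb(1 * GB_PER_TB, 36, 50)      # 34200 GB
```

Under those assumptions the reduction lands somewhere between 20:1 and 100:1, and a run that stretches from 18 to 36 hours doubles the spare capacity you need to ride it out, from roughly 17TB to roughly 34TB.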

More goodies include encryption at rest, 10TB drives, erasure coding 2, dual hot-swap controllers and redundant power supplies. Encryption at rest was a big selling point for us, so when they said they could do it without an impact on performance, it obviously put these guys higher in the points than some of the competition at the time.

Most of the goodies, features and design aspects of the solution seem a lot more like magic than technology, but more on that next time.
