Datrium Experience (pt5)


Turbulent Waters 

With great power, and feature sets, comes great responsibility. The wizards at Datrium created a very awesome product with some very strict tolerances. They made an easy-to-use product that incorporates primary and backup storage with replication, all in an integrated single pane of glass.

After several system updates and upgrades, we found that keeping the easy-to-use ship on course took us through some pretty heavy seas. We were promised that the redundancy built into the system would keep our data online all the time. We found that wasn't always the case. And we found that the reason for each incident (there were a few) was either a mismatched driver or firmware revision.

From the shock and awe of the deployment and management experience, through the performance and recovery functionality, to the SSD "nonfailure" failures, actual controller failures, and unsupported "supported" drivers, the seven Cs of the Datrium Experience have been a very turbulent place.

Cool... The tech is awesome!
Without rewriting the previous posts, the following features are core to the solution. It's a software-defined solution that allows for separation of performance and capacity. The layered approach to processing allows for encryption at rest, compression, deduplication, recovery, and resiliency in a fairly compact package.

Collected... It's easy to use
Simple installation, deployment and even expansion. Single pane of glass management and incorporated backups. Not to mention the super compact physical dimensions of our completed system.

The storage nodes can be added to the array in a matter of moments. Rack it, stack it, and power it on, then go to the management console and add the node to the array. Backups are included and it's super easy to do a restore.

Calm... It's fairly reliable
The system has some decent internal monitoring and it calls home to support. It did notify tech support of some issues before they became a downtime problem. With the current release (5.3.1.0) they added duplicate IP monitoring, among several other checks.

Confounded... except when it's not
Unexpected issues took weeks or months to become a problem and even longer to diagnose.

The SSDs are supposed to have a few features. Firstly, they are supposed to be shareable between hosts on a failure. However, we found that most of the time an SSD failure resulted in an All Paths Down condition that killed all our VMs on that host. Recovery is simple and we've had way too much practice.

The duplicate IP monitoring, for instance, has been complaining that all the ESXi hosts have a duplicate, but when the network team investigated, all they found was the MAC address for VLAN definitions that have been shut down.

Contemptible... frustratingly tight tolerances
I have no ill will toward Datrium tech support. Those guys know ESXi, they know the storage system, and they know a lot of the in-between. They do have the same problem as every other solution provider: tier one support has to wait on higher tiers for help.

However, when the system had issues and they became problems, the final determination took a while to find and was usually a bad driver or a driver/firmware mismatch. The tolerances on the system are very narrow.

Constrained... can't reuse the gear
We purchased in good faith that the solution would be usable for at least 5 years, hopefully 7, and we bought re-branded OEM Datrium servers and storage nodes. Their support, like I said, is great. However, the legalities of re-branded gear make it nearly impossible to get any first-party support from the original vendor of the gear, in this case Dell.

At the time, it seemed a little silly to be working with other customers on this same solution who had bought only the storage and equivalent Dell hosts. Looking back, I'm sure they will not have as big a budget issue as my team.

Canned... VMware stuffed 'em 
We knew that the technology being developed by this startup was too good not to be bought up by a bigger company looking to fill out their hyper-converged catalog. So the fact that the purchase was a gamble was always on our minds. But we knew the tech was tooooo good to get shelved.

Or so we thought. When VMware bought them out we had high hopes. Word came from the industry leader that they might be keeping it. The hope was dashed in a matter of weeks: they killed the DVX solution.

With the death of the solution we are scrambling to fill the void. We are not at liberty to run a primary server and storage environment without primary support. I know of several organizations that can get away with that model, but I think they are dumb! And that cavalier attitude will be the downfall of their solutions.

That's right, I said it... DUMB! It doesn't get used enough to describe when people do dumb things! Like killing the best-fit solution for local government servers and storage. I think that VMware killing the DVX solution is dumb.

Even with all the ups and downs, the solution is a great fit. I believe with a few more serious years of development it would be the best solution on the market for disaggregated hyper-converged infrastructure. I've made it about halfway through my story, and at this point you might be wondering: why the spoilers? Why do a summary in the middle? I'm not really sure; this is just what came out.


There will be more to this story, but I have to warn you... It will not be as happy as it has been.

