Pure Storage CEO on all-flash data centers and the cloud
Giancarlo was a managing director and senior advisor at Silver Lake Partners before joining Pure Storage. Prior to that, he held multiple executive positions at Cisco, where he helped steer the company into markets such as Ethernet switching, VoIP, Wi-Fi and telepresence.
Giancarlo talked with Network World’s Ann Bednarz about what Pure is doing to keep the storage industry moving forward, and how the experience he gained during Cisco’s growth spurt is helping.
He described Pure’s vision for a data-centric architecture – an approach that combines the simplicity of direct-attached storage with the scalability and reliability of network storage – and how it will lead to the eventual collapse of storage tiers.
Giancarlo also talked about the fate of magnetic disk drives (only for cold storage); why NVMe is important (enables even greater efficiency in flash); and what’s distinctive about the company’s pay-per-use Evergreen storage services (no rip-and-replace upgrades).
Here is an edited transcript of that conversation.
Enterprise storage has been stuck with the perception that it’s boring. Is that changing? Is the storage industry becoming more innovative?
There’s always a bottleneck to the progress in computation, and I think the bottleneck for the last decade, with the growth of data, has been how to handle all the data. Frankly, I think the technology has been behind. Now we’re finally starting to see some real advances in storage. That’s what makes it exciting. When it becomes a bottleneck, that also means there’s a lot of opportunity.
What stands out to you after your first year at Pure Storage? How did the company perform?
I think our performance speaks for itself. If you look over the last year, we’ve grown an average of 40% year over year. We’ve come out with some great new products that are growing very well. And we continue to lead the market in advancing new technologies. It speaks to the quality of the company overall. Of course, a lot of that was in place before I came on board. It’s a bit too early for me to talk about any real accomplishments. But I do think that what I saw here was a company that had great potential, that was transitioning from being a midsize company to a large company, and that needed to transition some of the ways in which it did business. I think I’ve been able to help them start to advance to the next stage. That has to do with the way we work with our partners in the field, the way we scale our sales force and our development organization, and the way we look at new opportunities for the business.
In May, Pure Storage outlined its vision for a data-centric architecture that delivers on the need for agility and performance in enterprise settings. Can you explain the data-centric architecture? What does it involve from a technical standpoint?
I’ll go back a little bit, in terms of the way that customers design their environment, and I’ll talk about why we now have an opportunity to modify that, and why we should modify that.
If you think about what an ideal situation would be, if you could snap your fingers and make magic happen, you’d have one super powerful processor that could address storage that was located right next to it, at the speed of light. That would be the ultimate, easy, very straightforward architecture. Now going back 10 or 15 years, the fastest connection that people had was 1 Gigabit Ethernet. They had disks that were maybe 1 terabyte at most, and we had distributed processors.
In order to handle the world’s largest computation problems, we still need lots of processors – that hasn’t changed. But other things have changed.
For one, networking speeds are at 100 Gigabit Ethernet and even moving to 400 Gigabit Ethernet. Data has continued to grow explosively, to the point where you can’t just fit it on one disk or SSD. So we need to scale that. But with the very high networking speed and with the density we’re able to get with solid-state storage now, we’re able to make it look as if all the data you want is right next to the processors.
Another thing that has changed is that many years ago, the application stack was very heavy. It was difficult to construct. It was customized to the specific application environment. You had an operating system, you had security software associated with that operating system, you had remote management software associated with that operating system, and then you had an application that was tuned to it. And it was stuck there. And now you had to get access to lots of data. And the only way to do that was to spread the data out, what was known as scale-out architecture for the data.
But today, applications are very lightweight. They’re virtualized and increasingly containerized. They can be placed anywhere. But the data itself is heavy. When you have petabytes of data – even in an array such as ours, which can fit a petabyte in about 5 inches on a rack – moving all that data would take a long time. It’s far better to move the application to the data than to move the data to the application. Now, with 100 Gigabit Ethernet interfaces, we can do that.
So that’s what we mean by data-centric architecture. It’s designing the architecture for your data processing around the data, rather than designing the data around the application.
The other thing that used to happen years ago, and even happens today, is that data was constantly replicated because every application wanted its own copy of the data. Part of the reason for that was performance. They didn’t want to have to share what was limited performance with other applications. Today with solid-state storage, we can have multiple applications access the data with the full performance they need, with quality-of-service guarantees so that one application’s access to the data doesn’t affect the others.
That’s another thing we mean by data-centric architecture. It’s reducing the number of copies of your data and making it easier to get access to it from all the applications that need it, which reduces costs and increases performance. It also increases security and compliance, because now you’ve reduced the number of copies of your data across the enterprise.
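To put rough numbers on the point that a petabyte is slow to move, here is a back-of-the-envelope sketch. It assumes a decimal petabyte (10^15 bytes) and an ideal link running at full line rate with no protocol overhead; the function name is illustrative and not from the interview. Real transfers would take longer.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link rated at link_gbps gigabits per second.

    Assumes the link is fully utilized and ignores protocol overhead.
    """
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


PETABYTE = 1e15  # decimal petabyte, in bytes (an assumption for this sketch)

# Compare the Ethernet speeds mentioned in the interview.
for gbps in (1, 100, 400):
    hours = transfer_time_seconds(PETABYTE, gbps) / 3600
    print(f"{gbps:>3} GbE: {hours:,.1f} hours")
```

Under these assumptions, one petabyte needs about 22 hours at 100 Gigabit Ethernet and roughly three months at 1 Gigabit Ethernet, which is the gap behind the argument for moving lightweight applications to the data rather than the reverse.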
A differentiator for Pure is its storage-as-a-service pricing model. Can you talk about the Evergreen Storage Service (ES2)?
Our competitors would view it as just pricing. But it’s a lot more than that. We promise our customers that if they’re on our Evergreen model, which is a subscription model, we will keep their storage system constantly updated to the latest hardware and software – meaning that they never have to migrate their data off the system. Our competitors can’t do that because they can’t do what is called a nondisruptive upgrade. They can’t replace the hardware and the software without downtime. When a competitor goes to a new product model and obsoletes the old model, they force the customer to migrate the data. They can’t upgrade the old model.
So this gives assurance that if a customer is buying now, they won’t need to change out an array in a few years?
Exactly. We do it all in place. If they’re paying the subscription, they don’t pay any more money. We upgrade the system for them as part of the subscription. Another benefit is that we don’t charge them again for the same storage. Let me give you an example. Let’s say they buy a system with 50 terabytes of storage in it. A few years later, if they want to upgrade that to 250 terabytes, they only pay for 200 terabytes. They don’t need to pay for the first 50 terabytes over again.
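The capacity example above reduces to simple arithmetic, sketched below. The function name and the error handling are hypothetical and only illustrate the rule as Giancarlo describes it; the interview does not spell out the actual contract terms.

```python
def billable_expansion_tb(owned_tb: int, target_tb: int) -> int:
    """Capacity charged when expanding, if already-purchased capacity is not billed again.

    Hypothetical helper that models only the example given in the interview.
    """
    if target_tb < owned_tb:
        raise ValueError("target capacity is smaller than capacity already owned")
    return target_tb - owned_tb


# The interview's example: a 50 TB system expanded to 250 TB.
print(billable_expansion_tb(50, 250))  # 200
```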
Is that a common scenario? Are customers typically making the shift to flash storage in increments? What’s a typical adoption path?
We do see that. With our top 25 customers, we see growth of between 10x and 12x over the four years following their first purchase. Across all of our customers, we see on the order of 4x to 5x over the first two or three years, on average.
Are we going to see disk drives disappear?
We’ll still see disk drives, but they’ll start to migrate to cold storage.
We believe that tier one and tier two will collapse. The reason goes back to what I mentioned before: We could have multiple applications accessing the same storage at the same time. So if you already have so-called tier one storage, but now you could allow the apps that typically go to tier two to access the data on the tier one storage, without affecting the tier one applications, you can collapse the tiers.
We believe that tier one and tier two will both go to flash as prices drop. For cold storage, we believe it will go to the cloud, and even if it goes to the cloud, it will be magnetic. Otherwise known as cheap and deep.
What’s the big deal about NVMe?
We care because, believe it or not, we’re still dealing with old protocols that were designed for magnetic storage. Before NVMe, the protocols to access storage, even solid-state storage, were designed for magnetic storage. Whether that was SCSI or SATA or SAS or iSCSI – those were all designed as fundamentally serial interfaces that were designed for relatively slow storage.
NVM stands for non-volatile memory – meaning, basically, solid-state memory. NVMe is a more modern protocol that recognizes both the speed of networks that we can now have available to us, as well as the fundamentally parallel nature of solid state.
NVMe is a more parallel way to access the solid-state storage. That’s very meaningful, especially to Pure, frankly, even more than to our competitors, for the following reason: Only Pure uses raw flash memory. The majority of our products use what we call DirectFlash. We speak directly to the flash, across our entire product. All of our competitors use so-called SSDs, or solid-state disks. Now, ‘solid-state disk’ is a bit of an anachronistic title, because there’s no disk. They’re solid state. But an SSD makes flash memory appear to be a magnetic disk. That’s what they’re designed to do. Competitors can claim to be all flash, but all they really did was remove a magnetic disk and put in an SSD. But it suffers from all the limitations that the magnetic protocols impose. They’re relatively slow, they do not optimize their use of flash, and they’re serial in nature. They don’t provide a parallel interface to the flash.
We put NVMe into place early last year, so we’re well over a year now using NVMe for access to our flash. It’s very meaningful for us. It means we can be even more efficient in flash. Even faster in terms of both write speeds and read speeds for our customers. And it allowed us to use any type of flash memory that was economical for us to use, consumer and/or enterprise grade.
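For a sense of what "serial" versus "parallel" means at the protocol level, the sketch below compares the commonly cited command-queue ceilings of SATA's AHCI interface and NVMe. These figures come from the published interface specifications, not from the interview, and are theoretical maximums; real devices typically expose far fewer queues. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueModel:
    """Upper bounds on command queuing for a storage interface (spec ceilings, not typical values)."""
    name: str
    queues: int  # number of I/O command queues
    depth: int   # commands that can be outstanding per queue

    @property
    def max_outstanding(self) -> int:
        # Total commands that could be in flight at once, in the ideal case.
        return self.queues * self.depth


# Commonly cited limits: AHCI has one queue of 32 commands;
# NVMe allows up to 65,535 I/O queues of up to 65,536 commands each.
AHCI = QueueModel("SATA/AHCI", queues=1, depth=32)
NVME = QueueModel("NVMe", queues=65_535, depth=65_536)

for model in (AHCI, NVME):
    print(f"{model.name}: {model.queues:,} queue(s) x {model.depth:,} deep "
          f"= {model.max_outstanding:,} outstanding commands")
```

The absolute numbers matter less than the shape: one shallow queue suits a single spinning disk, while many deep queues let many processor cores issue commands to flash concurrently.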
What about NVMe over Fabrics (NVMe-oF)?
NVMe over Fabrics is a very high-speed way of getting access to your storage over traditional interfaces, like Ethernet or Fibre Channel. That will be important for us, because it allows our shared accelerated storage model.
[Consider the three primary ways to access storage: direct attached storage, SAN and NAS.] With NVMe, we now can make all three of those look the same. We call it shared accelerated storage. We can remove DAS, so instead of servers having to have their own disks on board, they can now have an NVMe interface to an array, and get the same if not better performance than they got before.
Shared accelerated storage can replace SAN with NVMe and get better performance. And with network-attached storage, it’s the same thing: NVMe will make it faster than using traditional protocols.
Lastly, NVMe over Fabrics does create that parallel interface, even though it’s over Ethernet. It allows multiple-disk access for read and write to occur at the same time. That’s critical for things like AI, analytics and other large, complex, multi-threaded workloads.
Are enterprises coming to Pure to solve specific workload challenges, or are they looking for broader, more strategic storage overhauls?
There’s a spectrum of managers. There are some that are very much caught up in their existing environments and frameworks, and for them it’s really just about improving the way they do things today. There are others, though, that are struggling with the demands placed on them. Moving to new workloads, for example. They’re struggling with scaling application environments with DAS or with SAN, or migrating to things like Amazon S3 environments. When we came out with the data-centric architecture and the details behind that – including this idea of removing DAS altogether and migrating to a more centralized approach, a data-centric approach – they said that’s exactly what they have been looking for, and they didn’t realize that they could do it.
Are there particular workloads that are driving adoption?