AMD’s road to the data center and HPC isn’t as long as you think
Last week, AMD announced it was ready to take on Nvidia in the GPU space for the data center, a market the company had basically ignored for the last several years in its struggle just to survive. But now, buoyed by its new CPU business, AMD is ready to take the fight to Nvidia.
It would seem a herculean task. Or perhaps quixotic. Nvidia has spent the past decade tilling the soil for artificial intelligence (AI) and high-performance computing (HPC), but it turns out AMD has a few things in its favor.
For starters, it has both a CPU and a GPU business, and it can tie them together in a way Nvidia and Intel cannot. Yes, Intel has a GPU product line, but those GPUs are integrated into its consumer CPUs, not its Xeons. And Nvidia has no x86 line at all.
AMD’s next-generation Epyc server processors
AMD is preparing the next generation of its Epyc server processors under the codename “Rome,” and they look like monsters:
- A 7nm design while Intel is still stuck at 14nm. That means roughly twice as many transistors in the same die area as the existing Epyc chips and Intel’s Xeon, which translates to better performance.
- 64 cores and 128 threads per socket.
- A central I/O die to handle DDR4, Infinity Fabric, PCIe, and other I/O.
- PCIe Gen4 support, providing twice the bandwidth of PCIe 3.
- Greatly improved Infinity Fabric speeds, enabling inter-chip and memory communication.
- Most important, the ability to connect GPUs to CPUs and do inter-GPU communication with the CPU.
The design of Epyc 2 is actually eight “chiplets” of eight cores each, connected by Infinity Fabric, with the I/O die sitting in between the chiplets. Communication between the CPU and GPU, however, is done over PCI Express 4, which is not as fast as Infinity Fabric but still mighty quick, and that connectivity gives AMD an advantage.
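The “twice the bandwidth” claim for PCIe Gen4 is easy to verify with back-of-the-envelope math. The sketch below (the helper function and its parameters are my own, not from the article) computes the approximate per-direction bandwidth of a x16 link for each generation:

```python
# Rough per-direction bandwidth of a x16 PCIe link, ignoring protocol overhead.
# PCIe Gen3 signals at 8 GT/s per lane; Gen4 doubles that to 16 GT/s.
# Both use 128b/130b line encoding, so the efficiency factor cancels in the ratio.
def pcie_x16_gbps(gt_per_s: float, lanes: int = 16) -> float:
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    return gt_per_s * lanes * encoding_efficiency / 8  # bits -> bytes (GB/s)

gen3 = pcie_x16_gbps(8.0)   # ~15.75 GB/s
gen4 = pcie_x16_gbps(16.0)  # ~31.51 GB/s
print(round(gen3, 2), round(gen4, 2), round(gen4 / gen3, 1))
```

Since only the signaling rate changes between the two generations, the doubling falls straight out of the arithmetic.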
One thing I have learned is that AMD is not at so great a disadvantage after all. It turns out that just because Nvidia has the CUDA language and a huge support base, CUDA or any other proprietary language is not needed to bring GPUs to bear.
“If you are just walking into the market for the first time looking to develop some AI algorithm, then you’re either going to try and grab some software or write your own. If you write your own, then you use whatever language you are most comfortable in,” said Jon Peddie, president of Jon Peddie Research, which follows the graphics market.
Google’s TensorFlow is written in C/C++ and Python, he noted. AI training apps use CUDA because the developers knew it, not because it was necessary.
One advantage Nvidia does have is container technology that takes code written in one language and translates it into a form Nvidia’s GPUs understand.
“As far as I know, AMD doesn’t have a container,” said Peddie.
Nvidia has other technological advantages, as well. It put Tensor cores in the new Turing generation of GPUs to offer basic matrix math engines, like Google’s Tensor processor does. That makes the Turing generation well suited for matrix math, the foundational math in AI training.
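To see why matrix math is called the foundation of AI training: a dense neural-network layer is essentially a matrix multiply of inputs against weights, repeated billions of times during training. The pure-Python sketch below (my own illustration, not from the article) shows the operation that Tensor cores accelerate in hardware:

```python
# The core operation Tensor cores accelerate: a matrix multiply.
# Real frameworks dispatch this to GPU hardware; this is just the math.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

# A single dense layer is just inputs x weights:
inputs = [[1.0, 2.0]]                    # one sample, two features
weights = [[0.5, -1.0], [0.25, 0.75]]    # two inputs -> two outputs
print(matmul(inputs, weights))           # [[1.0, 0.5]]
```

A hardware engine that does this multiply-accumulate step natively, as Tensor cores and Google’s Tensor processor do, speeds up training across the board.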
Peddie also noted that Nvidia has mindshare. Its status rivals that of Intel, and it could be argued Nvidia has eclipsed Intel. Nvidia shareholders would certainly agree.
AMD’s “biggest challenge is the challenge they’ve always faced: Can they market? Nvidia is one of the most powerful brands you’ve ever heard of, up there with Sony and Apple,” Peddie said.
AMD has competitive GPUs, but as Peddie put it, “they got the ammo. They need to figure out how to pull the trigger.”