Nvidia’s New DGX A100 Packs Record 5 Petaflops of AI Performance

On Jun 23, 2020

Nvidia has unveiled its next-generation Ampere GPU architecture, aimed at winning the title of fastest processor for running artificial intelligence programs. The first GPU to use Ampere will be Nvidia’s new A100, built for scientific computing, cloud graphics, and data analytics.

The Ampere chip succeeds not only the Volta GV100 GPU used in the Tesla V100 accelerator, announced in May 2017, but also the Turing TU104 GPU used in the Tesla T4 accelerator, which launched in September 2018 and was aimed at graphics and machine learning inference workloads.

Nvidia has created a single GPU that not only runs HPC simulation and modeling workloads considerably faster than Volta but also brings machine learning inference based on Tensor Cores onto the same device. Nvidia CEO Jensen Huang called it the ultimate instrument for advancing AI.

Similar to how Nvidia used its previous Volta architecture to create the Tesla V100 and DGX systems, a new DGX A100 AI system combines eight of these A100 GPUs into a single giant GPU.

Combining these eight GPUs means there’s 320GB of GPU memory with 12.4TB/s of memory bandwidth. Nvidia is also including 15TB of Gen4 NVMe internal storage to power AI training tasks.
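As a quick sanity check on those totals, the sketch below divides the quoted system-wide figures evenly across the eight GPUs. The even split is an assumption made for illustration, and the constant and function names are hypothetical; only the three input figures come from the article.

```python
# Figures quoted in the article for one DGX A100 system.
GPUS_PER_SYSTEM = 8
TOTAL_GPU_MEMORY_GB = 320
TOTAL_MEMORY_BANDWIDTH_TBPS = 12.4


def per_gpu_share(total, gpus=GPUS_PER_SYSTEM):
    """Split a system-wide total evenly across the GPUs (assumed even split)."""
    return total / gpus


memory_per_gpu_gb = per_gpu_share(TOTAL_GPU_MEMORY_GB)
bandwidth_per_gpu_tbps = per_gpu_share(TOTAL_MEMORY_BANDWIDTH_TBPS)

print(f"Memory per GPU: {memory_per_gpu_gb:.0f} GB")
print(f"Memory bandwidth per GPU: {bandwidth_per_gpu_tbps:.2f} TB/s")
```

Under that assumption, each A100 contributes 40GB of memory and about 1.55TB/s of memory bandwidth to the system totals.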

The A100 packs 54 billion transistors, making it the world’s largest 7nm processor, and Nvidia claims up to 20 times the performance of the previous-generation Volta chip. The DGX A100 system built around it delivers 5 petaflops of AI performance, consolidating the power and capabilities of an entire data center into a single flexible platform.

Nvidia CEO Jensen Huang said it can make supercomputing tasks that are vital in the fight against COVID-19 much more cost-efficient and powerful than today’s more expensive systems.

He added that cloud usage of services is going to see a surge. “Those dynamics are really quite good for our data center business,” he said. “My expectation is that Ampere is going to do remarkably well. It’s our best data center GPU ever made and it capitalizes on nearly a decade of our data center experience.”

Nvidia DGX is the first AI system built for the end-to-end machine learning workflow, from data analytics to training to inference. With the giant performance leap of the new DGX, machine learning engineers can stay ahead of the exponentially growing size of AI models and data.

Nvidia also announced that it has spent the past several years working with the Spark community to accelerate that in-memory data analytics platform with GPUs, and that the work is now ready.

As a result, the massive amount of preprocessing, the machine learning training, and the machine learning inference can all be done on the same accelerated platform.

As modern data centers adapt to increasingly diverse workloads, Nvidia’s recent $6.9 billion acquisition of Mellanox, a server networking supplier, is also coming into play: the DGX A100 includes nine 200Gb/s network interfaces for a total of 3.6Tb/s of bidirectional bandwidth.

“If you take a look at the way modern data centers are architected, the workloads they have to do are more diverse than ever,” explains Huang. “Our approach going forward is not to just focus on the server itself but to think about the entire data center as a computing unit.”

“Going forward I believe the world is going to think about data centers as a computing unit and we’re going to be thinking about data center-scale computing. No longer just personal computers or servers, but we’re going to be operating on the data center scale.”

Nvidia also launched the Nvidia DGXpert program, which brings DGX customers together with the company’s AI experts, and the Nvidia DGX-ready software program, which helps customers take advantage of certified, enterprise-grade software for AI workflows.

The systems have six Nvidia NVSwitch interconnect fabrics with third-generation Nvidia NVLink technology for 4.8 terabytes per second of bi-directional bandwidth and nine Nvidia Mellanox ConnectX-6 HDR 200Gb per second network interfaces, offering a total of 3.6 terabits per second of bi-directional bandwidth.
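The network total can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: it assumes that the quoted "bi-directional" figure counts send and receive traffic separately, and the names used are hypothetical. It also converts the NVSwitch figure from terabytes to terabits so the two numbers can be compared in the same unit.

```python
# Figures quoted in the article.
NETWORK_INTERFACES = 9
GBITS_PER_INTERFACE = 200        # Gb/s per interface, per direction
NVSWITCH_TBYTES_PER_SEC = 4.8    # TB/s, bi-directional

# Assumption: "bi-directional" counts both send and receive.
DIRECTIONS = 2
BITS_PER_BYTE = 8


def total_network_tbits(interfaces, gbits_each, directions=DIRECTIONS):
    """Aggregate network bandwidth in terabits per second."""
    return interfaces * gbits_each * directions / 1000


network_tbits = total_network_tbits(NETWORK_INTERFACES, GBITS_PER_INTERFACE)
nvswitch_tbits = NVSWITCH_TBYTES_PER_SEC * BITS_PER_BYTE
ratio = nvswitch_tbits / network_tbits

print(f"External network: {network_tbits:.1f} Tb/s")
print(f"NVSwitch fabric:  {nvswitch_tbits:.1f} Tb/s")
print(f"Internal fabric is roughly {ratio:.1f}x the external network")
```

On these assumptions the nine interfaces add up to the quoted 3.6 terabits per second, and the GPU-to-GPU fabric inside the box carries roughly ten times as much as the links leaving it.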

DGX A100 has begun shipping worldwide, with the first order going to the U.S. Department of Energy’s Argonne National Laboratory, which will use the cluster’s AI and computing power for COVID-19 research.

“We’re using America’s most powerful supercomputers in the fight against COVID-19, running AI models and simulations on the latest technology available, like the Nvidia DGX A100,” says Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne.

“The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days.”

Stevens added that the center’s supercomputers are being used to fight the coronavirus, with AI models and simulations running on the machines in hopes of finding treatments and a vaccine. The DGX A100 systems’ power will enable scientists to do years’ worth of work in months or days.

The University of Florida will be the first U.S. institution of higher learning to receive DGX A100 systems, which it will deploy to infuse AI across its entire curriculum to foster an AI-enabled workforce.

Other early adopters include the Center for Biomedical AI at the University Medical Center Hamburg-Eppendorf, which will leverage DGX A100 to advance clinical decision support and process optimization.

Nvidia says that Microsoft, Amazon, Google, Dell, Alibaba, and many other big cloud service providers are also planning to incorporate individual A100 GPUs into their own offerings.

“The adoption and the enthusiasm for Ampere from all of the hyperscalers and computer makers around the world are really unprecedented,” says Huang. “This is the fastest launch of a new data center architecture we’ve ever had, and it’s understandable.”

As with the larger DGX A100 cluster system, each individual A100 GPU can also be partitioned into up to seven independent instances for smaller computing tasks. These systems won’t come cheap, though. Nvidia’s DGX A100 comes with big performance promises, but systems start at $199,000 for a combination of eight of these A100 chips.
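To put those closing numbers in perspective, the rough sketch below works out how many GPU instances a fully partitioned system could expose and what a naive per-GPU and per-instance share of the starting price would be. The names are hypothetical, and the price split is a simplifying assumption: the $199,000 also covers the CPUs, storage, and networking in the system, so these shares overstate the cost of the GPUs alone.

```python
# Figures quoted in the article.
GPUS_PER_SYSTEM = 8
MAX_INSTANCES_PER_GPU = 7
SYSTEM_START_PRICE_USD = 199_000

# Every GPU split into the maximum number of independent instances.
max_instances = GPUS_PER_SYSTEM * MAX_INSTANCES_PER_GPU

# Naive split of the whole system price (assumption: ignores that the
# price also pays for CPUs, storage, and networking).
price_per_gpu = SYSTEM_START_PRICE_USD / GPUS_PER_SYSTEM
price_per_instance = SYSTEM_START_PRICE_USD / max_instances

print(f"Maximum GPU instances per system: {max_instances}")
print(f"Naive price share per GPU: ${price_per_gpu:,.0f}")
print(f"Naive price share per instance: ${price_per_instance:,.0f}")
```

Read this way, a single system could serve up to 56 smaller jobs at once, at roughly $3,550 of the starting price per instance.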