How AMD’s Zen 2 Architecture Boosts Performance-Per-Watt
With the launch date of AMD’s upcoming Zen 2 architecture fast approaching, the company has pulled back the curtain and given us a view into the capabilities and improvements of its new uarch. These new chips include a number of improvements and benefits to drive both higher instructions per cycle (IPC) and better overall power consumption.
Let’s begin with some basics. The Ryzen 3000 family is powered by AMD’s Zen 2 architecture, except for its APUs. APUs are effectively running a generation behind: 2000-series APUs were actually built on first-generation Ryzen, and 3000-series APUs are based on second-gen Ryzen. The architectural enhancements and other features we will discuss today do not apply to the Ryzen 3 3200G or Ryzen 5 3400G.
One point AMD Corporate Fellow Mike Clark made during his presentation on Zen 2 is that its 7nm transition was actually more successful than it initially predicted.
Some of you may recall rumors that AMD would field Ryzen 3000 CPUs with far higher CPU clocks than previous parts. According to AMD engineers, the company did not necessarily expect Zen 2 to hit higher frequencies at all. This is the intrinsic problem of modern CPU node shrinks. Smaller process nodes mean lower voltages, and lower voltages can have negative impacts on absolute operating frequency. In this case, however, TSMC’s 7nm node and AMD’s own engineering were able to create parts that could hit modestly higher frequencies than 12/14nm chips.
The fact that AMD did not expect to necessarily see clock frequency improvements on 7nm is something to keep in mind when evaluating the accuracy of rumors about massive clock jumps in the future.
One important change coming with Zen 2 has nothing to do with the actual CPU itself. AMD notified us at the event that new scheduler changes have been incorporated into the Windows 10 scheduler as of Windows 10 1903 (May 2019 Update). There are two new capabilities: topology awareness and faster clock ramping. Faster clock ramping reduces the time the CPU needs to switch clock states, improving performance and theoretically improving idle power consumption by allowing the CPU to shift to lower clock states more quickly. Topology awareness should help keep data local to the appropriate CCX by filling one CCX before loading another.
These gains — +15 percent 1080p performance in Rocket League and a 6 percent improvement in PCMark 10 application launch — are solely the result of the Windows 10 scheduler update, and are separate from any additional gains as a result of improvements to the Zen 2 architecture. Taking advantage of these improvements requires both an updated chipset driver and the Windows 10 1903 update.
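To picture what "fill one CCX before loading another" means, here is a toy sketch. It is not Windows' actual scheduling logic, which AMD did not detail; the function names and the default of two CCXs with four cores each are assumptions chosen for illustration. It compares a fill-first placement against a naive round-robin placement and counts how many CCXs each one touches.

```python
# Toy illustration only: NOT the Windows 10 scheduler's real algorithm.
# Assumed layout: 2 CCXs with 4 cores each (hypothetical defaults).

def place_fill_first(num_threads, num_ccx=2, cores_per_ccx=4):
    """Assign threads to (ccx, core) slots, filling one CCX before the next."""
    if num_threads > num_ccx * cores_per_ccx:
        raise ValueError("more threads than cores in this toy model")
    # divmod maps thread 0..3 to CCX 0, thread 4..7 to CCX 1, and so on.
    return [divmod(t, cores_per_ccx) for t in range(num_threads)]


def place_round_robin(num_threads, num_ccx=2, cores_per_ccx=4):
    """Assign threads by alternating between CCXs (topology-unaware contrast)."""
    if num_threads > num_ccx * cores_per_ccx:
        raise ValueError("more threads than cores in this toy model")
    # Thread t goes to CCX (t mod num_ccx), on that CCX's next free core.
    return [(t % num_ccx, t // num_ccx) for t in range(num_threads)]


def ccx_in_use(placements):
    """Count how many distinct CCXs a placement touches."""
    return len({ccx for ccx, _core in placements})


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, "threads:",
              "fill-first uses", ccx_in_use(place_fill_first(n)), "CCX(s),",
              "round-robin uses", ccx_in_use(place_round_robin(n)), "CCX(s)")
```

In this model, a four-thread workload stays on a single CCX under the fill-first policy, so its threads can share that CCX's L3 rather than communicating across the fabric.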
This slide represents AMD’s big-picture microarchitectural overview. The chip integrates a new TAGE branch predictor in addition to the perceptron BP it has used in the past. The micro-op cache has been increased to 4K instructions, with double the total L3 onboard. (AMD is now referring to its combined L2 and L3 as “AMD GameCache.”) There is now a new address generation unit (AGU) attached to the integer side of the core, with full support for 256-bit floating point via AVX2.
According to AMD, these improvements leave it far ahead of Intel, both in terms of performance-per-watt and absolute power consumption at the wall.
Cinebench is not the be-all-end-all of power consumption measurement, but it’s not a bad test, either. The 3700X — which, in fairness, probably sits the closest to the overall sweet spot for the architecture — is supposedly 56 percent more efficient than the Core i7-9700K, while drawing just 86 percent as much power in absolute terms.
The gains against the 2700X in terms of overall power efficiency are even larger. AMD claims the 3700X is 1.75x more efficient in perf/watt than the 2700X, while drawing 70 percent of the power.
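It can help to work out what these efficiency and power figures imply about raw performance. Because perf/watt is performance divided by power, the relative performance is simply the efficiency ratio multiplied by the power ratio. The short sketch below applies that to AMD's claimed numbers; the function name is our own, and we are assuming "56 percent more efficient" means a 1.56x perf/watt ratio.

```python
# Back-of-the-envelope arithmetic on AMD's claimed Cinebench figures.
# Assumption: "56 percent more efficient" is read as a 1.56x perf/watt ratio.

def implied_performance_ratio(efficiency_ratio, power_ratio):
    """Relative performance = (perf/watt ratio) * (power ratio)."""
    return efficiency_ratio * power_ratio


# 3700X vs. Core i7-9700K: 1.56x perf/watt at 86 percent of the power.
vs_9700k = implied_performance_ratio(1.56, 0.86)

# 3700X vs. 2700X: 1.75x perf/watt at 70 percent of the power.
vs_2700x = implied_performance_ratio(1.75, 0.70)

if __name__ == "__main__":
    print(f"Implied performance vs. 9700K: {vs_9700k:.3f}x")
    print(f"Implied performance vs. 2700X: {vs_2700x:.3f}x")
```

Taken at face value, AMD's claims work out to the 3700X being about 1.34x as fast as the 9700K and a little over 1.2x as fast as the 2700X in this particular test, while drawing less power than either. These are derived from vendor-supplied figures, not from independent measurements.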
While we obviously can’t judge a launch until we have hardware to test, AMD is laying out an aggressive, exciting product family. TDPs have fallen dramatically. IPC has reportedly increased by 1.15x. Clock speeds have been bumped. The scheduler improvements and doubled-up floating point capability should provide their own robust improvements over and above this 1.15x figure. The Infinity Fabric bus width has been doubled to allow PCIe Gen 4 bandwidth to be fully utilized, and there’s a new memory divider at DDR4-3733 to allow for lower IF clocks without compromising DRAM scaling.
If you’re a fan of AMD’s APUs, 7nm has exciting long-term implications for them as well. While we don’t know when we’ll see these parts, the company has aggressively targeted lower power consumption across the board, and that should pay off when it comes time to refresh the APU family on 7nm. One of our theories about the 7nm launch was that AMD would emphasize power efficiency on at least some parts, and we’re absolutely seeing that, with a higher-performing 8-core CPU in a 65W TDP and a 16-core CPU in a 105W TDP.