How AMD's "Zen," now called Ryzen, aims to shake up the CPU world, which is just what we need.
PCGamer
AMD's new Zen microprocessor has been a long time coming. We've had a few glimpses of what to expect over the past six months, but at the recent AMD Tech Summit, AMD finally revealed the official name for the new processor: Ryzen. It's an homage to the Zen codename, and probably a hell of a lot easier to trademark than Zen would have been. To be clear, Zen is still the name of the architecture, so I'll use that when referring to the entire family, while Ryzen specifically refers to the consumer models of the Zen family.
But that's just the chip name, and we still don't have an official word on the various models, or precise details of when Ryzen will launch, other than Q1'17. I've previously discussed what we know and expect of Zen, but with the new information and additional demonstrations (including AMD's New Horizon livestream event showcasing Ryzen and even a Vega demonstration), some things are clearer than before—and thankfully, performance is still basically on target.
I've updated and overhauled our Zen need-to-know fact sheet, which has now become the Ryzen article you see here. Here's the brief summary of the latest updates:
- New consumer name for Zen is Ryzen
- 8-core/16-thread parts will clock at 3.4GHz or higher
- TDP for the 8C/16T parts is 95W
- Ryzen equal or better than i7-6900K in multiple tests
- SenseMI and automatic overclocking with improved cooling
- Zen should be much more scalable (e.g., 32-core Naples)
- AMD has Vega GPUs up and running in a Ryzen system
A brief history of AMD's CPU architectures
Sometimes you need to break the mold: shatter barriers and do something that hasn't been done before. But truly innovative products are really hard to make. If it were easy, everyone would be doing it, right? The more established a market becomes, the more difficult it is to innovate, but that doesn't mean companies can't go back to the drawing board and start fresh. For AMD, that's effectively what they're doing with the upcoming Zen family of processors.
To be clear, AMD is not starting from scratch. AMD has decades of experience building x86 processors, and everything they've done and learned in the past influences future designs. But when AMD launched the Bulldozer family of processors back in 2011, it was their first major architecture overhaul since the K8 came out in 2003. Six years later, AMD has overhauled the architecture again in a major way.
That's good news, because if you're using a midrange ($1000) or higher PC to play games, you're likely using an Intel Core i5/i7 processor of some form. The CPU scene has been pretty stagnant for several years now, with Intel's regular tick-tock cadence (now called process-architecture-optimization) giving incremental improvements each generation. AMD has tried to keep pace, but Bulldozer started with a performance deficit, and if you want the fastest and most efficient CPU, Intel wins, period.
AMD has had the unenviable task of being the last remaining x86 alternative to Intel, but their manufacturing process fell well behind, with GlobalFoundries being split off from AMD in 2009. Relegated to primarily competing on price, AMD has done quite well at retail markets where people just want an inexpensive PC (think Walmart). But once you get out of the budget sector and into the more lucrative midrange (and business) PC space, Intel dominates sales. And with good reason: take any Core i5 processor and it will generally outperform any AMD CPU of the same generation in the majority of applications and workloads.
This is bad for innovation, bad for the industry, and bad for both AMD and Intel users. We need competition, and it's not too surprising that some of Intel's biggest advancements in processor technology (the Core architecture) only came after AMD took the performance crown from them in the Athlon 64 era. The good news is that Zen should shake things up again, with AMD taking another shot at the performance CPU market. Even if AMD can't claim the outright performance crown, having a real performance alternative to Core i5 will be great—especially if it still comes at a lower price.
Designed to scale from mobile products up through desktops, workstations, and servers, Zen looks promising, but will it be enough? Modern processors are complex beasts, with many opportunities for things to go wrong. Here's what we know about Zen, what we expect it will do, and additional thoughts on what we actually want it to do.
What we know: Zen's Architecture
We've had variations of AMD's Bulldozer (Piledriver, Steamroller, and Excavator) for five years, all built using the same building block of a CMT (clustered multi-threading) module that consists of two integer cores with a shared floating-point unit. (It's technically two 128-bit FMAC units that can also work as a single 256-bit FP unit.) All of that changes with Zen.
From a high level, Zen looks a lot like Intel's Core architecture. Gone is the CMT module and in its place AMD is using a 4-core/8-thread SMT (simultaneous multi-threading) building block. AMD will likely scale down to a 2-core/4-thread module as well, but all indications are that the CPU-only variants of Ryzen will launch with 8C/16T, with a 4C/8T version likely to follow (though there's no clear indication of how soon that might be).
Current rumors are that the 4C/8T parts will be sold under the SR5 brand, with 8C/16T selling as SR7, and a lower-tier SR3 to follow. Those may be code names or they may end up being retail names (similar to Intel's i3/i5/i7 nomenclature), but for now I'll stick with calling the SR7 part 8C/16T and the SR5 part 4C/8T.
Along with SMT support, the pipeline and various other elements of the architecture have also been reworked. The L1 cache is a faster write-back design, and L2 cache is also up to twice the bandwidth. L3 cache meanwhile will deliver up to five times the bandwidth. There's a new micro-op cache, and each core can issue up to six micro-ops (or four fp-ops) per cycle—similar to Skylake's 6-wide issue width and 50 percent higher than the 4-wide design of the Bulldozer 'heavy equipment' family of CPUs.
Zen has an improved 'perceptron' branch prediction algorithm, now decoupled from the fetch stage, which again helps performance. We don't actually know the pipeline length for Zen (Bulldozer is estimated at a 20-stage pipeline), but better branch prediction can help mitigate having more stages. Notice for example that Intel's NetBurst pipeline was nominally a 20-stage design, which was 'too long' back in the day, and yet all of Intel's designs going back at least to Sandy Bridge are around the same length. And not to downplay these aspects, but Zen also features larger load, store, and retire buffers, along with improved clock gating.
Then there's the platform. Ryzen will use a new AM4 socket, with one of several chipsets, A320, B350, and X370. Regardless of chipset, the platform will remain as a dual-channel DDR4 setup, and the CPU socket has 1331 pins. Sticking with dual-channel makes sense as well, as it keeps motherboard costs in check, and it allows for up to 64GB max memory. As for the socket, 1331 is a good number of pins because it's more than Intel's LGA1151, and gives sufficient pins for the rumored 36 PCIe Gen3 lanes on the CPU—that would be 32 lanes for graphics cards, with another four lanes likely used to connect with the chipset. However, some previously leaked information indicates X370 will be required for SLI/CF setups, so we could end up with more PCIe lanes linked to the chipset, which would in turn connect to the PCIe slots.
Along with DDR4 support, perhaps equally important is the inclusion of USB 3.1 Gen2 (10Gbps) and NVMe M.2 support. (SATA Express on the other hand appears to be dying fast, so its inclusion doesn't really matter to me.) Obviously M.2 NVMe drives remain something of a high-end option for most PC builds, and for gaming in particular there's little benefit compared to a good SATA drive. But then, Ryzen clearly isn't targeting budget builds as the only option, so being able to use a modern M.2 NVMe drive is a must.
I can't emphasize enough how big of a fundamental change all of this represents, and it means everything we know about AMD's CPU performance from the past may no longer apply. AMD has stated a performance target of 40 percent better IPC (Instructions Per Clock) with Zen versus Excavator, and these architecture changes should provide some excellent per-clock performance improvements. A 3.0GHz Zen core should be 40 percent faster than a 3.0GHz Excavator core (though we never saw these outside of APUs), based on AMD's claims. But there's a catch: we don't know the final clock speeds for Zen/Ryzen. I'll get back to that in a moment, but first let's talk about some other aspects of the architecture as well as the process technology.
SenseMI Technology
One of the new technologies and names to come out of the AMD Tech Summit is SenseMI (pronounced "Sense Em-Eye" and not "Sense Me," though I sort of like the sound of the latter). A lot of this appears to be rebranding and grouping together of features that are often found on existing processor designs, but there are a few new twists.
The above slides provide the short overview, but several of them don't really tell us much. The Neural Net Prediction and Smart Prefetch in particular don't seem to be anything new—we've had 'smart' branch prediction that 'learns' from previous code execution going way back to at least the Pentium P5 era. Everything since then has been about improving the way branch prediction tracks states and predicts the next iteration. It's impossible for me to say at this point whether Zen's branch prediction is better, worse, or similar to Intel's current design, but it's almost certain to be better than the Piledriver/Excavator/Jaguar/Puma. The same goes for smart prefetch—Intel has used that term going back at least to the first Core processors. I'm going to assume AMD's branch prediction and prefetching are up to snuff here and move over to the other items.
Pure Power seems a lot like the evolution of AMD's PowerNow (CPUs) and PowerTune (GPUs) technologies, and AMD discussed similar ideas with their Polaris 10 architecture last year. The main idea is that Pure Power will optimize behavior to enable running at lower power draw with the same level of performance, and it ties in with the second technology in SenseMI, Precision Boost.
Precision Boost is a bit more interesting. We've had Turbo Core (AMD) and Turbo Boost (Intel) for a while now, allowing CPUs to dynamically change clocks and potentially exceed the minimum guaranteed clock speed. This has allowed for quad-core and even 6-core/8-core/10-core CPUs to offer relatively similar single-threaded performance to dual-core designs, while also being able to run heavily multi-threaded workloads without exceeding the power budget. Precision Boost tweaks things from the old way, providing a new granularity of 25MHz clock speed adjustments.
I'm not sure how important this will actually be, as the difference in performance between 3.5GHz and 3.525GHz for example is going to be so small that not even benchmarks will likely be able to show a consistent improvement. Still, it gives AMD the potential to extract every last bit of performance from a CPU. And there's the final part of SenseMI where this could become useful.
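To put the 25MHz granularity in perspective, here's a minimal sketch that snaps a clock target down to the nearest available step. The 100MHz comparison step, the 3,590MHz target, and the function name are assumptions for illustration only, not AMD specifications.

```python
def snap_down(target_mhz: int, step_mhz: int) -> int:
    """Round a target clock down to the nearest multiple of step_mhz."""
    return (target_mhz // step_mhz) * step_mhz


# Hypothetical clock that the power and thermal budget would allow, in MHz.
target = 3590

coarse = snap_down(target, 100)  # assumed coarser step, for comparison
fine = snap_down(target, 25)     # Precision Boost's stated 25MHz step

# Clock gained by the finer step, relative to the coarse result.
gain_pct = (fine - coarse) / coarse * 100
print(f"coarse={coarse}MHz fine={fine}MHz gain={gain_pct:.2f}%")
```

Even in this favorable case the finer step recovers only about 2 percent of clock speed, and a single 25MHz step at 3.5GHz is worth roughly 0.7 percent, which is why benchmarks are unlikely to show it consistently.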
Extended Frequency Range (XFR) is designed to reward enthusiasts with high-end cooling. If you've been using a stock CPU fan on your system, you'll know that it usually gets the job done just fine. It may not be the quietest option, and CPU temperatures might get a bit warm at times, but in general it's sufficient to meet the minimum performance levels of AMD or Intel CPUs. If you upgrade to a better air cooler, or a closed-loop AIO solution, and you don't engage in any form of overclocking, you basically get very little benefit—maybe slightly lower noise levels and/or temperatures.
XFR shakes things up by providing some level of autonomous overclocking. How much isn't something AMD specifically wanted to comment on, but they do note clock speed scaling with better air, liquid, and even LN2 (liquid nitrogen) cooling. Now, I'm not one to dabble with LN2, but if XFR means anyone willing to install a better cooler can suddenly improve clock speeds by 300MHz (5-10 percent), that would be pretty awesome. It would also make comparisons between benchmarks more difficult ("What CPU cooler was PC Gamer using when they showed Ryzen getting XYZ in Cinebench?"), but I'll deal with that as needed.
Scalability and the Infinity Fabric
Something you'll see mentioned in the SenseMI slides is the Infinity Fabric. This is an area that AMD discussed in greater detail at the Tech Summit, as it's quite important. This large umbrella term actually relates to both internal and external communications on Zen products, and it replaces a whole host of overlapping technologies from previous architectures. It would be great if I could just say the Infinity Fabric represents the internal topology of Zen, but that's not quite true. It encompasses that, but it also includes various signaling and control protocols (see SenseMI above). I have to apologize here for the lack of clear details, as AMD didn't provide us with slides specifically on the Infinity Fabric, but let me talk about the high-level overview.
In previous architectures, AMD had a host of protocols and communication channels, including Hyper Transport, with other upcoming standards like CCIX (Cache Coherent Interconnect for Accelerators). Infinity Fabric is designed to include all of these, with a common set of protocols that will, long-term, enable significantly better scalability and efficiency. Hyper Transport can still be used, or a mesh fabric, or CCIX, or a direct connection, or whatever else—the key is that all of these are supported by Infinity Fabric and various chips can use whatever works best. But instead of trying to track dozens of protocols (think of each as a slightly different language or dialect), everything can talk over the Infinity Fabric.
What this means in practice is that Zen-based designs, among others, should have far better scalability. With Bulldozer, if AMD wanted to break away from the basic 2-core module, it required a lot of customized logic on the die to handle things. The same goes for the 'cat cores' like Jaguar and Puma, where doing 2/4 core designs wasn't too bad, but something like the 8-core Jaguar CPUs at the heart of the PS4/XB1 required more work. From what I understand, automated layout tools for the processor die will gain a lot of flexibility, including potentially power savings and performance, by using the Infinity Fabric.
It's perhaps easier to give some concrete examples of the 'old way' and the 'new way' of things. Piledriver was the last major CPU architecture upgrade from AMD, with Steamroller and Excavator confined to use in APUs. For consumer CPUs, Piledriver found its way into the Vishera core, with the high-end design being the FX-8350 and similar chips. It has 1.2B (billion) transistors, 8MB L2 + 8MB L3 cache, and measures 315mm^2. For server parts, the 16-core Opteron parts basically took two Vishera chips and put them in a multi-chip package, with some additional custom logic to facilitate communication between the two die. What if AMD wanted to make a 24-core Piledriver part? They would have had to rework things even more, which would be time consuming and might not even scale that well.
Now, compare that to how Zen is designed. Ryzen will be an 8C/16T consumer part at launch, and we expect a cut-down 4C/8T part to follow, along with APU variants at a later date. AMD hasn't demonstrated any of the 4C stuff yet, but scalability can also go the other direction. For Zen, AMD has already publicly demonstrated a Naples server chip with 32-cores/64-threads, and they had several servers at the Tech Summit with dual sockets (and multiple GPUs). Given appropriate demand, AMD should be able to create 12C/24T, 16C/32T, 20C/40T, 24C/48T, and 28C/56T Zen chips as well. Some of these would use a larger die with portions disabled, but rather than having just two options, they should be able to create several die variants.
Something to note here is that there's no indication the server parts will use the Ryzen name—AMD may stick with Opteron, or they may have a separate name for the new server family of CPUs. As you'd expect, scaling from an 8C/16T package to 32C/64T results in an absolutely massive chip. AMD hasn't provided transistor counts or cache sizes for Naples, but I expect it to be every bit as large as the biggest Intel chips.
By way of comparison, the 24-core Broadwell-EP clocks in at around 7.2 billion transistors, with 60MB of L3 cache and a 456mm^2 die size, which makes the i7-6950X seem rather puny at just 25MB L3 cache, ~3.4 billion transistors, and 246mm^2. In short, compared to their previous design, AMD's Zen should prove far more scalable, thanks in a large part to the Infinity Fabric.
Process Technology
In the Bulldozer and earlier timeframe, AMD was similar to Intel in that they did all of the processor design, chip fabrication, and manufacturing in-house for their CPUs. Running a chip foundry (and keeping it up to date) is an expensive proposition, however, and if the facility isn't fully utilized it can be a huge money sink. AMD sold off their fabrication facilities in 2009 and GlobalFoundries (GF) was born, but the separation of the two companies took quite a few years before it was fully realized.
Today, GlobalFoundries is taking orders from other major companies beyond AMD, and AMD has fully divested the last of their GF stock (back in 2012), though they still have wafer agreements in place. AMD is no longer fully reliant on GF and can pursue manufacturing agreements with other facilities, focusing on chip design rather than the foundry business. GF is likewise free to take orders for all of their available manufacturing capacity, and GF has also licensed Samsung's 14nm FinFET production process and has plans to move to 7nm FinFET as their next major process node.
AMD is using GF's 14nm FinFET node for Zen, and moving to a competitive process is a huge jump from Vishera's 32nm SOI process. Recent AMD APUs have been using 28nm SOI, so again this will be a big change in feature size. It's similar to the jump graphics chips saw this past year, which ultimately helped double the efficiency of GPUs. Combined with all of the architectural enhancements, the move to a significantly smaller process node should prove hugely beneficial to AMD and Zen.
What we expect: Performance
The combination of a new higher performance architecture with an up-to-date manufacturing process basically means anything can happen when it comes to the final product. Is the GF 14nm FinFET fully ready, or are yields still a bit iffy? We don't really know. Does the Ryzen architecture live up to the hype? Again, we don't know for sure, though early indications are promising. And what will the final clock speeds be on retail parts? Previously, there were reports of engineering samples clocked at up to 3.2GHz floating around, but early ES chips don't tell us much about the final clocks.
Thankfully, we have more details coming straight from AMD now. While they wouldn't commit to any maximum turbo clocks, AMD has now gone on record as saying the 8C/16T Ryzen parts will ship at 3.4GHz or higher for the base clock. That might seem low, but keep in mind that Intel's highest-clocked 8C/16T part is the i7-6900K, which comes factory clocked at 3.2GHz with a maximum Turbo Boost of 3.7GHz. So at least for many-core parts, Ryzen is looking very competitive.
As noted above, AMD has claimed IPC is 40 percent higher than Excavator, which is a bit of a tough claim to prove as Excavator wasn't used outside of APUs. We know Piledriver was slightly better on IPC and efficiency than Bulldozer, and Steamroller (APU-only) and Excavator improved on that even more. The last pure CPU design out of AMD was Vishera, the FX-8300/9000 series, with maximum clocks of 4.3GHz on the FX-8370.
Even if Zen's maximum clock speed drops from 4.3GHz to 3.8GHz, Zen/Ryzen should still end up being a significantly faster processor. 40 percent better IPC than Excavator cores means somewhere around 50 percent better IPC compared to Vishera. With a baseline of 3.4GHz at a minimum, and turbo speeds likely being at least 300MHz higher (possibly 600MHz or more higher), Ryzen should put some fear into the heart of Intel.
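The reasoning here boils down to a rough rule of thumb: single-threaded performance scales with IPC multiplied by clock speed. Here's a minimal sketch of that arithmetic, using the estimated 50 percent IPC gain over Vishera and the FX-8370's 4.3GHz maximum. The 3.8GHz and 4.0GHz Zen clocks are guesses, and real workloads won't scale this cleanly.

```python
def relative_performance(ipc_ratio: float, new_clock_ghz: float,
                         old_clock_ghz: float) -> float:
    """Speedup of the new chip over the old, assuming performance = IPC x clock."""
    return ipc_ratio * (new_clock_ghz / old_clock_ghz)


IPC_GAIN_VS_VISHERA = 1.5  # rough estimate from the text, not a measurement
VISHERA_CLOCK_GHZ = 4.3    # FX-8370 maximum clock

# 3.4GHz is AMD's stated minimum base clock; the other two are guesses.
for zen_clock in (3.4, 3.8, 4.0):
    speedup = relative_performance(IPC_GAIN_VS_VISHERA, zen_clock,
                                   VISHERA_CLOCK_GHZ)
    print(f"Zen at {zen_clock:.1f}GHz vs Vishera at 4.3GHz: {speedup:.2f}x")
```

Under this simple model, even a Zen core held to 3.4GHz would come out roughly 19 percent ahead of a 4.3GHz Vishera core.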
And AMD has offered up not one but two public benchmarks showing Ryzen beating i7-6900K now. Earlier this year, they showed both the i7-6900K and Zen running at 3.0GHz in a Blender test, with AMD coming out just ahead. Now, they showed a Zen part locked at 3.4GHz (no turbo) running the same benchmark against the fully enabled (Turbo Boost Max 3.0) i7-6900K at the Tech Summit. This time, Zen basically matches Broadwell-E, which is great to see. AMD followed this up with a second benchmark of Handbrake H.264 video transcoding, another test that leverages multi-threading to a high degree, and Zen at 3.4GHz emerged victorious—54 seconds compared to 59 seconds.
Perhaps even more importantly, AMD showed real-time power monitoring of both Zen and i7-6900K during the Handbrake workload. Both systems showed power draw (for the CPU only, if I'm not mistaken) in the 90-100W range, but with Zen consistently being about 5W lower power. That's only two tests, and notably absent are any single-threaded tests with turbo modes enabled, but we're still a couple of months away from public availability.
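For a sense of what those Handbrake numbers imply, here's a quick sketch that turns the two run times into a speedup and a rough energy comparison. The run times come from AMD's demo, but the 90W and 95W figures are assumed values picked from the 90-100W range with the roughly 5W gap AMD showed, so treat the energy result as illustrative only.

```python
ZEN_SECONDS = 54     # Handbrake run time shown for Zen at 3.4GHz
INTEL_SECONDS = 59   # Handbrake run time shown for the i7-6900K

# Assumed average power draw: both sat in the 90-100W range,
# with Zen reading about 5W lower.
ZEN_WATTS = 90.0
INTEL_WATTS = 95.0

speedup = INTEL_SECONDS / ZEN_SECONDS

# Energy (joules) = average power (watts) x time (seconds).
zen_joules = ZEN_WATTS * ZEN_SECONDS
intel_joules = INTEL_WATTS * INTEL_SECONDS
energy_saving_pct = (1 - zen_joules / intel_joules) * 100

print(f"Zen finishes {(speedup - 1) * 100:.1f}% faster")
print(f"Zen uses about {energy_saving_pct:.1f}% less energy for the job")
```

With those assumed power figures, finishing about 9 percent sooner at slightly lower draw works out to roughly 13 percent less energy for the whole transcode.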
To put things into perspective, I've run numbers on current AMD and Intel APUs/CPUs. A single Steamroller core (A10-7890K) runs at up to 4.3GHz and nets 97 points in the single-threaded Cinebench 15 test, while a Piledriver core (FX-8370) running at up to 4.3GHz nets 99 in the same test (note that the difference in platform has likely negated any IPC gains of Steamroller over Piledriver). The Broadwell-E i7-6900K (3.7GHz) gets 153, so using that earlier estimated 50 percent IPC improvement over Piledriver, AMD's Ryzen should be quite competitive with the i7-6900K, probably in the 140-150 range.
However, Haswell/Devil's Canyon Core i7-4790K (4.4GHz) gets 173 and a Skylake i7-6700K (4.2GHz) gets 182 in the same test, thanks to higher clocks and improved IPC in the case of Skylake. Both Broadwell-E and Zen come up short in single-threaded performance compared to the i7-6700K, and Ryzen would need to hit turbo clocks well in excess of 4.0GHz to close the gap. But remember AMD is only showing the 8C/16T part; a 4C/8T part might show much better clock scaling—we can hope at least.
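These Cinebench estimates can be sanity-checked with a simple linear model: take the FX-8370's score per GHz, apply the estimated 50 percent IPC gain, and scale by clock. This is only a sketch under that assumption (the helper names are made up for illustration), and it ignores memory, cache, and platform effects.

```python
PILEDRIVER_SCORE = 99        # FX-8370, single-threaded Cinebench 15
PILEDRIVER_CLOCK_GHZ = 4.3
IPC_GAIN = 1.5               # estimated Zen IPC gain over Piledriver

# Projected Zen score per GHz under the linear model.
points_per_ghz = PILEDRIVER_SCORE / PILEDRIVER_CLOCK_GHZ * IPC_GAIN


def projected_score(clock_ghz: float) -> float:
    """Projected single-threaded score for a Zen core at clock_ghz."""
    return points_per_ghz * clock_ghz


def clock_needed(target_score: float) -> float:
    """Clock (GHz) a Zen core would need to reach target_score."""
    return target_score / points_per_ghz


for clock in (3.4, 3.7, 4.0):
    print(f"Zen at {clock:.1f}GHz: ~{projected_score(clock):.0f} points")

for name, score in (("i7-6900K", 153), ("i7-4790K", 173), ("i7-6700K", 182)):
    print(f"Clock to match {name} ({score}): ~{clock_needed(score):.2f}GHz")
```

Under this model, a score in the 140-150 range corresponds to turbo clocks of roughly 4.05 to 4.35GHz, and matching the i7-6700K's 182 would take a clock above 5GHz, which is why single-threaded parity with Skylake looks like a stretch.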
What about games, which don't usually scale super well with CPU core counts? Games are a wild card for Zen, because there are times where AMD's current CPUs are a noticeable bottleneck, at least with a fast GPU like a GTX 1080 running at 1080p. It's also possible that Ryzen may see larger than 40 percent gains in performance over Vishera in games, but without testing it's impossible to say. Regardless, I expect anyone gaming on a 1440p display should find that a quad-core Ryzen will close most of the gap between the FX-8370 and the i5-6600K.
And there's still the multi-core factor. DX12 in theory allows games to scale performance with more CPU cores, but so far we've seen little indication of scaling beyond quad-core. Ashes of the Singularity is really the only game that pushes beyond a 4C/8T chip, making use of 6-core and even 8-core processors. Other DX12 games haven't hit the CPU nearly as hard, and unless they're using tons of units like AotS, I don't expect this to change that much. A straight quad-core chip will likely prove a bit of a handicap going forward, but quad-core plus SMT isn't close to hitting the end of the road.
Ryzen: What we want
(As a side note, when I first tested Intel's i7-6700K and Z170, my motherboard BIOS wasn't properly tuned. An updated BIOS a couple of weeks after the launch gave me a 10-15 percent boost in performance, so it's possible AMD could still surprise us.)
Assuming everything else goes as planned, clock speeds will determine a lot of how Ryzen ranks against Intel. AMD's 8-core Ryzen chip looks like it can take on a Broadwell-E i7-6900K, with both running at similar clocks (~3.4GHz), in both Blender and Handbrake. We can't just take those results and unilaterally apply them to all benchmarks, but it's a start—and AMD says they'll provide the video and Blender files so others can run similar tests on other hardware, which is great to hear.
However, the i7-6900K can also hit 4.5GHz with good cooling, and we don't know how much higher the Ryzen chips will actually clock. Looking at the clock speed of RX 480, at least so far the Samsung/GF 14nm FinFET process doesn't appear to clock quite as high as other processes (TSMC 16nm FinFET and Intel 14nm FinFET), but that could be due to the various chip designs rather than the manufacturing process.
Somewhat ironically, given I use an i7-5930K almost daily, I don't see 8C/16T Ryzen processors as being 'mainstream' products (just like Intel's X99 platform isn't 'mainstream'), in part because the drop in efficiency and clock speed isn't usually worth it for the additional cores. But if you're running programs that can use the cores, it's awesome, and efficiency becomes less important on high-end workstation class processors. But Ryzen impresses here, as even in an 8C/16T configuration AMD is targeting a 95W TDP.
Further out, while taking on the i7-6900K is great news for Ryzen, even better would be a quad-core APU with performance rivaling the i5-6600K, paired with a GPU equal to the RX 460. I'd still want discrete graphics, but having a compelling alternative to Intel would be great. Okay, to be honest I really want to see an APU with performance closer to the RX 480, but I doubt that's happening anytime soon, since the 480 has up to 8GB of GDDR5 providing 256GB/s of bandwidth. Even the RX 460 has 112GB/s bandwidth, which a Ryzen APU can't hope to match with a shared 38.4GB/s memory setup (DDR4-2400).
On that note, let's hope the official DDR4-2400 support is merely the baseline and that, like Intel's X99 and Z170 platforms, we see overclocked memory support for stuff like DDR4-3200 (51.2GB/s). Higher performance memory really doesn't matter much for a typical desktop system, but for an APU it can help graphics performance a lot. (Too bad Ryzen and AM4 aren't quad-channel memory, as that could actually be a real benefit for APUs as well—not that it would really be worth the cost.)
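The bandwidth figures quoted here follow from the standard DDR arithmetic: transfers per second times 8 bytes per 64-bit channel times the number of channels. A minimal sketch, with a hypothetical quad-channel case included purely for comparison (AM4 is dual-channel):

```python
def ddr_bandwidth_gbs(mega_transfers_per_s: int, channels: int = 2,
                      bytes_per_transfer: int = 8) -> float:
    """Peak theoretical bandwidth in GB/s for 64-bit DDR channels."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


# GPU memory bandwidth figures quoted in the text, in GB/s.
RX_460_GBS = 112
RX_480_GBS = 256

configs = (
    ("DDR4-2400 dual-channel", 2400, 2),
    ("DDR4-3200 dual-channel", 3200, 2),
    ("DDR4-2400 quad-channel (hypothetical)", 2400, 4),
)

for label, rate, channels in configs:
    bw = ddr_bandwidth_gbs(rate, channels)
    print(f"{label}: {bw:.1f}GB/s "
          f"({bw / RX_460_GBS:.0%} of RX 460, {bw / RX_480_GBS:.0%} of RX 480)")
```

Even DDR4-3200 in dual-channel stays under half of the RX 460's bandwidth, and an APU would have to share that with the CPU cores.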
Finally, there's the almighty question of the price. This is where AMD has shown their ability to beat Intel for years, because Intel in general refuses to sell higher performance processors for anything less than $175—that's been the approximate entry price for the least expensive quad-core i5 chips going back to the first generation Core i5 chips. With Ryzen, AMD will have a few options, depending on where final performance falls.
If AMD can clearly match or beat Intel's Core i5 Skylake parts with their 4C/8T Ryzen chips, we will likely see prices in the $150-$200 range—and if they're competitive with Core i7 Skylake parts AMD could even go for $200-$250. 8C/16T meanwhile probably isn't going to flirt with Intel's obscene $1000-$1750 prices at the top, and depending on clocks and performance the current guess is $300-$500. Assuming AMD can come close to (or beat) Intel's Broadwell-E performance, we could end up with better prices for everyone, and that would be great to see.
There are a lot of 'ifs' and other qualifiers right now. What will Ryzen actually deliver? Everything is looking promising, and AMD appears confident in their New Horizon livestream. Ryzen is currently slated for a Q1'17 launch, which means probably March rather than January—because at this point, if it were a January launch AMD would be saying as much. But even if AMD's Ryzen can't beat Intel in every benchmark, they're bringing proper competition back into the CPU market, and that's great news.
Also, Vega is coming: