
Introduction to the Nehalem Architecture

We Begin Once More

This is an article we have all been anticipating for years now, as it covers the most dramatic shift in Intel processor technology since the introduction of the front-side bus. Ironically, it is this shift that will finally remove the FSB from Intel products for good. The Nehalem core architecture has been the focus of most of Intel's Developer Forums for the last 24 months, and the culmination of the technology, marketing and products begins today. Intel's Core i7 processors will bring a dramatic set of changes to the enthusiast and PC community in general, including a new processor, a new CPU socket, a new memory architecture, a new chipset, new motherboards and new overclocking methods. All of that and more will be addressed in our review today, so be prepared for a LOT of valuable information.

The Nehalem Architecture - Years of data summed up

We have done more than our share of technical documentation of the architecture and design, enough that I feel duplicating all of it here would be something of a disservice to our frequent readers. I will highlight the most important architectural shifts in the Nehalem design here, but I still encourage you to read my much more in-depth look at the processor design published in August: Inside the Nehalem: Intel's New Core i7 Microarchitecture.

Here you can see a die shot of the new Nehalem processor, in this iteration a four-core design with two separate QPI links and a large L3 cache relative to the rest of the chip.

The primary goal of Nehalem was to take the big performance advantages of the Core 2 CPUs and modularize them. With the Nehalem design, which will be branded as the Intel Core i7, Intel can easily create a range of processors from one to eight cores depending on application and market demands. Eight-core CPUs will be found in servers, while dual-core machines will reach the mobile market several months after the initial desktop introduction. The number of QPI (QuickPath Interconnect) links can also vary in order to improve CPU-to-CPU communication.

At a high level, the Nehalem core adds some key features to the processor designs we currently have with Penryn: SSE instructions get bumped to revision 4.2, branch prediction and prefetch algorithms are improved, and simultaneous multithreading (SMT) returns after a brief hiatus following the NetBurst architecture.

HyperThreading Returns

I mentioned before that Intel is using Nehalem to mark the return of HyperThreading to its bag of weapons in the CPU battle. The implementation is nearly identical to that of the older NetBurst processors and allows two threads to run on a single CPU core. But SMT (simultaneous multithreading), or HyperThreading, is also key to keeping the 4-wide execution engine fed with work. Given the larger caches and much higher memory bandwidth the chip provides, this is a very important addition. Intel claims that HyperThreading is an extremely power-efficient way to increase performance: it takes up very little die area on Nehalem yet has the potential for large performance gains in certain applications. This is obviously much more efficient than adding another core to the die, but just as obviously it has drawbacks compared to that approach, since the two threads must share a single core's execution resources.
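As a rough illustration of the "keeping the engine fed" argument, here is a toy Python model of two threads sharing one 4-wide core. The per-cycle ready-instruction counts (`ready_a`, `ready_b`) are invented, a 0 stands in for a stalled cycle such as a cache miss, and the core simply fills any slots thread A leaves empty with thread B's work. Real Nehalem scheduling is far more involved; this sketch only shows why a second thread raises issue-slot utilization.

```python
ISSUE_WIDTH = 4  # Nehalem's execution engine is 4-wide


def utilization(*threads: list[int], width: int = ISSUE_WIDTH) -> float:
    """Fraction of issue slots filled when the given threads share one core.

    Each thread is a list of how many instructions it has ready in each
    cycle (a toy stand-in for instruction-level parallelism and stalls).
    Toy policy: earlier threads get first pick, later threads fill what is
    left, and each cycle's counts are independent (nothing is queued).
    """
    cycles = max(len(t) for t in threads)
    used = 0
    for cycle in range(cycles):
        free = width
        for t in threads:
            ready = t[cycle] if cycle < len(t) else 0
            issued = min(ready, free)
            used += issued
            free -= issued
    return used / (width * cycles)


# Hypothetical ready-instruction counts per cycle; 0 models a stall.
ready_a = [3, 1, 0, 0, 2, 4, 1, 0]
ready_b = [1, 2, 3, 2, 0, 1, 2, 3]

print(f"thread A alone: {utilization(ready_a):.0%} of issue slots used")
print(f"A and B (SMT):  {utilization(ready_a, ready_b):.0%} of issue slots used")
```

With these made-up numbers, thread A alone fills 11 of 32 issue slots (about 34%), while the two threads together fill 24 of 32 (75%). The gain comes entirely from B's work landing in cycles where A was stalled, which is also why SMT helps less when a single thread already keeps the core saturated.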

Here you can see Intel's estimates of how much HyperThreading can help performance in specific applications. Surprisingly, one of the best performers is the 3DMark Vantage CPU test, which simulates AI and physics on the processor, while POV-Ray 3.7 still sees a healthy 30% boost in performance for this relatively small addition in logic.

Welcome to the Uncore, we got fun and games...

A new term Intel is bringing to the world with this modular design is the "uncore": basically all of the sections of the processor that are separate from the cores and their self-contained caches. Features like the integrated memory controller, the QPI links and the shared L3 cache fall into the "uncore" category. All of the components you see here are completely modular; Intel can add cores, QPI links, integrated graphics (coming later in 2009) and even another IMC if it desires.

New cache structure, new L3 cache

The Intel Smart Cache makes a return with the Nehalem core, but this time in a three-level cache hierarchy. The first-level caches consist of a 32 KB instruction cache and a 32 KB data cache per core, and the L2 cache is a completely new design compared to the Core 2 CPUs out today. Each core receives 256 KB of unified, 8-way associative L2 cache that is both low latency (about 10 cycles load-to-use) and scales well to keep extra load off the L3 cache. The shared L3 cache layer is new to Intel's mainstream processors, though AMD's Barcelona chip introduced a similar design late in 2007. This L3 is an inclusive cache that scales with the number of cores on the processor; quad-core processors will have as much as 8 MB with 16-way associativity. The effective L3 latency will depend on the frequency ratio between the core and uncore sections of the CPU, something we haven't gotten enough information on yet.

Bring out yer' dead! (front-side bus)

One of the features that Intel HAS been talking about for a while is the move away from the front-side bus architecture to something called Intel QuickPath Interconnect. Previously known only as CSI (Common System Interface), QuickPath is Intel's answer to AMD's HyperTransport technology, and it performs a very similar function.
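Returning briefly to the cache hierarchy described earlier: a set-associative cache's capacity and associativity determine how many sets it has. The Python sketch below computes sets = capacity / (ways × line size) for the L2 and L3 and splits an address into tag, set index and offset. It assumes 64-byte cache lines, which is the line size Intel uses but is not stated in this article, and the address is an arbitrary example.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # bytes per cache line (assumption: not given in the article)


@dataclass(frozen=True)
class Cache:
    name: str
    size_bytes: int
    ways: int
    line_size: int = LINE_SIZE

    @property
    def sets(self) -> int:
        # Each set holds `ways` lines, so sets = capacity / (ways * line size).
        return self.size_bytes // (self.ways * self.line_size)

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, set_index, offset) for an address."""
        offset = addr % self.line_size      # byte within the line
        line_number = addr // self.line_size
        set_index = line_number % self.sets  # which set the line maps to
        tag = line_number // self.sets       # identifies the line within that set
        return tag, set_index, offset


KB, MB = 1024, 1024 * 1024
l2 = Cache("L2 (per core)", 256 * KB, ways=8)
l3 = Cache("L3 (shared)", 8 * MB, ways=16)

for cache in (l2, l3):
    print(f"{cache.name}: {cache.sets} sets x {cache.ways} ways")

addr = 0x12345678  # arbitrary example address
print("L2 tag/set/offset:", l2.split(addr))
print("L3 tag/set/offset:", l3.split(addr))
```

With 64-byte lines, the L2 works out to 512 sets (9 index bits) and the 8 MB L3 to 8,192 sets (13 index bits).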

Starting with Nehalem, Intel's processors will feature a point-to-point direct-connect architecture that transmits data from socket to socket as well as from the CPU to the chipset, all while scaling nicely as the number of CPUs and QPI links goes up. Part of the reason QPI was needed on Nehalem is the new integrated memory controller on the processor. As AMD showed many years ago, an IMC allows for higher peak memory bandwidth and lower memory latency, though Intel is taking it a step further by offering a three-channel DDR3 memory controller on each CPU. QPI is also required for efficient chip-to-chip communication, where one CPU might need to access data stored in memory attached to another processor's memory controller. The QPI design supports 6.4 gigatransfers per second (GT/s), or 12.8 GB/s of bandwidth in each direction, for 25.6 GB/s of total bandwidth between two points. Future versions of QPI will scale up to faster speeds as well. You can also tell from the above four-CPU diagram that QPI will scale well with as many as four CPUs: each processor in this case would require four QPI connections in total and would be only one hop from any other CPU's memory.

An Integrated Memory Controller, with three channels!

The Intel Nehalem integrated memory controller (IMC) is actually pretty scalable in its own right. Besides offering extremely high bandwidth and low latency, the number of memory channels can be varied, both buffered and unbuffered memory is supported, and memory speeds can be adjusted, all based on the market the processor will be targeted at. Lower-cost parts with only dual-channel memory should cost considerably less than top-end three-channel systems. At launch, the DDR3 memory controller on Nehalem will only OFFICIALLY support DDR3-1066 memory speeds.
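Both the QPI and memory figures above come from the same back-of-the-envelope formula: peak bandwidth = transfer rate × bytes per transfer × number of channels. The Python sketch assumes QPI carries 2 data bytes per transfer in each direction (which is what turns 6.4 GT/s into 12.8 GB/s) and that each DDR3 channel is 64 bits (8 bytes) wide; both are standard figures, but neither is spelled out in the article.

```python
def peak_bandwidth_gbs(mtransfers_per_s: float, bytes_per_transfer: int,
                       channels: int = 1) -> float:
    """Theoretical peak bandwidth in decimal GB/s (10**9 bytes per second)."""
    return mtransfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# QPI link: 6.4 GT/s, assuming 2 data bytes per transfer in each direction.
qpi_per_direction = peak_bandwidth_gbs(6400, 2)
qpi_total = 2 * qpi_per_direction  # both directions run simultaneously

# DDR3 channels are assumed to be 64 bits (8 bytes) wide.
# DDR3-1066 actually transfers 1066 2/3 million times per second.
ddr3_1066 = 3200 / 3
three_ch_1066 = peak_bandwidth_gbs(ddr3_1066, 8, channels=3)
two_ch_1066 = peak_bandwidth_gbs(ddr3_1066, 8, channels=2)
three_ch_1600 = peak_bandwidth_gbs(1600, 8, channels=3)

print(f"QPI: {qpi_per_direction:.1f} GB/s per direction, {qpi_total:.1f} GB/s total")
print(f"3 x DDR3-1066: {three_ch_1066:.1f} GB/s")
print(f"2 x DDR3-1066: {two_ch_1066:.1f} GB/s")
print(f"3 x DDR3-1600: {three_ch_1600:.1f} GB/s")
```

Three channels of DDR3-1066 come out to about 25.6 GB/s, matching a QPI link's combined two-way figure, while a dual-channel configuration drops to roughly 17.1 GB/s and the unofficial DDR3-1600 case rises to 38.4 GB/s. These are theoretical peaks; sustained bandwidth will be lower.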
The official DDR3-1066 limit is pretty lame, but I was told on numerous occasions that the memory controller will run at DDR3-1600 to DDR3-2000 speeds; official support simply stops at JEDEC-standard speeds. The IMC in Nehalem will also force Intel to adopt NUMA (non-uniform memory access), since memory will be stored in different areas (attached to each processor, not just to the north bridge) for the first time in Intel's desktop processors.

New Core Power Controls

The Nehalem core also has a new trick in its bag that enables it to lower the power consumption of a core to nearly 0 watts, something that wasn't possible on previous designs. The image above shows what typically made up a core's total power consumption in the Core 2 series of processors: clocks and logic account for the majority of it, yes, but a third or more is due to transistor leakage, which couldn't be turned off in prior designs.
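To put that leakage share in perspective, here is a toy Python model (all wattages hypothetical) of a core whose power is two thirds clocks and logic and one third leakage, in line with the "a third or more" figure above. It assumes an idle core stops its clocks and so burns no dynamic power, but that without a way to cut its supply the leakage keeps flowing; the `power_gated` flag models the ability to switch that leakage off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorePower:
    dynamic_w: float  # clocks + logic while the core is busy (hypothetical)
    leakage_w: float  # transistor leakage whenever the core is powered (hypothetical)


def cores_power_w(core: CorePower, active: int, idle: int, power_gated: bool) -> float:
    """Total power of `active` busy cores plus `idle` idle cores.

    Toy assumptions: an idle core's clocks are stopped, so it burns no
    dynamic power. Without power gating its leakage keeps flowing; with
    gating its supply is cut, so it draws (roughly) nothing.
    """
    busy = active * (core.dynamic_w + core.leakage_w)
    idle_cost = 0.0 if power_gated else idle * core.leakage_w
    return busy + idle_cost


# Hypothetical 15 W core where leakage is one third of the total.
core = CorePower(dynamic_w=10.0, leakage_w=5.0)

# One core loaded, three idle.
print("idle cores still leaking:", cores_power_w(core, active=1, idle=3, power_gated=False), "W")
print("idle cores power-gated:  ", cores_power_w(core, active=1, idle=3, power_gated=True), "W")
```

In this example, three idle but still-powered cores add 15 W of pure leakage, doubling the total from 15 W to 30 W; cutting their supply removes that overhead entirely.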

How has this changed with Nehalem? Thanks to the independent power controller in the PCU (Power Control Unit) and the separate power planes that each core rests on, the power consumption of each core is completely independent of the others. You can see in this diagram that although Core 3 is loaded the entire time, both Core 2 and Core 0 are able to power down to practically 0 watts when their workload is complete.

Turbo Mode: free performance?

Perhaps the most interesting bit of news out of Intel's Nehalem was something called Turbo Mode, a feature directly enabled by the PCU we just discussed. With modern processors, the debate has raged over whether users are better off getting a quad-core CPU at a lower frequency or a dual-core CPU at a higher frequency. Intel is hoping that with Turbo Mode users will get the best of both worlds.

The idea is pretty straightforward: if four cores running together have a combined power consumption (and heat dissipation) of X, and only two of them are loaded (with the other two idle), then there is additional power headroom to overclock the working cores to a higher frequency. For enthusiasts and gamers this should be an exciting turn of events. While Intel wasn't very specific at this point, I imagine we'll see gains of 200-300 MHz going from the full quad-core clock rate to dual-core or single-core operation (depending on how many cores are idle at the time). This means that if you purchase a 3.2 GHz Core i7 processor, you will likely see clock rates as high as 3.5 GHz when running single-threaded or dual-threaded applications. Gamers should also take note of this!
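The headroom idea above can be sketched with a toy Python model. None of its parameters come from Intel: it assumes per-core power grows with the cube of frequency, that the chip raises its clock in hypothetical 133 MHz steps, and that it is capped at two steps above base. The point is only to show how idle cores free up budget for the busy ones.

```python
BIN_MHZ = 133  # hypothetical size of one turbo step
MAX_BINS = 2   # hypothetical limit on extra steps above the base clock


def turbo_clock_mhz(base_mhz: int, active_cores: int, total_cores: int = 4) -> int:
    """Highest base + n * BIN_MHZ clock whose estimated power fits the budget.

    Toy model: the power budget is what all `total_cores` draw at base_mhz,
    each active core's power scales with frequency cubed (P ~ V^2 * f, with
    V rising roughly in step with f), and idle cores draw nothing.
    """
    budget = total_cores * base_mhz ** 3
    bins = 0
    while bins < MAX_BINS:
        candidate = base_mhz + (bins + 1) * BIN_MHZ
        if active_cores * candidate ** 3 > budget:
            break  # the next step would exceed the power budget
        bins += 1
    return base_mhz + bins * BIN_MHZ


for n in (4, 2, 1):
    print(f"{n} active core(s): {turbo_clock_mhz(3200, n)} MHz")
```

Under these assumptions a 3.2 GHz chip stays at 3,200 MHz with all four cores busy and reaches 3,466 MHz with one or two cores active, a 266 MHz gain that falls inside the 200-300 MHz range guessed at above. In the one- and two-core cases it is the step cap, not the power budget, that limits the clock.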

Intel claims that, thanks to the PCU inside the chip, the Nehalem core is aware of its surroundings and conditions. If your system is running very cool (say you have water cooling, for example), the chip will recognize that it is well under its TDP and push the clocks even higher. This is possible even with all four cores loaded, as the above diagram shows. The onboard microcontroller tunes voltages based on the given frequency, operating conditions and the specific silicon characteristics. In some ways it appears that the Nehalem core will be self-aware enough to find out how far it can be pushed without burning up.
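Intel hasn't published how the PCU makes these decisions, so the following is only a hypothetical Python sketch of the behavior described above: each control interval, the clock moves up one bin while measured power and temperature are under their limits and back down when either is exceeded. The limits, the bin size and the sensor samples are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    tdp_w: float = 130.0      # hypothetical package power limit
    max_temp_c: float = 90.0  # hypothetical temperature limit


def next_clock_mhz(clock_mhz: int, power_w: float, temp_c: float, limits: Limits,
                   base_mhz: int = 3200, max_mhz: int = 3466, bin_mhz: int = 133) -> int:
    """One control interval of a toy turbo governor.

    Steps up one bin while both readings are within limits, steps down one
    bin when either is exceeded, and never leaves the [base, max] range.
    """
    if power_w > limits.tdp_w or temp_c > limits.max_temp_c:
        return max(base_mhz, clock_mhz - bin_mhz)
    return min(max_mhz, clock_mhz + bin_mhz)


# Invented (power in W, temperature in C) samples from a fully loaded chip.
samples = [(95.0, 55.0), (105.0, 58.0), (112.0, 60.0),
           (135.0, 70.0), (120.0, 92.0), (110.0, 80.0)]

clock = 3200
trace = []
for power, temp in samples:
    clock = next_clock_mhz(clock, power, temp, Limits())
    trace.append(clock)
print(trace)
```

In this trace the chip climbs to its cap while it has headroom, drops a step when a power spike crosses the limit, drops again when temperature crosses its limit, and then starts climbing back. A better-cooled system would spend more intervals at the top of that range, which is the effect Intel describes.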

The Intel Core i7 CPUs

So now that we know the guts of everything that makes up the Intel Core i7 series of processors, what do the physical specimens themselves have to offer? As you might expect, from the outside the Nehalem-based CPUs are really quite plain and look remarkably similar to their older brother, the Core 2:

Intel Core i7-965 on the left...or is it the right?

The actual die is hidden by a heat spreader very similar to the ones we have become used to on the Core and Core 2 series of processors, and they do in fact use a very similar mounting mechanism. Rather than having pins on the CPU, the pins are located in the motherboard socket, which in theory prevents a lot of the damage end users can do to processors during installation. This time around, though, the pin count goes up from 775 to 1366, so the new Nehalem Core i7 processor socket is known as LGA1366. How very intuitive!

For a size comparison I have provided the above image: the two Core i7 CPUs rest on the bottom row, while the LGA775 Core 2 Extreme QX9770 (left) and the AMD Phenom X4 processor (right) take the high road. You can see that the new Intel Core i7 CPUs are indeed just larger versions of the Core 2 packaging, with a couple of notches on the left- and right-hand sides to prevent installing the chip in the wrong orientation in the CPU socket.

One more for good measure: the 1366-pin Nehalem-based designs on the bottom clearly have many more contacts than the 940-pin Phenom or the 775-pin Core 2.

One note about today's product announcement: Intel does not anticipate having product for sale in the channel until sometime "in November," so today we are actually bringing you a sneak peek of the products as they will be available later in the month. The SKUs coming to market this year include:

Three CPUs will be available for purchase this month, ranging from the Extreme Edition Core i7-965 down to the Core i7-920. All three processors share a surprising number of specifications, including memory speed, thermal dissipation and cache size. Of course, all are built on Intel's Hi-K 45nm process technology. As we have seen in previous generations, only the Core i7-965 Extreme Edition will have the overspeed protection removed to allow for a much more robust overclocking experience. The QPI speed on the EE CPU is also higher, allowing for a faster connection to the north bridge and its PCI Express 2.0 lanes, though the performance advantages of this rate change are still a question in my mind. All three CPUs have a transistor count of 731M on a die of about 263 mm², significantly smaller than the current beast of the processing world, NVIDIA's GT200, which sports 1.4 billion transistors! Pricing is still just an estimate, of course, since the chips are not going on sale today, but Intel has set 1k-unit pricing as follows:

Core i7-965 Extreme Edition: $999
Core i7-940: $562
Core i7-920: $284

Two things stand out to me about the pricing of these first three Core i7 CPUs. First, I am pleasantly surprised that Intel chose to offer such a low-cost option in its first round of releases, with the Core i7-920 selling for under $300. Intel probably could have gotten away with keeping all three SKUs over the $500 mark for such a new technology, but now we will see some mainstream PCs adopt the new core architecture pretty quickly. Second, I seriously doubt we will see the Core i7-965 selling for $999 any time soon. An unfortunate trend we have seen increase with the Core 2 Extreme processors is that they take a LONG time to come down to their 1k pricing; take the Core 2 Extreme QX9650 as an example of a CPU that has yet to reach that $999 price tag after exactly one year on the market.

The latest CPU-Z, revision 1.48, is already set up to properly recognize and report the specifications and speeds of the Intel Core i7 processors. It will become an invaluable piece of software as our tweaking and overclocking process begins.
