CASE STUDY INTEL GOES SMALL

Intel's Atom Architecture: The Journey Begins

The Atom processor's architecture is not about being the fastest, but about being good enough for the
tasks at hand. A product like ASUS' Eee PC would not have existed five years ago; the base level
of system performance simply wasn't high enough. These days there's still a need for faster
systems, but there's also room for systems that aren't pushing the envelope yet are fast enough
for what they need to do.

The complexity of tasks like composing email, web browsing and viewing documents is
increasing, but not at the rate that CPU performance is. The fact that our hardware is so greatly
outpacing the demands of some of our software leaves room for a new class of "good enough"
hardware. So far we've seen a few companies, such as ASUS, take advantage of this trend, but it
was inevitable that Intel would join the race.

One of my favorite movies as a kid was Back to the Future. I loved the first two movies, and
naturally, as a kid into video games, cars and technology, my favorite was the second one. In
Back to the Future II our hero, Marty McFly, journeys to the future to stop his future son from
getting thrown in jail and ruining the family. While in the future he foolishly purchases a sports
almanac and attempts to take it back in time with him, the idea being that, armed with
knowledge from the future, he could make better (in this case, more profitable) decisions in the
past.

I'll stop the analogy there because it ends up turning out horribly for Marty, but that last sentence
sums up Intel's approach with the Atom processor. Imagine if Intel could go back and remake
the original Pentium processor with everything its engineers have learned in the past 15 years,
and build it on a very small, very cool 45nm manufacturing process. We've spent the past two
decades worrying about building the fastest microprocessors; it turns out that now we're also able
to build some very impressive "fast enough" microprocessors.

In 1993, it took a great deal of work for Intel to cram 3.1 million transistors onto a nearly 300
mm^2 die to make the original Pentium processor. These days, Intel manufactures millions of
Core 2 Duo processors, each made up of 410 million transistors (over 130 times the transistor
count of the original Pentium), in an area around 1/3 the size.
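
To put those numbers in perspective, here's a quick back-of-the-envelope check in Python. The Core 2 Duo die area below is only an estimate derived from the "around 1/3 the size" figure above, not an official number:

    # Rough density comparison using the figures quoted in the text.
    pentium_transistors = 3.1e6
    pentium_area_mm2 = 294                       # original Pentium die

    core2_transistors = 410e6
    core2_area_mm2 = pentium_area_mm2 / 3        # assumption: "around 1/3 the size"

    transistor_ratio = core2_transistors / pentium_transistors              # ~132x
    density_ratio = (core2_transistors / core2_area_mm2) / \
                    (pentium_transistors / pentium_area_mm2)                 # ~400x

    print(f"{transistor_ratio:.0f}x the transistors at {density_ratio:.0f}x the density")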

Intel isn't stopping with Core 2; Nehalem will offer even greater performance and push transistor
counts even further. By the end of the decade we'll be looking at over a billion transistors in
desktop microprocessors. What's interesting, however, isn't just what Intel can do to push the
envelope on the high end, but rather what Intel can now do with simpler designs on the low end.

With a 294 mm^2 die size, Intel could not manufacture the original Pentium for use in low cost
devices; today, however, things are a bit different. Intel no longer manufactures chips on a
gigantic 0.80µm process; we're at the beginning of a transition to 45nm. Left otherwise unchanged,
the original Pentium could be made on Intel's latest 45nm process with a die size of less than
3 mm^2. Things get even more interesting if you consider that Intel has learned quite a bit in the
15 years since the debut of the original Pentium. Imagine what it could do with a relatively
simple x86 architecture now.
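
A simple scaling calculation shows why. This sketch assumes ideal linear feature scaling, which real designs never quite achieve, so the article's "less than 3 mm^2" is the more conservative, realistic figure:

    # Ideal area scaling from a 0.80µm process down to 45nm.
    pentium_die_mm2 = 294
    old_process_nm = 800          # 0.80µm
    new_process_nm = 45

    linear_shrink = old_process_nm / new_process_nm     # ~17.8x per dimension
    area_shrink = linear_shrink ** 2                    # ~316x in area

    ideal_die_mm2 = pentium_die_mm2 / area_shrink       # ~0.9 mm^2 in the ideal case
    print(f"Ideal 45nm Pentium die: {ideal_die_mm2:.1f} mm^2")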

It Does Multiple Threads Though


Despite being 2-issue, it's not always easy to execute two instructions from a single thread in
parallel due to data dependencies between them. Intel's solution to this problem was to
enable SMT (Simultaneous Multi-Threading) on Atom (unfortunately not on all models) to allow the
concurrent execution of up to two threads. Welcome the return of Hyper Threading.

Remember the rule of thumb for power/performance tradeoffs? Intel's decision to enable SMT
on Atom is the perfect example. SMT increased power consumption by less than
20% on Atom, yet it also yielded a 30 - 50% increase in performance on the in-order core.
The decision couldn't have been easier.
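
To see why the decision was so easy, take the article's own figures and treat the sub-20% power number as a worst case; even then SMT improves performance per watt:

    # Performance-per-watt effect of enabling SMT, using the quoted figures.
    power_increase = 0.20                     # "less than 20%", taken as worst case
    perf_gain_low, perf_gain_high = 0.30, 0.50

    ppw_low = (1 + perf_gain_low) / (1 + power_increase)     # ~1.08x
    ppw_high = (1 + perf_gain_high) / (1 + power_increase)   # ~1.25x

    print(f"SMT perf/watt gain: {ppw_low:.2f}x to {ppw_high:.2f}x")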

The Atom has a 32-entry instruction scheduling queue, but when running with SMT enabled
each thread gets its own 16-entry queue. The scheduler doesn't have to switch between threads
each clock; it can do so intelligently, with the only limitation being that it can dispatch at most
2 ops per clock (since it is a 2-wide machine). If one thread is waiting on data to complete an
instruction, on the next clock tick the scheduler can choose to dispatch an op from the other
thread that will hopefully be able to execute.
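
Here's a toy model of that dispatch behavior, purely for illustration; the op format, stall flags and thread-preference order are invented and don't reflect how the real hardware is specified:

    from collections import deque

    ISSUE_WIDTH = 2   # Atom is a 2-wide machine

    def dispatch(queues):
        """Issue up to ISSUE_WIDTH ops per clock, skipping threads whose oldest op is stalled."""
        issued = []
        for tid, q in enumerate(queues):
            # In-order within a thread: once the head op is stalled, nothing
            # behind it in that thread's queue can issue this clock.
            while q and not q[0]["stalled"] and len(issued) < ISSUE_WIDTH:
                issued.append((tid, q.popleft()["op"]))
        return issued

    # Thread 0's oldest op is waiting on memory, so both slots this clock
    # go to thread 1 instead of stalling the whole machine.
    queues = [
        deque([{"op": "load r1", "stalled": True}, {"op": "add r2", "stalled": False}]),
        deque([{"op": "mul r3", "stalled": False}, {"op": "sub r4", "stalled": False}]),
    ]
    print(dispatch(queues))   # [(1, 'mul r3'), (1, 'sub r4')]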

Making Atom multithreaded made perfect sense from a logical standpoint. The downside to an
in-order core is that if an instruction is waiting on data before it can begin execution, the rest of
the pipeline stalls while that dependency is resolved. The chances that two independent
instructions from two independent threads will both miss in the cache at the same time are slim,
so a second thread is usually available to keep the pipeline busy.
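
As a purely illustrative example of that reasoning: if each thread independently spends some fraction of its time stalled on a cache miss, the chance of both being stalled at once is the product of the two (the 10% figure below is invented for the example):

    # If each thread stalls on a miss ~10% of the time (illustrative figure),
    # and the threads are independent, both stall together only ~1% of the time.
    stall_probability = 0.10
    both_stalled = stall_probability ** 2
    print(f"Both threads stalled: {both_stalled:.0%}")   # 1%
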
Execution Units

Atom isn't a super-wide processor; with an in-order front end and no on-die memory controller, it's
unlikely that we'll see tremendous instruction throughput. Data dependencies would do a good
job of ensuring that lots of execution units remained idle, so Atom's designers did their best to
include only the bare minimum when it came to execution units.

There's no dedicated integer multiplier or divider; these functions are shared with the SIMD/FP
units. There are two SSE units, and the scheduler can dispatch either a floating point or an integer
SIMD op to both ports in a given clock.

All of the functional units are 64 bits wide, with the exception of the SIMD integer and single
precision FP adders, which support full-width operations.
