
Difference Between CPU and MicroProcessor


The term central processing unit, or CPU, was coined long ago to identify the portion of a machine that did the actual processing, well before microprocessors and integrated circuits existed. As technology developed, the CPU shrank. Early CPUs consisted of large vacuum tubes wired together and took up huge amounts of space; discrete transistors then reduced the size of the CPU, and integrated circuits and the microprocessor miniaturized it further. The once large and cumbersome CPU was reduced to a minute piece of silicon with all the connections already etched into it.

A microprocessor is a very advanced integrated circuit that houses millions of transistors within a single package. Along with the transistors, it contains the circuitry that allows it to function, so it requires little else. The microprocessor proved so capable that it quickly displaced most other ways of building a CPU. It came to contain the CPU, at first spread across a couple of chips and finally within a single microprocessor, and it picked up a few components along the way, such as the small amount of memory we now call the cache. It is therefore understandable that the terms microprocessor and CPU have become interchangeable.

Microprocessor technology has become so advanced that a single package can contain not just one but up to four CPUs, as in quad-core microprocessors, and that is not even the limit of what a microprocessor can do. To put it in perspective, given today's technology, all CPUs are microprocessors, but not all microprocessors are CPUs. The microprocessor is now so widespread that a single computer system contains a number of them, and they have all but replaced the discrete transistors that were once the king of computer components. The GPU (Graphics Processing Unit) is also contained in a microprocessor. Even the northbridge and southbridge of the computer are both in microprocessors.

To sum up, the CPU is the brain of the whole computer system. This is where the decision making happens, and the other parts of the computer carry out the requests of the CPU. The microprocessor is an advancement in transistor technology that allows many transistors to be placed in a single package. It is so advanced and economical that manufacturers find it advantageous to use microprocessors in almost every part of the computer.

Source: http://www.differencebetween.net/technology/difference-between-cpu-andmicroprocessor/

A processor is the logic circuitry that responds to and processes the basic instructions that drive a computer. The term processor has generally replaced the term central processing unit (CPU). The processor in a personal computer or embedded in small devices is often called a microprocessor.

Transmeta Crusoe Processor for Embedded Applications


Transmeta, the leader in efficient computing, offers a line of low power, high-performance processors designed to meet the unique requirements of embedded applications. The Transmeta Crusoe is an energy efficient processor built upon innovative technology that provides embedded devices a performance per watt ratio that is unmatched by any other x86-based processor in its class. Available in a variety of low power versions, the Transmeta Crusoe processor is ideal for applications that require high performance processing within small and thermally constrained environments. Its inherently energy efficient design allows gigahertz processor speeds without the need for active cooling and external CPU fans. Integrated power management technology further enhances efficiency by dynamically scaling both processor frequency and voltage according to the instantaneous demands of the computer system.

Transmeta Crusoe processors provide full x86-compatible software execution and seamless operation with all standard x86-compatible operating systems including Microsoft Windows, Linux, and a variety of real time operating systems (RTOS) from companies including LynuxWorks, MontaVista, QNX, Red Hat, and Wind River. Transmeta works closely with partners, customers and commercial laboratories to ensure validated interoperability and continued adherence to high quality and reliability standards.

High Performance with Low Heat Dissipation
- A family of energy efficient processors for every performance/thermal requirement

Highest System Quality and Reliability
- All Crusoe SE processors are rated for 24/7, 10-year operating life
- Fan-less designs enhance system reliability

High Integration for Small Form Factor designs
- Integrated northbridge functionality reduces board real estate

Transmeta stands committed to Embedded Product Lifecycles
- Extended Product Availability
- Comprehensive Engineering and Marketing support

A Special Embedded version of the Transmeta Crusoe processor, the Transmeta Crusoe SE processor, enables embedded designs that require superior reliability. To support a wide range of embedded applications, processors are rated to run at full speed over the entire operating temperature range of 0°C to 100°C, twenty-four hours a day, seven days a week. Product life is rated to exceed 10 years while running at these performance and environmental extremes.

Transmeta Crusoe and Crusoe SE processors are designed for embedded applications in the areas of office automation, networking/communications, storage, server-based computing, science and medicine, transportation, automotive/telematics, and industrial automation. Some example devices in these markets include: thin clients, blade servers, printers and copiers, point-of-sale, smart displays, hand held and portable consumer devices, ultra-personal computers, set top boxes and many other applications.

Transmeta Crusoe Processor Model TM5500
667MHz
128KByte L1 Cache (64KByte L1 I-cache and 64KByte L1 D-cache)
256KB L2 write-back cache
Integrated Northbridge
- 64-bit, 133MHz DDR memory controller
- 64-bit, 133MHz SDR memory controller
- 32-bit, 33MHz, 3.3V PCI bus
MMX Instruction Support
0.13µm process
Compact 474-pin Ceramic BGA Package
Max TDP: 5.1W (includes Northbridge power)

Transmeta Crusoe Processor Model TM5800
800MHz - 1GHz
128KByte L1 Cache (64KByte L1 I-cache and 64KByte L1 D-cache)
512KB L2 write-back cache
Integrated Northbridge
- 64-bit, 133MHz DDR memory controller
- 64-bit, 133MHz SDR memory controller
- 32-bit, 33MHz, 3.3V PCI bus
MMX Instruction Support
0.13µm process
Compact 474-pin Ceramic BGA Package
Max TDP: 6.8 - 9.0W (includes Northbridge power)

Transmeta Crusoe SE Processor Model TM55E
667MHz
128KByte L1 Cache (64KByte L1 I-cache and 64KByte L1 D-cache)
256KB L2 write-back cache
Integrated Northbridge
- 64-bit, 133MHz DDR memory controller
- 64-bit, 133MHz SDR memory controller
- 32-bit, 33MHz, 3.3V PCI bus
MMX Instruction Support
0.13µm process
Compact 474-pin Ceramic BGA Package
Max TDP: 5.1W and 6.2W (includes Northbridge power)
Supports T-junction temperatures of 100°C
Rated for 24/7 operation for 10 years

Transmeta Crusoe SE Processor Model TM58E
800MHz & 933MHz
128KByte L1 Cache (64KByte L1 I-cache and 64KByte L1 D-cache)
512KB L2 write-back cache
Integrated Northbridge
- 64-bit, 133MHz DDR memory controller
- 64-bit, 133MHz SDR memory controller
- 32-bit, 33MHz, 3.3V PCI bus
MMX Instruction Support
0.13µm process
Compact 474-pin Ceramic BGA Package
Max TDP: 6.8W - 9.0W (includes Northbridge power)
Supports T-junction temperatures of 100°C
Rated for 24/7 operation for 10 years

© 2003 Transmeta Corporation. All rights reserved. Information in this document is provided in connection with Transmeta Products. No license, express or implied, or otherwise to any intellectual property rights are granted by this document. Except as provided in Transmeta's Terms and Conditions of Sale for such products, Transmeta assumes no liability whatsoever including liability, warranties, infringement of any patent, copyright or other intellectual property right.

For more information, visit www.transmeta.com

UNITED STATES


Transmeta Corporation 3990 Freedom Circle Santa Clara, CA 95054 USA

JAPAN

Transmeta Japan KDDI Bldg Annex 3F S2-3-3 Nishi-Shinjuku Shinjuku-ku Tokyo 160-0023 Japan

ASIA-PACIFIC

Transmeta Taiwan 7F-1, No.167, Fu-Hsing North Road Taipei, Taiwan R.O.C. 105

EUROPE

Transmeta Europe 9 Eglinton Road Bray County Wicklow Ireland



Transmeta Crusoe Processor Core


At the heart of the Transmeta Crusoe processor lies a very streamlined, efficient 128-bit VLIW (Very Long Instruction Word) hardware engine. Surrounding that heart is the Code Morphing Software (CMS), a software engine that works in tandem with the VLIW hardware engine to morph and execute x86 instructions in native VLIW code. This innovative approach has led to a number of compelling advantages, the largest of which is the reduction in the number of power hungry logic transistors. This streamlining of processor design allows Transmeta to greatly improve the performance-to-power ratio while allowing heat dissipation to be kept to a minimum.

Code Morphing Software (CMS)

CMS, the software component of the Transmeta Crusoe processor, translates x86 instructions into highly optimized and extremely fast VLIW native instructions which are then processed with great efficiency. These translations are stored and reused in subsequent execution, further enhancing performance over standard x86 architectures.
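
The translate-once, reuse-afterwards idea described above can be illustrated with a toy sketch. This is not Transmeta's implementation: the class name, the block representation, and the placeholder translator are all hypothetical, and real CMS translations are optimized native code rather than strings.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a block of x86 code is a tuple of instruction
# strings, and a translation is a list of native-instruction strings.
X86Block = Tuple[str, ...]
Translation = List[str]


class TranslationCache:
    """Toy model of translating a code block once and reusing the result."""

    def __init__(self, translate: Callable[[X86Block], Translation]) -> None:
        self._translate = translate
        self._cache: Dict[int, Translation] = {}  # keyed by block start address
        self.translations_performed = 0

    def fetch(self, address: int, block: X86Block) -> Translation:
        # Translate only on the first visit; later visits reuse the stored result.
        if address not in self._cache:
            self._cache[address] = self._translate(block)
            self.translations_performed += 1
        return self._cache[address]


def toy_translate(block: X86Block) -> Translation:
    # Placeholder "translator": a real one would emit scheduled VLIW code.
    return [f"vliw<{insn}>" for insn in block]


cache = TranslationCache(toy_translate)
loop_body = ("add eax, 1", "cmp eax, 100", "jl loop")

# Executing the same loop body 100 times triggers a single translation.
for _ in range(100):
    native = cache.fetch(0x1000, loop_body)

print(cache.translations_performed)
```

The point of the sketch is only the cost structure: the translation cost is paid once per block, so code that runs repeatedly (loops, hot functions) amortizes it over many executions.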

Integrated Northbridge Controller

Transmeta further reduces electrical consumption and thermal requirements within the system by integrating Northbridge controller functionality directly onto the processor core. This functionality, consisting of SDR and DDR DRAM memory controllers, a serial ROM interface, and a PCI bus interface, eases system design, reduces board space, and enhances performance. As a separate chip, a Northbridge chipset consumes 2 to 3 watts of additional power, whereas the Northbridge logic integrated into the Transmeta Crusoe processor consumes just a fraction of that.

Transmeta LongRun Power Management

Transmeta LongRun is a power management technology that further reduces thermal constraints by dynamically adjusting the operating voltage and clock frequency of the processor core based on application demands. By evaluating the demand on the processor, LongRun delivers just enough performance to satisfy the workload at hand. This conserves power and improves battery life. If desired, LongRun can be configured to deliver different performance characteristics depending on the application, making it possible for designers to build smaller enclosures than were previously possible. Best of all, Transmeta LongRun technology provides more responsiveness than conventional power management schemes used by operating systems and is completely transparent to the end-user.
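
A minimal sketch of the "just enough performance" idea follows. It is not the LongRun algorithm: the operating points are hypothetical values chosen for illustration, and the power model is the common textbook approximation that dynamic CMOS power scales with voltage squared times frequency.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OperatingPoint:
    mhz: int
    volts: float

    def relative_power(self) -> float:
        # Textbook approximation: dynamic power is proportional to V^2 * f.
        return self.volts ** 2 * self.mhz


# Hypothetical operating points (not from a datasheet), slowest first.
POINTS: List[OperatingPoint] = [
    OperatingPoint(300, 0.9),
    OperatingPoint(533, 1.1),
    OperatingPoint(667, 1.2),
    OperatingPoint(800, 1.3),
]


def pick_point(demand_mhz: float, points: List[OperatingPoint] = POINTS) -> OperatingPoint:
    """Return the slowest operating point that still covers the demand."""
    for point in points:
        if point.mhz >= demand_mhz:
            return point
    return points[-1]  # demand exceeds the top speed: run flat out


light = pick_point(250)   # light workload
heavy = pick_point(790)   # heavy workload
saving = 1 - light.relative_power() / heavy.relative_power()
print(f"light load runs at {light.mhz} MHz / {light.volts} V, "
      f"about {saving:.0%} less dynamic power than {heavy.mhz} MHz")
```

Because voltage enters the model squared, lowering voltage together with frequency saves far more power than lowering frequency alone, which is why the scheme adjusts both.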

Crusoe Special Embedded (SE) processors

SANTA CLARA, Calif. - Transmeta Corporation (Nasdaq:TMTA) today announced new energy efficient Crusoe(TM) Special Embedded (SE) processors targeting a wide range of x86 embedded applications, including industrial automation, scientific instrumentation, retail kiosks, point-of-sale terminals, automotive infotainment, process control and home automation systems. Transmeta also announced an Embedded Partners Program for leading BIOS/firmware companies, embedded operating system companies and silicon component suppliers committed to supporting customers developing efficient embedded designs based on Crusoe SE processors.

Crusoe SE processors meet the growing requirement for combining x86-compatibility, high performance, power efficiency, low heat dissipation and chipset integration to create compact, passively cooled embedded systems. Crusoe SE processors use LongRun, an advanced power management technology, to dynamically optimize processor frequency and voltage while monitoring chip temperature, keeping power use and heat at minimum levels. Embedded system developers can now take advantage of almost 1GHz of x86-compatible performance, with an integrated Northbridge, in a small package that does not require a cooling fan. The Crusoe SE product family is an exceptional solution for developers of today's high performance, x86 embedded systems and an excellent performance upgrade path for systems developed using lower-end SOC embedded technology.

Crusoe SE processors are designed and tested for long-term use in harsh environments, where a chip's temperature can reach as high as 100°C. Future versions of Crusoe SE processors will be available for extreme embedded applications that require even wider temperature ranges.

"The introduction of Crusoe SE processors for x86 embedded applications complements Transmeta's existing mobile business and meets two important company goals," said Dr. Matthew R. Perry, president and CEO, Transmeta Corporation. "We are expanding target markets for Crusoe processors to new growth segments and geographic regions."

Crusoe SE processor specifications include:
-- Crusoe SE parts available at 667 MHz, 800 MHz and 933 MHz, optimized for x86 embedded applications.
-- Each processor available in a standard or low power version.
-- Reduced operating temperatures enabling fanless system designs to minimize end product reliability challenges associated with fan-cooled systems.
-- LongRun power and thermal management maximizes embedded system performance while reducing power consumption and heat dissipation.
-- Upgraded Code Morphing Software (CMS) enhanced to maximize real-time performance while maintaining complete x86-compatibility.
-- Integrated Northbridge reduces board space use allowing compact designs.
-- Reliable operation twenty-four hours a day, seven days a week at full-rated speed and temperature to meet the high reliability requirements of mission critical embedded designs.
-- Extended Crusoe SE availability program to support long-term embedded product life cycles.
-- Crusoe SE processors immediately available to customers.
-- Pricing example: 667 MHz Crusoe SE less than $50 per unit in volume.

Crusoe SE Processor          TM5500EX 667   TM5500EL 667   TM5800EX 800   TM5800EL 800   TM5800EL 933
Frequency Range              667 MHz        667 MHz        800 MHz        800 MHz        933 MHz
Voltage Level                0.9-1.2V       0.9-1.3V       0.9-1.3V       0.9-1.3V       0.8-1.3V
Power Level (Maximum)*       6.2W           5.1W           8.0W           6.8W           9.0W
L1 Cache                     128KB          128KB          128KB          128KB          128KB
L2 Cache                     256KB          256KB          512KB          512KB          512KB
Main Memory (100 to 133MHz)  DDRAM/SDRAM    DDRAM/SDRAM    DDRAM/SDRAM    DDRAM/SDRAM    DDRAM/SDRAM
North Bridge                 Integrated     Integrated     Integrated     Integrated     Integrated
Package                      474 BGA        474 BGA        474 BGA        474 BGA        474 BGA
Sample                       Now            Now            Now            Now            Now
Production                   Now            Now            Now            Now            Now
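
As a rough worked example, the clock and maximum power figures from the specification table in the press release can be turned into a clock-per-watt number. This is only a crude proxy and an assumption of this sketch: clock rate is not the same as delivered performance, and the maximum power level is not typical consumption. Model labels are copied as they appear in the table.

```python
# (model label as listed, clock in MHz, maximum power in W) from the table.
PARTS = [
    ("TM5500EX 667", 667, 6.2),
    ("TM5500EL 667", 667, 5.1),
    ("TM5800EX 800", 800, 8.0),
    ("TM5800EL 800", 800, 6.8),
    ("TM5800EL 933", 933, 9.0),
]


def mhz_per_watt(mhz: float, watts: float) -> float:
    """Clock rate delivered per watt of maximum rated power."""
    return mhz / watts


# Rank the parts from most to least clock per watt.
ranked = sorted(PARTS, key=lambda part: mhz_per_watt(part[1], part[2]), reverse=True)

for label, mhz, watts in ranked:
    print(f"{label}: {mhz_per_watt(mhz, watts):.1f} MHz per watt")
```

On these figures the low power (EL) versions come out ahead of the standard (EX) versions at the same clock, which is what the standard/low power split in the bullet list suggests.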

Transmeta's Embedded Partners Program

Embedded system developers face challenging business and technical requirements, as product operation and reliability specifications are complex and life cycles can last for many years. In addition to a dependable, long-term supply of strategic components such as Crusoe SE processors, these customers require a reliable structure of complementary companies to support a multitude of embedded system needs.

To that end, Transmeta is working closely with many partners to provide essential support services and products that benefit customers in the development of embedded applications. These include BIOS/firmware companies, embedded operating system companies and silicon component suppliers. Through these partnerships, customers have available a wide platform of hardware and software that are pre-tested for use with Crusoe SE processors for easy development and fast time-to-market. Transmeta is preparing a comprehensive section of its Web site to serve the needs of embedded customers. The section is scheduled to be live later this week at www.transmeta.com.

BIOS/Firmware Companies

Crusoe SE processors' x86 compatibility means easy integration with the already established base of x86 BIOS software. Transmeta is collaborating with several leading embedded BIOS companies to offer attractive feature sets that are optimized for embedded applications, in addition to those in the traditional PC market. Transmeta's BIOS company partners include:

General Software, Inc.

General Software provides superior enabling firmware and world-class support for OEM manufacturers of telecommunications, data communications, consumer electronics, dedicated servers and other specialized computing devices. "General Software's support for the Transmeta's Crusoe embedded processors means breakthrough technology for the x86 embedded market and an attractive solution for Transmeta customers," stated Dick Dorton, business development manager, General Software, Inc.

Insyde Software

Insyde Software offers a full line of embedded ROM firmware, Windows CE, Microsoft XP Embedded and Linux enabling software, power management solutions, USB support, and system integration services. "Insyde Software is pleased to be working with Transmeta to provide a complete hardware and software enabling solution in support of the newly announced Crusoe SE processors," said Stephen Gentile, vice president of business development and marketing, Insyde Software. "By leveraging our respective embedded design expertise, we can deliver a high performance, differentiated solution to consumer electronics, information appliance, and thin-client/POS manufacturers."

Phoenix Technologies

Phoenix Technologies delivers an industry-leading BIOS solution for mobile applications that has often been chosen by Transmeta's customers for notebooks and other power sensitive systems. As Transmeta launches its new Crusoe SE processors for embedded markets, Phoenix and Transmeta strengthen their commitment to deliver world-class solutions by combining their respective technologies across a wide range of customer products. In addition to its traditional BIOS product, Phoenix is offering a new core-managed environment, FirstView Connect, which enables delivery of open Internet-standards content via set-top boxes, Web terminals, information kiosks, portable video recorders (PVRs), enhanced DVD players and media players. "With FirstView Connect, Phoenix provides instant Internet connectivity at the core of devices that enable new forms of entertainment, e-commerce and information on demand," said Bob Gager, senior director of product marketing for Phoenix. "The new Crusoe SE processors are a perfect fit for our software solution, offering an unmatched combination of performance, cool-operation and energy efficiency to this exciting category."

Embedded Operating System Companies

The complete x86 compatibility and reduced power requirements of Crusoe SE processors enable embedded operating system companies to expand sales into new areas of the market. Transmeta has established close working relationships with market leaders across the embedded operating system space, offering easy out-of-the-box bring-up of multiple offerings. These partners include:

LynuxWorks

LynuxWorks is a leader in the embedded software market, providing operating systems, software development products and consulting services for the world's most successful communications, aerospace/defense, and consumer products companies. "Transmeta's Crusoe processors provide a very unique solution to the historical challenge of achieving high processor performance within a useable, embedded power budget," said Art Lee, director of business development, LynuxWorks. "Combining this high performance, power efficient technology with our choice of Linux compatible, embedded operating systems, creates an ideal platform for a wide range of end market applications."

Microsoft Corporation

Microsoft Corporation certified Transmeta's existing Crusoe processors for the Windows CE .NET operating system, allowing new opportunities for Transmeta in mobile and embedded markets. Windows CE .NET is Microsoft's modular, small footprint, embedded operating system for a variety of computing devices such as PDAs, digital cameras, printers, scanners, retail point-of-sale terminals and set-top boxes, among many other intelligent computer products. Transmeta's new energy efficient, high performance, x86-compatible Crusoe SE processors are also designed to take full advantage of Microsoft's Windows XP Embedded operating system. The combination of the powerful Microsoft Windows XP Embedded operating system and high performance Crusoe SE processors allows developers to rapidly design and deploy complex solutions for highly reliable, full-featured embedded applications.

MontaVista Software, Inc.

MontaVista Software Inc. provides leading solutions for embedded applications, including MontaVista Linux Professional Edition, named "Best Embedded Linux Solution" at the New York Linux World and Expo. This offering provides Transmeta's customers with a sophisticated, off-the-shelf solution including a rich tool chain, extensive library support, and state of the art networking capabilities. The collaboration between Transmeta and MontaVista Software Inc. will enable customers of both companies to bring Linux-based consumer electronics products to market more quickly and cost-effectively. "MontaVista is committed to help our customers deliver compelling consumer electronics products," said Bill Weinberg, director of strategic marketing, MontaVista Software Inc. "The combination of MontaVista's Linux solutions and Transmeta's energy efficient Crusoe embedded processors offer impressive capabilities to the market."

QNX Software Systems

QNX Software Systems is an industry leader in real-time, micro kernel operating system technology. The inherently reliable and scalable QNX(R) Neutrino(R) RTOS and powerful QNX(R) Momentics(R) development suite together provide the most trusted foundation for embedded systems in the networking, automotive, medical and industrial markets. "The QNX history of ultra reliability, combined with Transmeta Crusoe's x86 compatibility and efficient power management, is a powerful combination for OEMs building next-generation embedded devices," said Linda Campbell, director of strategic alliances, QNX Software Systems.

Red Hat

Red Hat is recognized as a worldwide market share leader in delivering compelling enterprise-grade Linux solutions. Red Hat develops and supports the Red Hat Linux Advanced Server and Red Hat Linux Advanced Workstation operating systems for deployment in a wide range of markets. For example, Red Hat Linux helps to enable supercomputer-level performance in the Los Alamos National Laboratory's 480-node "Green Machine" compute cluster based on Transmeta's Crusoe processors. With the rollout of Transmeta's Crusoe SE processors, system designers can now more readily leverage the power of Red Hat's enterprise-grade operating systems with the attractive features of these new processors.

Wind River Systems, Inc.

Wind River Systems, Inc. is the worldwide leader in embedded software and services. It is the only company to provide market-specific embedded platforms that integrate real-time operating systems, development tools and technologies. WIND RIVER(R) PLATFORMS are new, standardized integrated platforms that provide a complete foundation to meet the specific requirements of vertical markets. A VxWorks(R) Board Support Package (BSP) for the Crusoe TM5800 System Development Kit is available from Transmeta's Web site. "Combining Wind River's market-specific integrated embedded platforms with Crusoe's high performance and power efficient operation offers our customers new levels of capability and portability in multimedia devices, industrial control applications, and wireless-networked devices," said Caroline Yao, director of partner solutions for Wind River.

Silicon Component Companies

Silicon component companies are developing new graphics, networking and peripheral components aimed at taking advantage of the reduced power, low-temperature and high performance requirements of the x86 compatible embedded processor market. These Transmeta partners include:

ALi Corporation

ALi Corporation (ALi) is one of the world's leading suppliers of integrated circuits for personal computers and embedded systems. The ALi M1535 family is the most widely chosen southbridge by customers of Transmeta's Crusoe processor to date. The M1535+ generation southbridge shipping in mass production today delivers rich ACPI-compliant power management support, fast I/O using the industry-standard PCI bus, as well as a richly integrated solution that incorporates an advanced full-function Super I/O on-chip, an AC-Link Host controller, two channels of IDE, Fast IR and many other interfaces. "ALi and Transmeta have long collaborated to deliver attractive notebook solutions to the market and with the launch of Transmeta's Crusoe SE processors, ALi is eager to continue this relationship by offering cutting edge southbridge solutions for embedded customers," said Dr. Chin Wu, president of ALi Corp. "Platform stability is a critical concern in the embedded space, and ALi joins Transmeta's commitment to deliver the M1535+ solution to embedded customers over an extended lifetime."

Silicon Motion, Inc.

Silicon Motion Inc. is a leading provider of high-performance, low-power multimedia accelerators that enable a rich experience on mobile access devices. Combining Crusoe SE processors with Silicon Motion's multimedia accelerator chips provides an ideal solution for embedded mobile market segments such as smart displays, wireless broadband terminals, car information systems, ATMs, point-of-sale systems and kiosks. "Transmeta and Silicon Motion share the goal of continued expansion and development of energy efficient solutions for the embedded market," said Wallace Kou, president and CEO, Silicon Motion. "Together, our solution provides a compelling platform for the next embedded generation."

Transmeta's Embedded Customers

Transmeta's customers provide innovative, embedded solutions based on the existing family of energy efficient Crusoe processors and several companies have already committed to designs based on the new Crusoe SE processors. Examples of current and new embedded designs include:

Advantech

Advantech is one of the world's largest suppliers of industrial PCs and manufactures PC platforms for communications, industrial automation and embedded computing. Transmeta and Advantech have collaborated on several embedded products, including a 3.5-inch, single board computer (SBC) based on the Crusoe TM5400 processor. The PCM-9370 fanless SBC simplifies the configuration and installation process because the processor is mounted directly on board. Transmeta's LongRun power management technology allows this product to consume very little power and eliminates the need for a processor fan.

Evalue Technology, Inc.

Evalue Technology, Inc. is an innovator of applied computing solutions, whose mission is to become a leading embedded & IA solution provider. Evalue's SOM-144 (System-On-Module) product line is a series of complete CPU system modules that can be mounted onto any easily designed application-specific carrier boards. These ready-to-use SOM modules integrate all aspects of PC functionality with a variety of platforms, all within a very small 68 mm x 100 mm area. The ESM-2615 is an ultra small SOM-144 CPU module integrated with most PC functionality and includes an energy efficient Transmeta Crusoe TM5400 processor.

Gespac

Gespac, a leading European manufacturer of standard form factor and custom embedded boards and systems, will develop a number of embedded products based on Transmeta's new Crusoe SE processor line. The Swiss-based company, which is focused on leading edge embedded systems for the automation, telecommunications, medical and transportation market, will initially develop a series of products aimed at the transportation industry. These Crusoe SE-based products will ship in early 2003. "The fact that Transmeta's Crusoe embedded processors give our system designers significant x86 processing power with improved reliability due to the elimination of a cooling fan, makes it a natural fit for train-based applications where system up-time is key," said Vincent Gachet, projects manager, Gespac.

IBASE

IBASE, a leader in standard form factor embedded boards, currently offers a wide variety of Crusoe-based boards in a number of form factors. The Taiwan-based company will be expanding its product line of Crusoe-based CPU board products to include the new Crusoe SE processors. IBASE target applications include networking, automation, mini-server, firewall, medical, military and point-of-sale devices. "The new Crusoe SE processor line gives us the long term product availability that many of our embedded system customers demand," said Ben Liao, vice president of sales, IBASE. "The reduced power requirements coupled with passive cooling enables us to design x86-compatible, high performance embedded systems with low cost enclosures."

ICP America, Inc.

ICP America offers a comprehensive line of industrial computer products designed to meet the demanding requirements of today's industrial applications, including factory control and automation, computer telephony, medical instrumentation, and mobile computer systems. The "Suppliers of Innovative Industrial Computer Products," ICP America's extensive product line includes single board computers, chassis, backplanes, LCD workstations, IDE flash drives and industrial power supplies. The company offers the Crusoe TM5400-based Wafer-6820, a 3.5" single board computer, with fanless operation due to the energy efficient qualities of Crusoe processor technology.

TransDominion Technologies

TransDominion Technologies, which offers a line of Crusoe-based firewall appliances, is a leader in the security appliance industry, specializing in the design, integration, production, and support of customized embedded solutions. The company has partnered with some of the premier vendors in the security industry to help develop embedded appliances based on leading software architectures. "For the TrueGate TG-23 and TG-27, we needed a processor that could provide high performance and enable an efficient design, a perfect fit for Transmeta's Crusoe processor," said Nate Carmody, chief technology officer, TransDominion Technologies. "The Crusoe processor exceeded our expectations and provided reduced power consumption, making it the best solution for our needs."

TransLink USA

TransLink USA is an embedded board and systems company located in Plano, Texas. TransLink is focused on leveraging Transmeta's power-efficient, high performance Crusoe processor technology to provide superior solutions in both standard and proprietary form factors for a number of key industries in the embedded market segment. TransLink is developing board solutions for a number of embedded market segments including industrial control, communications, gaming, vending, point-of-sale, kiosk and data security. Crusoe SE processors will allow TransLink to provide previously unavailable single board computer solutions for the embedded market. Products based on the new Transmeta Crusoe SE product line will be in volume production starting in Q1 2003. "TransLink is excited about the opportunity to partner with Transmeta to provide flexible, price competitive products with superior performance and long life cycles to the embedded market," said Rob Orsini, vice president of sales and marketing, TransLink USA.

Tri-M Systems, Inc.

Tri-M Systems and Engineering of Port Coquitlam, BC, Canada provides hardware and turnkey solutions for embedded systems, specializing in PC/104 products. For the past 20 years, the company has been both a manufacturer and distributor specializing in the PC/104 field. Tri-M Systems is introducing the TMZ104, a breakthrough x86, single board computer in PC/104 format featuring the energy efficient Transmeta Crusoe SE processor. With a compact size of 3.55" x 3.77" and off-the-shelf extended temperature rating, this board will be of interest to customers looking for embedded x86 technology in hostile, mobile, industrial, military, medical, or telecom environments where minimal power consumption, small board size and fanless operation are key design factors.

About Transmeta Corporation

Transmeta develops and sells software-based microprocessors and develops additional hardware and software technologies that enable computer manufacturers to build computers that simultaneously offer long battery life, high performance and x86 compatibility. Transmeta's family of Crusoe microprocessors is targeted at the notebook, Tablet PC and Internet appliance segments of the mobile Internet computer market, as well as a range of embedded applications. To learn more about Transmeta visit www.transmeta.com.

Transmeta, Crusoe, LongRun and Code Morphing Software are trademarks of Transmeta Corporation. All other product or service names mentioned herein are the trademarks of their respective owners.

Transmeta Corporation (Nasdaq:TMTA) has announced that Tsinghua Unisplendour Group, China's second largest notebook supplier, has selected the energy efficient Crusoe TM5800 processor for a new notebook in the Chinese market.
Source: http://www.allbusiness.com/technology/computer-hardware/4404801.html#ixzz1eWSSNeQa

The VIA C3 and the Transmeta Crusoe were two supporting players during the fight between the Athlon and Intel's Pentium III and Pentium 4. Although their sales were weak, they attracted a great deal of curiosity.

The C3 had a troubled history, changing its name several times (Jedi, Gobi, Cayenne, Joshua, Samuel ...) and going through samples produced with various cache configurations and architectural changes, until it was finally released in 2001. It was a successor to the 6x86 and MII, produced by VIA on the basis of earlier Cyrix designs. It was a low-power processor with 128 KB of L1 cache and 64 KB of L2 cache (similar to the Duron), manufactured with a hybrid process in which some components were produced at 0.15 micron and others at 0.13 micron.

As a chipset manufacturer, VIA held a license for the GTL+ bus, which allowed the C3 to be compatible with Socket 370 boards designed for the Pentium III and Celeron. At the time, Intel was making the transition to the Pentium 4 and there was a large supply of low-cost Socket 370 boards, which helped sales.

The C3 used a fairly simple architecture, with only two execution units (like the original Pentium) and a very weak math coprocessor, which made it a passable choice for office applications (performance was slightly lower than that of a Celeron at the same clock speed) but quite inadequate for gaming. On the other hand, the simplicity of the processor made it fairly cheap to produce. Even with 128 KB of L1 cache, it occupied an area of only 55 mm², which allowed VIA to produce almost twice as many processors per wafer as Intel could get with the Celeron Coppermine, for example. This allowed VIA to offer the processor at low prices, enabling the emergence of a large number of low-cost PCs and notebooks, which (despite the poor performance) had some success at the time.

The C3 was produced in three versions. All use the same cache configuration (128 KB of L1 and 64 KB of L2), but differ in manufacturing process and architecture. The first was based on the Samuel 2 core, built at 0.15 micron, and was produced in small quantities in versions from 667 to 800 MHz. Next came the Ezra, which pioneered the hybrid 0.15 and 0.13 micron process and was produced in versions from 800 MHz to 1.0 GHz. The third was the Nehemiah, which kept the same fabrication process but adopted a longer pipeline (16 stages against the earlier 12), which allowed it to be released in versions from 1.0 to 1.4 GHz.

Although outdated, the C3 continued to be sold in small quantities in the following years, serving as a fallback option for manufacturers interested in selling low-cost, low-performance PCs, a position similar to the one currently occupied by the Atom. With the end of the Socket 370 platform, VIA turned its focus to the EPIA platform, a line of miniature motherboards that combined C3 (and later C7) processors with VIA's own chipsets.
However, the low performance of the processors, combined with VIA's difficulty in selling the boards at competitive prices, meant that the platform never achieved much success, despite its technical merits.

[Image: an EPIA SP8000E, with an 800 MHz C3]

The Crusoe, in turn, was a far more exotic and ambitious project. It adopted a radically different architecture in an attempt to solve the problem of the legacy burden of x86 chips without compromising compatibility with existing software.

Unlike the other processors of the time, which use instruction decoders and reordering logic to convert and reorder x86 instructions (doing a good deal of processing before the instructions reach the execution units), the Crusoe used a simplified design in which the chip processed its own instruction set, consisting only of simple instructions, as in a RISC processor.

Compatibility with the x86 instruction set was obtained through translation software called "Code Morphing Software", which was responsible for converting the x86 instructions sent by programs into simple instructions understood by the processor, ordering them so that they would execute faster, and coordinating the use of registers, tasks that other processors implement in hardware.

The Code Morphing Software was stored in a small ROM integrated into the processor. When the PC was turned on, it was the first thing to be loaded (even before the BIOS) and it remained resident in a protected area of RAM, running as an intermediary between the physical processor and the operating system.

All instructions translated by the Code Morphing Software were stored in a special cache, called the translation cache. This cache occupied part of the processor's L1 and L2 caches and also an area of RAM, whose size could vary according to the volume of different instructions processed. It kept the processor from wasting time translating the same block of instructions repeatedly, but in return it consumed part of the processor's caches and RAM, again reducing the resources devoted to the processing itself.

Internally, the Crusoe was a 128-bit VLIW chip, which processed the x86 instructions in groups of four 32-bit instructions, in a design similar to that used in the implementation of SSE, but applied to the processing of all instructions. The acronym "VLIW" stands for "Very Long Instruction Word" and aptly emphasizes the use of wide execution units, capable of processing many bits at a time.

Thanks to the combination of these two factors, the Crusoe was a much simpler and more energy-efficient chip. The 600 MHz TM5420, for example, consumed less than 2 watts operating at full load, less than a 486.

The big problem was that the Code Morphing Software consumed much of the processor's resources, leaving fewer resources for processing instructions. This meant that the Crusoe was very slow compared with an Athlon or Pentium III at the same clock speed, which reduced demand for the processor to the point of derailing the project.

The Crusoe existed in versions from 500 MHz to 1.0 GHz. All offered very low power consumption, which led to its use in some ultra-compact notebooks and also in the Desknote manufactured by ECS. However, performance was 50% lower than that of a Pentium III at the same clock speed, which made the notebooks based on it slow.
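
The translation cache described above can be illustrated with a very small sketch. It is not Transmeta's implementation: the toy operation names, the `TOY_TRANSLATION` table and the `TranslationCache` class are all hypothetical, chosen only to show why caching a translated block saves repeated translation work when the same block runs many times.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from toy "x86-like" operations to toy native operations.
# None of these names come from Transmeta; they exist only for illustration.
TOY_TRANSLATION: Dict[str, List[str]] = {
    "inc": ["add1"],
    "push": ["sub_sp", "store"],
    "pop": ["load", "add_sp"],
}


class TranslationCache:
    """Caches translated blocks, keyed by the address of the source block."""

    def __init__(self) -> None:
        self.cache: Dict[int, List[str]] = {}
        self.translations = 0  # number of times a block was actually translated

    def translate_block(self, block: Tuple[str, ...]) -> List[str]:
        """Translate every operation in the block (the expensive step)."""
        native: List[str] = []
        for op in block:
            native.extend(TOY_TRANSLATION.get(op, [op]))
        self.translations += 1
        return native

    def fetch(self, address: int, block: Tuple[str, ...]) -> List[str]:
        """Return the cached translation, translating only on a miss."""
        if address not in self.cache:
            self.cache[address] = self.translate_block(block)
        return self.cache[address]


def run_loop(cache: TranslationCache, iterations: int) -> int:
    """Execute the same block repeatedly; return native operations executed."""
    block = ("push", "inc", "pop")
    executed = 0
    for _ in range(iterations):
        executed += len(cache.fetch(0x1000, block))
    return executed
```

In this sketch a loop body that runs 1000 times is translated once and then served from the cache. The cost is the memory held by `cache`, which mirrors the trade-off noted in the text.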
In 2004, Transmeta released the Efficeon, an updated version of the processor, which was based on a 256-bit architecture (i.e., it processed groups of eight 32-bit instructions) and offered performance per clock cycle almost twice that of the original version. The Efficeon was released in versions up to 1.7 GHz and provided much more competitive performance, but it came too late to save Transmeta, which ceased production of the processors in 2005.

1. What does Transmeta do?
Transmeta creates, markets and sells the Crusoe processor, a family of software based, smart microprocessor solutions. Crusoe processors are specifically designed to combine PC software compatibility with high performance and extremely long battery life. The Crusoe processor solutions are the only ones designed to span the complete range of ultra-light (less than four pounds) mobile PCs and Internet devices.

2. What are the details behind the smart microprocessor architecture?
The smart microprocessor architecture, as used in the Crusoe Processor, relies on software to perform a carefully selected set of functions that are performed in today's hardware-based processors. This repartitioning of functionality allows for a great deal of flexibility in offering solutions that are more tailored to specific market segments.

The Crusoe processor takes advantage of a key benefit in this repartitioning: the significant reduction in the number of transistors needed to perform a task. This reduction results in a power consumption as low as 10 to 20 milliwatts while users run everyday PC applications like email and Internet browsing. For heavy-duty multimedia applications like DVD, the processor typically consumes just 1 - 2 watts. It also leads to a very small die size that is economical to build.

The smart microprocessor consists of a hardware VLIW core as its engine and a software layer called Code Morphing software. The Code Morphing software acts as a "shell" that surrounds the VLIW core but resides beneath the operating system, "morphing" or translating x86 instructions to native Crusoe instructions. In addition, the Code Morphing software contains a dynamic compiler and code optimizer that search out blocks of software that make up the repetitive sequences commonly found in applications and reduce them to a smaller set of executable instructions. The result is increased performance at the least amount of power.

The final benefit offered by the smart microprocessor architecture is that it allows Transmeta to evolve the VLIW hardware and Code Morphing software separately without affecting the huge base of software applications. Upgrades to the software portion of a microprocessor can be rolled out independently from chip revisions. Likewise, decoupling the hardware design from the system and application software frees hardware designers to evolve (or eventually replace) their designs without perturbing the legacy software base.

3. How is the Crusoe processor different from today's mobile processors from Intel and AMD?
Today's Intel and AMD mobile processors are really desktop processors that have been derated for the mobile market and as such, they represent the culmination of several generations of increasingly burdensome hardware complexity. While these processors have been the driving force behind desktop computing since the 1970s, they have shown their limitations in mobile computers as they become smaller and smaller and have had to make tradeoffs between performance, excessive heat, and battery life.

Transmeta believes, as do a number of industry experts, that a new architectural approach is needed in order for the mobile computing market to reach its full potential. One such expert, John Hennessy, a professor of electrical engineering and computer science at Stanford University, confirmed this trend when he said, "Microprocessor designers need to adopt fresh techniques and new kinds of metrics to align their work with the coming 'post-desktop era.'" He continued, "Requirements for compact, low-power, highly reliable embedded devices and techniques... will drive the next generation of processor designs."

The Crusoe smart microprocessor architecture implements a carefully selected set of functions in software, as opposed to hardware. By choosing this method, Transmeta is able to create a much more streamlined VLIW hardware core which, when combined with Code Morphing software and LongRun power management, results in both the high performance and low power required for today's demanding mobile computing environment.
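
LongRun power management is described elsewhere in this FAQ as continuously adjusting the processor's voltage and speed (MHz) to match the workload. The sketch below shows roughly why lowering both together saves power. It rests on assumptions that are not in the source: the common approximation that dynamic power scales with V² × f, and a table of made-up operating points (`OPERATING_POINTS`) that are not Transmeta data.

```python
# Hypothetical (MHz, volts) operating points, slowest first. Not Transmeta data.
OPERATING_POINTS = [(300, 1.2), (400, 1.3), (500, 1.4), (600, 1.5), (700, 1.6)]


def relative_power(mhz, volts):
    """Dynamic power up to a constant factor, assuming P ~ V^2 * f."""
    return volts * volts * mhz


def pick_operating_point(required_mhz):
    """Return the slowest operating point that still meets the demand."""
    for mhz, volts in OPERATING_POINTS:
        if mhz >= required_mhz:
            return mhz, volts
    return OPERATING_POINTS[-1]  # demand exceeds the range: run at full speed


def power_saving(required_mhz):
    """Fraction of full-speed power saved by running at the chosen point."""
    chosen = relative_power(*pick_operating_point(required_mhz))
    full = relative_power(*OPERATING_POINTS[-1])
    return 1.0 - chosen / full
```

Under these assumed numbers, a workload that needs only 300 MHz runs at roughly a quarter of the full-speed power, because the voltage term is squared. A real power manager would also have to estimate the demand from the running workload, which this sketch takes as an input.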

4. How is the Crusoe processor different from today's StrongARM and MIPS processors used in Handheld PCs?
The StrongARM and MIPS processors are part of a class of architecture known as RISC (Reduced Instruction Set Computing). The RISC processors have been used in a wide range of first-generation handheld computers, because their average power of one watt or less leads to devices with very long battery life. However, success in the marketplace has been limited, since RISC processors are not compatible with many of today's PC and Internet software applications.

5. What is "Crusoe"?
"Crusoe" is the brand name for what will become a family of smart microprocessors for a wide range of fully compatible mobile Internet computers.

6. How many Crusoe processor solutions are there?
Three processor solutions are currently available. The first version of the Crusoe processor (TM3200) is targeted at mobile Internet devices operating with the Mobile Linux O/S. The other versions (TM5400/TM5600) are targeted at performance-oriented, ultra-light PCs. With up to 700MHz in performance and its new LongRun power management feature, the TM5400/TM5600 will deliver the highest performance and lowest power solution for mobile computing. LongRun power management is the key to bringing full functionality with the longest battery life to the ultra-light mobile PC, because it analyzes the application workload dynamically and continuously adjusts the processor's voltage and speed (MHz) to provide the required performance at the lowest power. In essence, LongRun power management is about maximizing battery life while optimizing performance.

7. Is Transmeta a public company?
Transmeta Corporation is a publicly traded company (NASDAQ: TMTA).

8. How many versions of the Crusoe processor are planned?
The number of Crusoe processor solutions will grow over time to become a family of products that are differentiated by both hardware and software features. The resulting product breadth has the potential to address virtually every need in the span of mobile computing.

9. Why did Transmeta choose to focus on ultra-light mobile PCs and mobile Internet devices?
Prior to Crusoe, mobile processors were simply desktop processors that were de-rated for the mobile market. Transmeta viewed this as a significant opportunity and specifically designed the smart microprocessor for this underserved market. The Crusoe processor solves a number of problems (excessive heat, low battery life, and underperformance) that have frustrated end users. The Crusoe Processor, along with the emergence of affordable, high-speed, wireless communications, will accentuate the shift by users to mobile PCs as they begin to understand that a high-performance, fully compatible solution for all day computing now exists.

10. Are Crusoe processors available now?
The first Crusoe processors, the TM3200, TM5400, and TM5600, are available and shipping now.

11. What will the Crusoe processor mean for mobility?
The Crusoe processor will enable a whole class of Mobile Internet Computers that until now have suffered from tradeoffs in performance, compatibility, and low battery life. In addition, the Crusoe brand itself will serve as the guidepost for users trying to make the correct mobile computer buying decisions.

12. What is included in the Crusoe processor solution?
The Crusoe processor consists of two components. The first is a VLIW processor packaged in a 474 BGA. The second is a layer of software called Code Morphing Software, which resides in the mobile system's Flash ROM. Both components work together as a complete x86 instruction-set-compatible solution.

13. How does the Crusoe processor interface with other components in a mobile computer?
The Crusoe processor contains an on-chip SDRAM memory controller and a PCI controller to interface with industry standard memory and I/O devices (for example, graphics and communications solutions). The model TM5400 has an additional memory controller that interfaces with the DRAM industry's newest low power, high performance memory called DDR-SDRAM (Double Data Rate).

14. What is the power consumption of a Crusoe processor?
The Crusoe processor can operate on as low as 10 to 20 milliwatts when running everyday applications like email or Internet browsing. On heavy-duty multimedia applications, like DVD movie playback, the processor will consume fewer than two watts.

The extremely low power consumption delivered on multimedia applications can be directly attributed to a new feature called LongRun power management. LongRun has the distinct ability to analyze the application workload dynamically and to adjust continuously the processor's speed (MHz) and voltage to provide the necessary performance. This new feature promises to extend the battery life of all applications, most specifically those requiring the constant attention of the processor. This is a dramatic departure from today's ultra-light PCs, which are incapable of delivering over one and a half or two hours of runtime for DVD movies.

15. How does Crusoe processor's LongRun power management compare to Intel's SpeedStep (Geyserville) technology found on the Mobile Pentium III processor?
Intel's SpeedStep technology was designed to bring additional desktop-like performance to mobile computers when they are residing in a docking station. The docking station is specially designed to provide additional cooling to the thermally hotter processor. When the mobile computer is taken on the road, performance is reduced, since the processor has to run at a lower speed to avoid overheating.

The LongRun power management feature within the Code Morphing Software allows the processor to run at peak performance independent of its power source (AC outlet or DC batteries). In addition, LongRun power management analyzes the application workload dynamically and continuously adjusts the processor's speed (MHz) and voltage accordingly. This procedure is performed without any user intervention and is the most efficient method of operating a processor. LongRun power management will make its biggest impact in ultra-light (less than four pound) portables that up to now have had difficulty in running multimedia applications for longer than an hour or two.

16. What are the benefits to companies that use the Crusoe processor in their mobile computers?
Companies will benefit from using the entire family of Crusoe processors across a whole range of mobile Internet computers. Whether it's a web slate or a four-pound ultra-light PC with a 13-inch LCD display and a DVD drive, the Crusoe Processor ensures the highest performance with the lowest power consumption.

17. Can users expect a full day's operation with a mobile PC based on a Crusoe Processor?
The Crusoe Processor with its very low operating power creates an opportunity for PC OEMs to create all-day computers that deliver the full PC and Internet experience. Transmeta has not only delivered on a low power processor, but it is also developing reference designs for customers to use in developing mobile systems that consume just four watts when active. At four watts of power consumption, a light-weight mobile system with a 32 watt-hour Lithium battery can deliver eight hours of use.

18. Will the Crusoe processor be found in handhelds or web slate computers?
The Crusoe processor with the Midori Linux operating system makes for a very favorable solution in a web slate or handheld Internet device. It delivers the performance and compatibility necessary to provide users with the full Internet experience while consuming very little power.

19. What is Midori Linux?
Transmeta has created a Linux distribution to support its OEM customers called Midori Linux. Midori Linux is designed for systems without hard disks, such as Mobile Internet devices (for example, Web slates, clients). The principal enhancements for Midori Linux are in power management and in the reduction of the memory footprint.

20. Is Transmeta getting into the Linux distribution business like RedHat?
No. Transmeta does not intend to support end users. The purpose for creating Midori Linux for OEM customers is to provide a total solution including the Crusoe processor, the Code Morphing software, all the required driver support for our motherboard platform and the Midori Linux operating system. This will provide our OEM customers with the best combination of features and time to market for the emerging Internet device marketplace.

21. Does Transmeta intend to release Midori Linux to the open source community?
Yes. This was done on March 13, 2001. It is available for download at http://midori.transmeta.com/.

22. Who builds Transmeta's Crusoe processor solution?
The hardware piece of the Crusoe Processor solution, the VLIW chip, is fabricated and packaged by IBM's Microelectronic Division. The Code Morphing software is developed and distributed along with the processor by Transmeta as a complete solution.

23. When was Transmeta founded?
Dave Ditzel, along with seven colleagues, founded Transmeta in 1995. Transmeta is based in Santa Clara, California.

24. What is Linus Torvalds's role at Transmeta?
Linus Torvalds is a member of the very talented software team that created Transmeta's patented Code Morphing Software.

25. How are the two versions of the Crusoe processor designated?
The two versions of the Crusoe Processor will be known by the common Crusoe brand, since they share the same attributes required for truly mobile computing.

Chapter 4

Processor Architecture
Modern microprocessors are among the most complex systems ever created by humans. A single silicon chip, roughly the size of a fingernail, can contain a complete high-performance processor, large cache memories, and the logic required to interface it to external devices. In terms of performance, the processors implemented on a single chip today dwarf the room-sized supercomputers that cost over $10 million just 20 years ago. Even the embedded processors found in everyday appliances such as cell phones, personal digital assistants, and handheld game systems are far more powerful than the early developers of computers ever envisioned. Thus far, we have only viewed computer systems down to the level of machine-language programs. We have seen that a processor must execute a sequence of instructions, where each instruction performs some primitive operation, such as adding two numbers. An instruction is encoded in binary form as a sequence of 1 or more bytes. The instructions supported by a particular processor and their byte-level encodings are known as its instruction-set architecture (ISA). Different families of processors, such as Intel IA32, IBM/Freescale PowerPC, and the ARM processor family have different ISAs. A program compiled for one type of machine will not run on another. On the other hand, there are many different models of processors within a single family. Each manufacturer produces processors of ever-growing performance and complexity, but the different models remain compatible at the ISA level. Popular families, such as IA32, have processors supplied by multiple manufacturers. Thus, the ISA provides a conceptual layer of abstraction between compiler writers, who need only know what instructions are permitted and how they are encoded, and processor designers, who must build machines that execute those instructions. In this chapter, we take a brief look at the design of processor hardware. 
We study the way a hardware system can execute the instructions of a particular ISA. This view will give you a better understanding of how computers work and the technological challenges faced by computer manufacturers. One important concept is that the actual way a modern processor operates can be quite different from the model of computation implied by the ISA. The ISA model would seem to imply sequential instruction execution, where each instruction is fetched and executed to completion before the next one begins. By executing different parts of multiple instructions simultaneously, the processor can achieve higher performance than if it executed just one instruction at a time. Special mechanisms are used to make sure the processor computes the same results as it would with sequential execution. This idea of using clever tricks to improve performance while maintaining the functionality of a simpler and more abstract model is well known in computer science. Examples include the use of caching in Web browsers and information retrieval data structures such as balanced binary trees and hash tables.

Chances are you will never design your own processor. This is a task for experts working at fewer than 100 companies worldwide. Why, then, should you learn about processor design?

It is intellectually interesting and important. There is an intrinsic value in learning how things work. It is especially interesting to learn the inner workings of a system that is such a part of the daily lives of computer scientists and engineers and yet remains a mystery to many. Processor design embodies many of the principles of good engineering practice. It requires creating a simple and regular structure to perform a complex task.

Understanding how the processor works aids in understanding how the overall computer system works. In Chapter 6, we will look at the memory system and the techniques used to create an image of a very large memory with a very fast access time. Seeing the processor side of the processor-memory interface will make this presentation more complete.

Although few people design processors, many design hardware systems that contain processors. This has become commonplace as processors are embedded into real-world systems such as automobiles and appliances. Embedded-system designers must understand how processors work, because these systems are generally designed and programmed at a lower level of abstraction than is the case for desktop systems.

You just might work on a processor design. Although the number of companies producing microprocessors is small, the design teams working on those processors are already large and growing. There can be over 1000 people involved in the different aspects of a major processor design.

In this chapter, we start by defining a simple instruction set that we use as a running example for our processor implementations. We call this the Y86 instruction set, because it was inspired by the IA32 instruction set, which is colloquially referred to as x86. Compared with IA32, the Y86 instruction set has fewer data types, instructions, and addressing modes. It also has a simpler byte-level encoding. Still, it is sufficiently complete to allow us to write simple programs manipulating integer data. Designing a processor to implement Y86 requires us to face many of the challenges faced by processor designers.

We then provide some background on digital hardware design. We describe the basic building blocks used in a processor and how they are connected together and operated. This presentation builds on our discussion of Boolean algebra and bit-level operations from Chapter 2. We also introduce a simple language, HCL (for Hardware Control Language), to describe the control portions of hardware systems. We will later use this language to describe our processor designs. Even if you already have some background in logic design, read this section to understand our particular notation.

As a first step in designing a processor, we present a functionally correct, but somewhat impractical, Y86 processor based on sequential operation. This processor executes a complete Y86 instruction on every clock cycle. The clock must run slowly enough to allow an entire series of actions to complete within one cycle. Such a processor could be implemented, but its performance would be well below what could be achieved for this much hardware.

With the sequential design as a basis, we then apply a series of transformations to create a pipelined processor. This processor breaks the execution of each instruction into five steps, each of which is handled by a separate section or stage of the hardware. Instructions progress through the stages of the pipeline, with one instruction entering the pipeline on each clock cycle. As a result, the processor can be executing the different steps of up to five instructions simultaneously. Making this processor preserve the sequential behavior of the Y86 ISA requires handling a variety of hazard conditions, where the location or operands of one instruction depend on those of other instructions that are still in the pipeline.

[Figure 4.1 contents: program registers %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp; condition codes ZF, SF, OF.]
Figure 4.1: Y86 programmer-visible state. As with IA32, programs for Y86 access and modify the program registers, the condition code, the program counter (PC), and the memory. The status code indicates whether the program is running normally, or some special event has occurred.

We have devised a variety of tools for studying and experimenting with our processor designs. These include an assembler for Y86, a simulator for running Y86 programs on your machine, and simulators for two sequential and one pipelined processor design. The control logic for these designs is described by files in HCL notation. By editing these files and recompiling the simulator, you can alter and extend the simulator's behavior. A number of exercises are provided that involve implementing new instructions and modifying how the machine processes instructions. Testing code is provided to help you evaluate the correctness of your modifications. These exercises will greatly aid your understanding of the material and will give you an appreciation for the many different design alternatives faced by processor designers.

Web Aside ARCH:VLOG presents a representation of our pipelined Y86 processor in the Verilog hardware description language. This involves creating modules for the basic hardware building blocks and for the overall processor structure. We automatically translate the HCL description of the control logic into Verilog. By first debugging the HCL description with our simulators, we eliminate many of the tricky bugs that would otherwise show up in the hardware design. Given a Verilog description, there are commercial and open-source tools to support simulation and logic synthesis, generating actual circuit designs for the microprocessors.
So, although much of the effort we expend here is to create pictorial and textual descriptions of a system, much as one would when writing software, the fact that these designs can be automatically synthesized demonstrates that we are indeed creating a system that can be realized as hardware.
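
The five-step pipeline described above can be made concrete with a small timing sketch. It is an idealization under stated assumptions: every step takes one time unit, there are no hazards or stalls, and the stage labels in `STAGES` are placeholders chosen here, since this introduction does not name the stages.

```python
STAGES = ["F", "D", "E", "M", "W"]  # assumed labels for the five steps


def pipeline_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed by an ideal pipeline with no hazards or stalls."""
    if num_instructions == 0:
        return 0
    return num_instructions + num_stages - 1


def unpipelined_steps(num_instructions, num_stages=len(STAGES)):
    """Time units needed when each instruction finishes all steps first."""
    return num_instructions * num_stages


def timing_diagram(num_instructions):
    """One text row per instruction; column c shows the stage used in cycle c."""
    total = pipeline_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total
        for s, name in enumerate(STAGES):
            cells[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(f"I{i + 1}: " + " ".join(cells))
    return rows
```

For 100 instructions this model gives 104 time units pipelined against 500 without overlap. The hazard conditions mentioned in the text are exactly what makes a real pipeline fall short of this ideal.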

4.1 The Y86 Instruction Set Architecture

Defining an instruction set architecture, such as Y86, includes defining the different state elements, the set of instructions and their encodings, a set of programming conventions, and the handling of exceptional events.

4.1.1 Programmer-Visible State


As Figure 4.1 illustrates, each instruction in a Y86 program can read and modify some part of the processor state. This is referred to as the programmer-visible state, where the programmer in this case is either someone writing programs in assembly code or a compiler generating machine-level code. We will see in our processor implementations that we do not need to represent and organize this state in exactly the manner implied by the ISA, as long as we can make sure that machine-level programs appear to have access to the programmer-visible state.

The state for Y86 is similar to that for IA32. There are eight program registers: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, and %ebp. Each of these stores a word. Register %esp is used as a stack pointer by the push, pop, call, and return instructions. Otherwise, the registers have no fixed meanings or values. There are three single-bit condition codes, ZF, SF, and OF, storing information about the effect of the most recent arithmetic or logical instruction. The program counter (PC) holds the address of the instruction currently being executed.

The memory is conceptually a large array of bytes, holding both program and data. Y86 programs reference memory locations using virtual addresses. A combination of hardware and operating system software translates these into the actual, or physical, addresses indicating where the values are actually stored in memory. We will study virtual memory in more detail in Chapter 9. For now, we can think of the virtual memory system as providing Y86 programs with an image of a monolithic byte array.

A final part of the program state is a status code Stat, indicating the overall state of program execution. It will indicate either normal operation, or that some sort of exception has occurred, such as when an instruction attempts to read from an invalid memory address. The possible status codes and the handling of exceptions are described in Section 4.1.4.
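
The programmer-visible state listed above maps naturally onto a small record type. The sketch below is only an illustration: the class name `Y86State`, the 1024-byte memory size, the status strings and the little-endian word layout are assumptions made here, not definitions from this section (the actual status codes are deferred to Section 4.1.4).

```python
from dataclasses import dataclass, field

REGISTERS = ("%eax", "%ecx", "%edx", "%ebx", "%esi", "%edi", "%esp", "%ebp")


@dataclass
class Y86State:
    """Toy model of the programmer-visible state; sizes and names are assumed."""
    registers: dict = field(default_factory=lambda: {r: 0 for r in REGISTERS})
    zf: bool = False  # condition codes set by arithmetic or logical instructions
    sf: bool = False
    of: bool = False
    pc: int = 0       # address of the instruction currently being executed
    memory: bytearray = field(default_factory=lambda: bytearray(1024))
    stat: str = "normal"  # placeholder string, not an official status code

    def _valid(self, addr):
        """Check that a 4-byte word at addr lies inside memory."""
        return 0 <= addr and addr + 4 <= len(self.memory)

    def read_word(self, addr):
        """Read a 4-byte word (little-endian assumed); flag invalid addresses."""
        if not self._valid(addr):
            self.stat = "invalid address"
            return None
        return int.from_bytes(self.memory[addr:addr + 4], "little")

    def write_word(self, addr, value):
        """Write a 4-byte word (little-endian assumed); flag invalid addresses."""
        if not self._valid(addr):
            self.stat = "invalid address"
            return
        self.memory[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")
```

A simulator built on such a record would update `registers`, the condition codes, `pc` and `memory` as each instruction executes, and would stop once `stat` leaves its normal value.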

4.1.2 Y86 Instructions


Figure 4.2 gives a concise description of the individual instructions in the Y86 ISA. We use this instruction set as a target for our processor implementations. The set of Y86 instructions is largely a subset of the IA32 instruction set. It includes only 4-byte integer operations, has fewer addressing modes, and includes a smaller set of operations. Since we only use 4-byte data, we can refer to these as words without any ambiguity. In this figure, we show the assembly-code representation of the instructions on the left and the byte encodings on the right. The assembly-code format is similar to the ATT format for IA32.

Here are some further details about the different Y86 instructions.

The IA32 movl instruction is split into four different instructions: irmovl, rrmovl, mrmovl, and rmmovl, explicitly indicating the form of the source and destination. The source is either immediate (i), register (r), or memory (m). It is designated by the first character in the instruction name. The destination is either register (r) or memory (m). It is designated by the second character in the instruction name. Explicitly identifying the four types of data transfer will prove helpful when we decide how to implement them.

The memory references for the two memory movement instructions have a simple base and displacement format. We do not support the second index register or any scaling of a register's value in the address computation.

4.1. THE Y86 INSTRUCTION SET ARCHITECTURE
Instruction          Byte 0   Byte 1   Bytes 2-5
halt                 0 0
nop                  1 0
rrmovl rA, rB        2 0      rA rB
irmovl V, rB         3 0      F rB     V
rmmovl rA, D(rB)     4 0      rA rB    D
mrmovl D(rB), rA     5 0      rA rB    D
OPl rA, rB           6 fn     rA rB
jXX Dest             7 fn     Dest (bytes 1-4)
cmovXX rA, rB        2 fn     rA rB
call Dest            8 0      Dest (bytes 1-4)
ret                  9 0
pushl rA             A 0      rA F
popl rA              B 0      rA F

Figure 4.2: Y86 instruction set. Instruction encodings range between 1 and 6 bytes. An instruction consists of a 1-byte instruction specifier, possibly a 1-byte register specifier, and possibly a 4-byte constant word. Field fn specifies a particular integer operation (OPl), data movement condition (cmovXX), or branch condition (jXX). All numeric values are shown in hexadecimal.
Operations      Branches                 Moves
addl  6 0       jmp  7 0    jne  7 4     rrmovl  2 0    cmovne  2 4
subl  6 1       jle  7 1    jge  7 5     cmovle  2 1    cmovge  2 5
andl  6 2       jl   7 2    jg   7 6     cmovl   2 2    cmovg   2 6
xorl  6 3       je   7 3                 cmove   2 3

Figure 4.3: Function codes for Y86 instruction set. The code specifies a particular integer operation, branch condition, or data transfer condition. These instructions are shown as OPl, jXX, and cmovXX in Figure 4.2.

As with IA32, we do not allow direct transfers from one memory location to another. In addition, we do not allow a transfer of immediate data to memory.

There are four integer operation instructions, shown in Figure 4.2 as OPl. These are addl, subl, andl, and xorl. They operate only on register data, whereas IA32 also allows operations on memory data. These instructions set the three condition codes ZF, SF, and OF (zero, sign, and overflow).

The seven jump instructions (shown in Figure 4.2 as jXX) are jmp, jle, jl, je, jne, jge, and jg. Branches are taken according to the type of branch and the settings of the condition codes. The branch conditions are the same as with IA32 (Figure 3.12).

There are six conditional move instructions (shown in Figure 4.2 as cmovXX): cmovle, cmovl, cmove, cmovne, cmovge, and cmovg. These have the same format as the register-register move instruction rrmovl, but the destination register is updated only if the condition codes satisfy the required constraints.

The call instruction pushes the return address on the stack and jumps to the destination address. The ret instruction returns from such a call.

The pushl and popl instructions implement push and pop, just as they do in IA32.

The halt instruction stops instruction execution. IA32 has a comparable instruction, called hlt. IA32 application programs are not permitted to use this instruction, since it causes the entire system to suspend operation. For Y86, executing the halt instruction causes the processor to stop, with the status code set to HLT. (See Section 4.1.4.)
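To make the link between the integer operations, the condition codes, and the jXX and cmovXX conditions concrete, here is a minimal Python sketch. The function codes come from Figure 4.3. The condition formulas and the overflow rules are the standard IA32 two's-complement ones, which the text says Y86 shares; they are not spelled out in this section, so treat them as an assumption. The names alu, CONDITIONS, and taken are hypothetical.

```python
MASK = 0xFFFFFFFF  # Y86 words are 4 bytes


def _sign(x):
    """Sign bit of a 32-bit word."""
    return (x >> 31) & 1


def alu(op, a, b):
    """Compute `b op a`, as OPl rA,rB does (rB <- rB op rA).

    Returns (result, zf, sf, of).
    """
    a &= MASK
    b &= MASK
    if op == "addl":
        r = (b + a) & MASK
        of = _sign(a) == _sign(b) and _sign(r) != _sign(b)
    elif op == "subl":
        r = (b - a) & MASK
        of = _sign(a) != _sign(b) and _sign(r) != _sign(b)
    elif op == "andl":
        r, of = b & a, False
    elif op == "xorl":
        r, of = b ^ a, False
    else:
        raise ValueError(f"unknown operation {op}")
    return r, r == 0, bool(_sign(r)), of


# Function code -> condition, shared by jXX and cmovXX (Figure 4.3).
CONDITIONS = {
    0: lambda zf, sf, of: True,                   # jmp / rrmovl (unconditional)
    1: lambda zf, sf, of: (sf != of) or zf,       # le
    2: lambda zf, sf, of: sf != of,               # l
    3: lambda zf, sf, of: zf,                     # e
    4: lambda zf, sf, of: not zf,                 # ne
    5: lambda zf, sf, of: sf == of,               # ge
    6: lambda zf, sf, of: (sf == of) and not zf,  # g
}


def taken(fn, zf, sf, of):
    """True if a jXX or cmovXX with function code fn takes effect."""
    return CONDITIONS[fn](zf, sf, of)


# andl %edx,%edx with %edx = 4 only sets the condition codes.
_, zf, sf, of = alu("andl", 4, 4)
print(taken(3, zf, sf, of))  # would a following je be taken?
```

Because andl of a register with itself leaves the value unchanged, it serves purely as a test for zero, which is how the Sum function later in this section uses it.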

4.1.3 Instruction Encoding


Figure 4.2 also shows the byte-level encoding of the instructions. Each instruction requires between 1 and 6 bytes, depending on which fields are required. Every instruction has an initial byte identifying the instruction type. This byte is split into two 4-bit parts: the high-order, or code, part, and the low-order, or function, part. As you can see in Figure 4.2, code values range from 0 to 0xB. The function values are significant only for the cases where a group of related instructions share a common code. These are given in Figure 4.3, showing the specific encodings of the integer operation, conditional move, and branch instructions. Observe that rrmovl has the same instruction code as the conditional moves. It can be viewed as an unconditional move just as the jmp instruction is an unconditional jump, both having function code 0.

As shown in Figure 4.4, each of the eight program registers has an associated register identifier (ID) ranging from 0 to 7. The numbering of registers in Y86 matches what is used in IA32. The program registers are stored within the CPU in a register file, a small random-access memory where the register IDs serve as addresses. ID value 0xF is used in the instruction encodings and within our hardware designs when we need to indicate that no register should be accessed.

Number   Register name
0        %eax
1        %ecx
2        %edx
3        %ebx
4        %esp
5        %ebp
6        %esi
7        %edi
F        No register

Figure 4.4: Y86 program register identifiers. Each of the eight program registers has an associated identifier (ID) ranging from 0 to 7. ID 0xF in a register field of an instruction indicates the absence of a register operand.

Some instructions are just 1 byte long, but those that require operands have longer encodings. First, there can be an additional register specifier byte, specifying either one or two registers. These register fields are called rA and rB in Figure 4.2. As the assembly-code versions of the instructions show, they can specify the registers used for data sources and destinations, as well as the base register used in an address computation, depending on the instruction type. Instructions that have no register operands, such as branches and call, do not have a register specifier byte. Those that require just one register operand (irmovl, pushl, and popl) have the other register specifier set to value 0xF. This convention will prove useful in our processor implementation.

Some instructions require an additional 4-byte constant word. This word can serve as the immediate data for irmovl, the displacement for rmmovl and mrmovl address specifiers, and the destination of branches and calls. Note that branch and call destinations are given as absolute addresses, rather than using the PC-relative addressing seen in IA32. Processors use PC-relative addressing to give more compact encodings of branch instructions and to allow code to be copied from one part of memory to another without the need to update all of the branch target addresses. Since we are more concerned with simplicity in our presentation, we use absolute addressing. As with IA32, all integers have a little-endian encoding. When the instruction is written in disassembled form, these bytes appear in reverse order.

As an example, let us generate the byte encoding of the instruction rmmovl %esp,0x12345(%edx) in hexadecimal. From Figure 4.2, we can see that rmmovl has initial byte 40. We can also see that source register %esp should be encoded in the rA field, and base register %edx should be encoded in the rB field. Using the register numbers in Figure 4.4, we get a register specifier byte of 42. Finally, the displacement is encoded in the 4-byte constant word. We first pad 0x12345 with leading zeros to fill out 4 bytes, giving a byte sequence of 00 01 23 45. We write this in byte-reversed order as 45 23 01 00. Combining these, we get an instruction encoding of 404245230100.

One important property of any instruction set is that the byte encodings must have a unique interpretation. An arbitrary sequence of bytes either encodes a unique instruction sequence or is not a legal byte sequence. This property holds for Y86, because every instruction has a unique combination of code and function in its initial byte, and given this byte, we can determine the length and meaning of any additional bytes. This property ensures that a processor can execute an object-code program without any ambiguity about the meaning of the code. Even if the code is embedded within other bytes in the program, we can readily determine the instruction sequence as long as we start from the first byte in the sequence. On the other hand, if we do not know the starting position of a code sequence, we cannot reliably determine how to split the sequence into individual instructions.
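The worked encoding of rmmovl %esp,0x12345(%edx) can be reproduced mechanically. The Python sketch below builds an instruction from its three possible parts: specifier byte, register byte, and little-endian constant word. The codes and register IDs are taken from Figures 4.2 and 4.4. The helper names (spec, reg_byte, word, and the per-instruction functions) are hypothetical and cover only a few instructions.

```python
import struct

# Register IDs from Figure 4.4; None stands for "no register" (0xF).
REG_ID = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
          "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7, None: 0xF}


def spec(code, fn=0):
    """Instruction specifier byte: code in the high nibble, function in the low."""
    return bytes([(code << 4) | fn])


def reg_byte(rA, rB):
    """Register specifier byte: rA in the high nibble, rB in the low."""
    return bytes([(REG_ID[rA] << 4) | REG_ID[rB]])


def word(value):
    """4-byte constant word in little-endian byte order."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def rmmovl(rA, disp, rB):
    return spec(0x4) + reg_byte(rA, rB) + word(disp)


def irmovl(value, rB):
    return spec(0x3) + reg_byte(None, rB) + word(value)


def pushl(rA):
    return spec(0xA) + reg_byte(rA, None)


def jmp(dest):
    return spec(0x7, 0) + word(dest)


print(rmmovl("%esp", 0x12345, "%edx").hex())  # the example from the text
```

Each helper emits the specifier byte first, so a decoder that starts at that byte can recover the remaining fields in order. A decoder that starts at some arbitrary offset within the bytes has no such guarantee.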
Not knowing where a code sequence starts causes problems for disassemblers and other tools that attempt to extract machine-level programs directly from object-code byte sequences.

Practice Problem 4.1:
Determine the byte encoding of the Y86 instruction sequence that follows. The line .pos 0x100 indicates that the starting address of the object code should be 0x100.

    .pos 0x100                # Start code at address 0x100
        irmovl $15,%ebx       # Load 15 into %ebx
        rrmovl %ebx,%ecx      # Copy 15 to %ecx
    loop:                     # loop:
        rmmovl %ecx,-3(%ebx)  # Save %ecx at address 15-3 = 12
        addl %ebx,%ecx        # Increment %ecx by 15
        jmp loop              # Goto loop

Practice Problem 4.2:


For each byte sequence listed, determine the Y86 instruction sequence it encodes. If there is some invalid byte in the sequence, show the instruction sequence up to that point and indicate where the invalid value occurs. For each sequence, we show the starting address, then a colon, and then the byte sequence.

A. 0x100:30f3fcffffff40630008000000
B. 0x200:a06f80080200000030f30a00000090
C. 0x300:50540700000010f0b01f
D. 0x400:6113730004000000
E. 0x500:6362a0f0

Aside: Comparing IA32 to Y86 instruction encodings
Compared with the instruction encodings used in IA32, the encoding of Y86 is much simpler but also less compact. The register fields occur only in fixed positions in all Y86 instructions, whereas they are packed into various positions in the different IA32 instructions. We use a 4-bit encoding of registers, even though there are only eight possible registers. IA32 uses just 3 bits. Thus, IA32 can pack a push or pop instruction into just 1 byte, with a 5-bit field indicating the instruction type and the remaining 3 bits for the register specifier. IA32 can encode constant values in 1, 2, or 4 bytes, whereas Y86 always requires 4 bytes. End Aside.

Aside: RISC and CISC instruction sets
IA32 is sometimes labeled as a complex instruction set computer (CISC, pronounced "sisk"), and is deemed to be the opposite of ISAs that are classified as reduced instruction set computers (RISC, pronounced "risk"). Historically, CISC machines came first, having evolved from the earliest computers. By the early 1980s, instruction sets for mainframe and minicomputers had grown quite large, as machine designers incorporated new instructions to support high-level tasks, such as manipulating circular buffers, performing decimal arithmetic, and evaluating polynomials. The first microprocessors appeared in the early 1970s and had limited instruction sets, because the integrated-circuit technology then posed severe constraints on what could be implemented on a single chip. Microprocessors evolved quickly and, by the early 1980s, were following the path of increasing instruction-set complexity set by mainframes and minicomputers. The x86 family took this path, evolving into IA32, and more recently into x86-64. Even the x86 line continues to evolve as new classes of instructions are added based on the needs of emerging applications.



The RISC design philosophy developed in the early 1980s as an alternative to these trends. A group of hardware and compiler experts at IBM, strongly influenced by the ideas of IBM researcher John Cocke, recognized that they could generate efficient code for a much simpler form of instruction set. In fact, many of the high-level instructions that were being added to instruction sets were very difficult to generate with a compiler and were seldom used. A simpler instruction set could be implemented with much less hardware and could be organized in an efficient pipeline structure, similar to those described later in this chapter. IBM did not commercialize this idea until many years later, when it developed the Power and PowerPC ISAs.

The RISC concept was further developed by Professors David Patterson, of the University of California at Berkeley, and John Hennessy, of Stanford University. Patterson gave the name RISC to this new class of machines, and CISC to the existing class, since there had previously been no need to have a special designation for a nearly universal form of instruction set.

Comparing CISC with the original RISC instruction sets, we find the following general characteristics:

CISC: A large number of instructions. The Intel document describing the complete set of instructions [28, 29] is over 1200 pages long.
Early RISC: Many fewer instructions. Typically fewer than 100.

CISC: Some instructions with long execution times. These include instructions that copy an entire block from one part of memory to another and others that copy multiple registers to and from memory.
Early RISC: No instruction with a long execution time. Some early RISC machines did not even have an integer multiply instruction, requiring compilers to implement multiplication as a sequence of additions.

CISC: Variable-length encodings. IA32 instructions can range from 1 to 15 bytes.
Early RISC: Fixed-length encodings. Typically all instructions are encoded as 4 bytes.

CISC: Multiple formats for specifying operands. In IA32, a memory operand specifier can have many different combinations of displacement, base and index registers, and scale factors.
Early RISC: Simple addressing formats. Typically just base and displacement addressing.

CISC: Arithmetic and logical operations can be applied to both memory and register operands.
Early RISC: Arithmetic and logical operations only use register operands. Memory referencing is only allowed by load instructions, reading from memory into a register, and store instructions, writing from a register to memory. This convention is referred to as a load/store architecture.

CISC: Implementation artifacts hidden from machine-level programs. The ISA provides a clean abstraction between programs and how they get executed.
Early RISC: Implementation artifacts exposed to machine-level programs. Some RISC machines prohibit particular instruction sequences and have jumps that do not take effect until the following instruction is executed. The compiler is given the task of optimizing performance within these constraints.

CISC: Condition codes. Special flags are set as a side effect of instructions and then used for conditional branch testing.
Early RISC: No condition codes. Instead, explicit test instructions store the test results in normal registers for use in conditional evaluation.

CISC: Stack-intensive procedure linkage. The stack is used for procedure arguments and return addresses.
Early RISC: Register-intensive procedure linkage. Registers are used for procedure arguments and return addresses. Some procedures can thereby avoid any memory references. Typically, the processor has many more (up to 32) registers.

The Y86 instruction set includes attributes of both CISC and RISC instruction sets. On the CISC side, it has condition codes, variable-length instructions, and stack-intensive procedure linkages. On the RISC side, it uses a load/store architecture and a regular encoding. It can be viewed as taking a CISC instruction set (IA32) and simplifying it by applying some of the principles of RISC. End Aside.

Value   Name   Meaning
1       AOK    Normal operation
2       HLT    halt instruction encountered
3       ADR    Invalid address encountered
4       INS    Invalid instruction encountered

Figure 4.5: Y86 status codes. In our design, the processor halts for any code other than AOK.

Aside: The RISC versus CISC controversy
Through the 1980s, battles raged in the computer architecture community regarding the merits of RISC versus CISC instruction sets. Proponents of RISC claimed they could get more computing power for a given amount of hardware through a combination of streamlined instruction set design, advanced compiler technology, and pipelined processor implementation. CISC proponents countered that fewer CISC instructions were required to perform a given task, and so their machines could achieve higher overall performance.

Major companies introduced RISC processor lines, including Sun Microsystems (SPARC), IBM and Motorola (PowerPC), and Digital Equipment Corporation (Alpha). A British company, Acorn Computers Ltd., developed its own architecture, ARM (originally an acronym for Acorn RISC Machine), which is widely used in embedded applications, such as cellphones.

In the early 1990s, the debate diminished as it became clear that neither RISC nor CISC in their purest forms were better than designs that incorporated the best ideas of both. RISC machines evolved and introduced more instructions, many of which take multiple cycles to execute. RISC machines today have hundreds of instructions in their repertoire, hardly fitting the name "reduced instruction set machine." The idea of exposing implementation artifacts to machine-level programs proved to be short-sighted. As new processor models were developed using more advanced hardware structures, many of these artifacts became irrelevant, but they still remained part of the instruction set. Still, the core of RISC design is an instruction set that is well-suited to execution on a pipelined machine.

More recent CISC machines also take advantage of high-performance pipeline structures. As we will discuss in Section 5.7, they fetch the CISC instructions and dynamically translate them into a sequence of simpler, RISC-like operations. For example, an instruction that adds a register to memory is translated into three operations: one to read the original memory value, one to perform the addition, and a third to write the sum to memory. Since the dynamic translation can generally be performed well in advance of the actual instruction execution, the processor can sustain a very high execution rate.

Marketing issues, apart from technological ones, have also played a major role in determining the success of different instruction sets. By maintaining compatibility with its existing processors, Intel with x86 made it easy to keep moving from one generation of processor to the next. As integrated-circuit technology improved, Intel and other x86 processor manufacturers could overcome the inefficiencies created by the original 8086 instruction set design, using RISC techniques to produce performance comparable to the best RISC machines. As we saw in Section 3.13, the evolution of IA32 into x86-64 provided an opportunity to incorporate several features of RISC into x86. In the areas of desktop and laptop computing, x86 has achieved total domination, and it is increasingly popular for high-end server machines.

RISC processors have done very well in the market for embedded processors, controlling such systems as cellular telephones, automobile brakes, and Internet appliances. In these applications, saving on cost and power is more important than maintaining backward compatibility. In terms of the number of processors sold, this is a very large and growing market. End Aside.

IA32 code

    int Sum(int *Start, int Count)
    Sum:
        pushl %ebp
        movl %esp,%ebp
        movl 8(%ebp),%ecx       ecx = Start
        movl 12(%ebp),%edx      edx = Count
        xorl %eax,%eax          sum = 0
        testl %edx,%edx
        je .L34
    .L35:
        addl (%ecx),%eax        add *Start to sum
        addl $4,%ecx            Start++
        decl %edx               Count--
        jnz .L35                Stop when 0
    .L34:
        movl %ebp,%esp
        popl %ebp
        ret

Y86 code

    int Sum(int *Start, int Count)
    Sum:
        pushl %ebp
        rrmovl %esp,%ebp
        mrmovl 8(%ebp),%ecx     ecx = Start
        mrmovl 12(%ebp),%edx    edx = Count
        xorl %eax,%eax          sum = 0
        andl %edx,%edx          Set condition codes
        je End
    Loop:
        mrmovl (%ecx),%esi      get *Start
        addl %esi,%eax          add to sum
        irmovl $4,%ebx
        addl %ebx,%ecx          Start++
        irmovl $-1,%ebx
        addl %ebx,%edx          Count--
        jne Loop                Stop when 0
    End:
        rrmovl %ebp,%esp
        popl %ebp
        ret

Figure 4.6: Comparison of Y86 and IA32 assembly programs. The Sum function computes the sum of an integer array. The Y86 code differs from the IA32 mainly in that it may require multiple instructions to perform what can be done with a single IA32 instruction.

4.1.4 Y86 Exceptions


The programmer-visible state for Y86 (Figure 4.1) includes a status code Stat describing the overall state of the executing program. The possible values for this code are shown in Figure 4.5. Code value 1, named AOK, indicates that the program is executing normally, while the other codes indicate that some type of exception has occurred. Code 2, named HLT, indicates that the processor has executed a halt instruction. Code 3, named ADR, indicates that the processor attempted to read from or write to an invalid memory address, either while fetching an instruction or while reading or writing data. We limit the maximum address (the exact limit varies by implementation), and any access to an address beyond this limit will trigger an ADR exception. Code 4, named INS, indicates that an invalid instruction code has been encountered.

For Y86, we will simply have the processor stop executing instructions when it encounters any of the exceptions listed. In a more complete design, the processor would typically invoke an exception handler, a procedure designated to handle the specific type of exception encountered. As described in Chapter 8, exception handlers can be configured to have different effects, such as aborting the program or invoking a user-defined signal handler.

4.1.5 Y86 Programs


Figure 4.6 shows IA32 and Y86 assembly code for the following C function:

    int Sum(int *Start, int Count)
    {
        int sum = 0;
        while (Count) {
            sum += *Start;
            Start++;
            Count--;
        }
        return sum;
    }

The IA32 code was generated by the GCC compiler. The Y86 code is essentially the same, except that Y86 sometimes requires two instructions to accomplish what can be done with a single IA32 instruction. If we had written the program using array indexing, however, the conversion to Y86 code would be more difficult, since Y86 does not have scaled addressing modes. This code follows many of the programming conventions we have seen for IA32, including the use of the stack and frame pointers. For simplicity, it does not follow the IA32 convention of having some registers designated as callee-save registers. This is just a programming convention that we can either adopt or ignore as we please.

Figure 4.7 shows an example of a complete program file written in Y86 assembly code. The program contains both data and instructions. Directives indicate where to place code or data and how to align it. The program specifies issues such as stack placement, data initialization, program initialization, and program termination.

In this program, words beginning with "." are assembler directives telling the assembler to adjust the address at which it is generating code or to insert some words of data. The directive .pos 0 (line 2) indicates that the assembler should begin generating code starting at address 0. This is the starting address for all Y86 programs. The next two instructions (lines 3 and 4) initialize the stack and frame pointers. We can see that the label Stack is declared at the end of the program (line 47), to indicate address 0x100 using a .pos directive (line 46). Our stack will therefore start at this address and grow toward lower addresses. We must ensure that the stack does not grow so large that it overwrites the code or other program data.

Lines 9 to 13 of the program declare an array of four words, having values 0xd, 0xc0, 0xb00, and 0xa000. The label array denotes the start of this array, and is aligned on a 4-byte boundary (using the .align directive). Lines 15 to 24 show a main procedure that calls the function Sum on the 4-word array and then halts.

As this example shows, since our only tool for creating Y86 code is an assembler, the programmer must perform tasks we ordinarily delegate to the compiler, linker, and run-time system. Fortunately, we only do this for small programs, for which simple mechanisms suffice.

Figure 4.8 shows the result of assembling the code shown in Figure 4.7 by an assembler we call YAS. The assembler output is in ASCII format to make it more readable. On lines of the assembly file that contain instructions or data, the object code contains an address, followed by the values of between 1 and 6 bytes.
 1  # Execution begins at address 0
 2          .pos 0
 3  init:   irmovl Stack, %esp      # Set up stack pointer
 4          irmovl Stack, %ebp      # Set up base pointer
 5          call Main               # Execute main program
 6          halt                    # Terminate program
 7
 8  # Array of 4 elements
 9          .align 4
10  array:  .long 0xd
11          .long 0xc0
12          .long 0xb00
13          .long 0xa000
14
15  Main:   pushl %ebp
16          rrmovl %esp,%ebp
17          irmovl $4,%eax
18          pushl %eax              # Push 4
19          irmovl array,%edx
20          pushl %edx              # Push array
21          call Sum                # Sum(array, 4)
22          rrmovl %ebp,%esp
23          popl %ebp
24          ret
25
26  # int Sum(int *Start, int Count)
27  Sum:    pushl %ebp
28          rrmovl %esp,%ebp
29          mrmovl 8(%ebp),%ecx     # ecx = Start
30          mrmovl 12(%ebp),%edx    # edx = Count
31          xorl %eax,%eax          # sum = 0
32          andl %edx,%edx          # Set condition codes
33          je End
34  Loop:   mrmovl (%ecx),%esi      # get *Start
35          addl %esi,%eax          # add to sum
36          irmovl $4,%ebx          #
37          addl %ebx,%ecx          # Start++
38          irmovl $-1,%ebx         #
39          addl %ebx,%edx          # Count--
40          jne Loop                # Stop when 0
41  End:    rrmovl %ebp,%esp
42          popl %ebp
43          ret
44
45  # The stack starts here and grows to lower addresses
46          .pos 0x100
47  Stack:

Figure 4.7: Sample program written in Y86 assembly code. The Sum function is called to compute the sum of a four-element array.



                       | # Execution begins at address 0
  0x000:               |         .pos 0
  0x000: 30f400010000  | init:   irmovl Stack, %esp      # Set up stack pointer
  0x006: 30f500010000  |         irmovl Stack, %ebp      # Set up base pointer
  0x00c: 8024000000    |         call Main               # Execute main program
  0x011: 00            |         halt                    # Terminate program
                       |
                       | # Array of 4 elements
  0x014:               |         .align 4
  0x014: 0d000000      | array:  .long 0xd
  0x018: c0000000      |         .long 0xc0
  0x01c: 000b0000      |         .long 0xb00
  0x020: 00a00000      |         .long 0xa000
                       |
  0x024: a05f          | Main:   pushl %ebp
  0x026: 2045          |         rrmovl %esp,%ebp
  0x028: 30f004000000  |         irmovl $4,%eax
  0x02e: a00f          |         pushl %eax              # Push 4
  0x030: 30f214000000  |         irmovl array,%edx
  0x036: a02f          |         pushl %edx              # Push array
  0x038: 8042000000    |         call Sum                # Sum(array, 4)
  0x03d: 2054          |         rrmovl %ebp,%esp
  0x03f: b05f          |         popl %ebp
  0x041: 90            |         ret
                       |
                       | # int Sum(int *Start, int Count)
  0x042: a05f          | Sum:    pushl %ebp
  0x044: 2045          |         rrmovl %esp,%ebp
  0x046: 501508000000  |         mrmovl 8(%ebp),%ecx     # ecx = Start
  0x04c: 50250c000000  |         mrmovl 12(%ebp),%edx    # edx = Count
  0x052: 6300          |         xorl %eax,%eax          # sum = 0
  0x054: 6222          |         andl %edx,%edx          # Set condition codes
  0x056: 7378000000    |         je End
  0x05b: 506100000000  | Loop:   mrmovl (%ecx),%esi      # get *Start
  0x061: 6060          |         addl %esi,%eax          # add to sum
  0x063: 30f304000000  |         irmovl $4,%ebx          #
  0x069: 6031          |         addl %ebx,%ecx          # Start++
  0x06b: 30f3ffffffff  |         irmovl $-1,%ebx         #
  0x071: 6032          |         addl %ebx,%edx          # Count--
  0x073: 745b000000    |         jne Loop                # Stop when 0
  0x078: 2054          | End:    rrmovl %ebp,%esp
  0x07a: b05f          |         popl %ebp
  0x07c: 90            |         ret
                       |
                       | # The stack starts here and grows to lower addresses
  0x100:               |         .pos 0x100
  0x100:               | Stack:

Figure 4.8: Output of YAS assembler. Each line includes a hexadecimal address and between 1 and 6 bytes of object code.
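The address column in Figure 4.8 follows directly from the instruction lengths, which in turn depend only on the code nibble of each initial byte. As a minimal sketch, the Python below splits the object bytes of the Main procedure back into instructions and recomputes their addresses. The length table is derived from Figure 4.2. The names LENGTH, split_instructions, and MAIN are hypothetical, and the sketch checks only the code nibble, not function codes or register fields.

```python
# Instruction length in bytes, keyed by the high-order (code) nibble (Figure 4.2).
LENGTH = {0x0: 1, 0x1: 1, 0x2: 2, 0x3: 6, 0x4: 6, 0x5: 6,
          0x6: 2, 0x7: 5, 0x8: 5, 0x9: 1, 0xA: 2, 0xB: 2}


def split_instructions(code, start):
    """Split code into instructions, assuming code[0] begins an instruction.

    Returns a list of (address, hex string) pairs. Raises ValueError on an
    unknown code nibble or an instruction that runs past the end.
    """
    out = []
    pos = 0
    while pos < len(code):
        icode = code[pos] >> 4
        if icode not in LENGTH:
            raise ValueError(f"invalid instruction code at {start + pos:#x}")
        n = LENGTH[icode]
        if pos + n > len(code):
            raise ValueError(f"truncated instruction at {start + pos:#x}")
        out.append((start + pos, code[pos:pos + n].hex()))
        pos += n
    return out


# Object bytes of Main, copied from Figure 4.8 (addresses 0x024 to 0x041).
MAIN = bytes.fromhex(
    "a05f" "2045" "30f004000000" "a00f" "30f214000000"
    "a02f" "8042000000" "2054" "b05f" "90"
)

for addr, text in split_instructions(MAIN, 0x024):
    print(f"0x{addr:03x}: {text}")

# The constant word of `irmovl array,%edx` is the little-endian address of array.
array_addr = int.from_bytes(bytes.fromhex("30f214000000")[2:], "little")
```

The same approach applies to data only if the caller knows where code ends: the .long words at 0x014 are not instructions, and feeding them to the splitter would give meaningless results.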

We have implemented an instruction set simulator we call YIS, the purpose of which is to model the execution of a Y86 machine-code program, without attempting to model the behavior of any specific processor implementation. This form of simulation is useful for debugging programs before actual hardware is available, and for checking the result of either simulating the hardware or running the program on the hardware itself. Running on our sample object code, YIS generates the following output:

    Stopped in 52 steps at PC = 0x11. Status HLT, CC Z=1 S=0 O=0
    Changes to registers:
    %eax:   0x00000000      0x0000abcd
    %ecx:   0x00000000      0x00000024
    %ebx:   0x00000000      0xffffffff
    %esp:   0x00000000      0x00000100
    %ebp:   0x00000000      0x00000100
    %esi:   0x00000000      0x0000a000

    Changes to memory:
    0x00e8: 0x00000000      0x000000f8
    0x00ec: 0x00000000      0x0000003d
    0x00f0: 0x00000000      0x00000014
    0x00f4: 0x00000000      0x00000004
    0x00f8: 0x00000000      0x00000100
    0x00fc: 0x00000000      0x00000011

The first line of the simulation output summarizes the execution and the resulting values of the PC and program status. In printing register and memory values, it only prints out words that change during simulation, either in registers or in memory. The original values (here they are all zero) are shown on the left, and the final values are shown on the right. We can see in this output that register %eax contains 0xabcd, the sum of the 4-element array passed to subroutine Sum. In addition, we can see that the stack, which starts at address 0x100 and grows toward lower addresses, has been used, causing changes to words of memory at addresses 0xe8 through 0xfc. This is well away from 0x7c, the maximum address of the executable code.

Practice Problem 4.3:
Write Y86 code to implement a recursive sum function rSum, based on the following C code:

    int rSum(int *Start, int Count)
    {
        if (Count <= 0)
            return 0;
        return *Start + rSum(Start+1, Count-1);
    }

You might find it helpful to compile the C code on an IA32 machine and then translate the instructions to Y86.
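Returning to the YIS output for the sample program, the reported register and memory changes can be cross-checked by hand. The Python sketch below replays only the pushes and calls that occur on the way into Sum, using the addresses from Figure 4.8, and recomputes the array sum. It is a bookkeeping aid, not a simulator, and the names push, mem, deepest, total, and final_ecx are hypothetical.

```python
def push(mem, esp, value):
    """Model pushl and call: decrement %esp by 4, then store a word there."""
    esp -= 4
    mem[esp] = value
    return esp


mem = {}                      # address -> word; only words that get written
esp = ebp = 0x100             # lines 3-4 of Figure 4.7: both start at Stack

esp = push(mem, esp, 0x011)   # call Main: return address is the halt at 0x011
esp = push(mem, esp, ebp)     # Main: pushl %ebp
ebp = esp                     # Main: rrmovl %esp,%ebp
esp = push(mem, esp, 4)       # Main: push Count = 4
esp = push(mem, esp, 0x014)   # Main: push the address of array
esp = push(mem, esp, 0x03d)   # call Sum: return address follows the call at 0x038
esp = push(mem, esp, ebp)     # Sum: pushl %ebp
deepest = esp                 # lowest stack address reached

array = [0xd, 0xc0, 0xb00, 0xa000]
total = sum(array)                   # value left in %eax
final_ecx = 0x014 + 4 * len(array)   # Start advances 4 bytes per element

for addr in sorted(mem):
    print(f"0x{addr:04x}: 0x{mem[addr]:08x}")
print(hex(total), hex(final_ecx))
```

The same bookkeeping is useful when tracing a recursive function such as rSum, where each level of recursion adds another saved frame pointer and return address below the previous one.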
