Fei, Y.; Ravi, S.; Raghunathan, A.; Jha, N.K. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Volume: 23 Issue: 5 Pages: 652-664 May 2004
Abstract
In this paper, we present an efficient and accurate methodology for estimating the energy consumption of application programs running on extensible processors. Extensible processors, which are getting increasingly popular in embedded system design, allow a designer to customize a base processor core through instruction set extensions. Existing processor energy macromodeling techniques are not applicable to extensible processors, since they assume that the instruction set architecture as well as the underlying structural description of the micro-architecture remain fixed. Our solution to the above problem is a hybrid energy macromodel suitably parameterized to estimate the energy consumption of an application running on the corresponding application-specific extended processor instance, which incorporates any custom instruction extension. Such a characterization is facilitated by careful selection of macromodel parameters/variables that can capture both the functional and structural aspects of the execution of a program on an extensible processor.
A Hybrid Energy-Estimation Technique for Extensible Processor 2/24
2005/7/13
Abstract (cont.)
Another feature of the proposed energy characterization flow is the use of regression analysis to build the macromodel. Regression analysis allows for in-situ characterization, thus allowing arbitrary test programs to be used during macromodel construction. We validated the proposed methodology by characterizing the energy consumption of a state-of-the-art extensible processor (Tensilica's Xtensa). We used the macromodel to analyze the energy consumption of several benchmark applications with custom instructions. The mean absolute error in the macromodel estimates is only 3.3%, when compared to the energy values obtained by a commercial tool operating on the synthesized register-transfer level (RTL) description of the custom processor. Our approach achieves an average speedup of three orders of magnitude over the commercial RTL energy estimator. Our experiments show that the proposed methodology also achieves good relative accuracy, which is essential in energy optimization studies. Hence, our technique is both efficient and accurate.
Outline
- What's the problem
- Introduction & related work
- Extensible processor energy macromodel requirements
- Proposed energy estimation methodology
- Experimental results and evaluation
- Conclusions
- Existing processor energy estimation frameworks are impractical for the energy optimization done in the ASIP design cycle
  - The extension to the base processor ISA is not fixed
  - The number of configurations/extensions is large
- It's essential to have fast and accurate energy estimation of an application running on an extensible processor for each candidate configuration in energy optimization studies
Related Work
Structural macromodeling
Instruction-level macromodeling
- Advantage: high efficiency (uses an ISS to yield the energy estimate)
- Disadvantages: 1) low accuracy; 2) requires an actual chip implementation, which is infeasible for power tradeoff studies early in the design cycle
- Energy coefficients are calculated with regression analysis to build the macromodel
- Given a set of observations $(E_i, M_{1,i}, \ldots, M_{k,i})$, $i = 1, 2, \ldots, n$, regression finds the best-fitting energy coefficients $C_1, C_2, \ldots, C_k$
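The regression step above can be sketched in a few lines of Python. This is a minimal illustration, assuming an ordinary least-squares fit solved through the normal equations; the function name `fit_energy_coefficients` and all numbers are hypothetical and are not taken from the paper.

```python
def fit_energy_coefficients(M, E):
    """Least-squares fit of C in E ~ M*C via the normal equations (M^T M) C = M^T E."""
    n, k = len(M), len(M[0])
    # Build the augmented k x (k+1) system [M^T M | M^T E].
    A = [[sum(M[r][i] * M[r][j] for r in range(n)) for j in range(k)]
         + [sum(M[r][i] * E[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("test programs do not exercise the parameters independently")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: 4 test programs (rows), 2 macromodel parameters (columns).
M = [[100, 10], [50, 40], [80, 20], [20, 60]]
true_C = [2.0, 5.0]  # made-up coefficients, used only to synthesize E
E = [sum(m * c for m, c in zip(row, true_C)) for row in M]

C_hat = fit_energy_coefficients(M, E)
print(C_hat)  # recovers the made-up coefficients up to rounding
```

The fit needs at least as many test programs as coefficients, and the programs must exercise the parameters independently; otherwise the system is singular and the sketch raises an error.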
- Instruction-level macromodeling for the base processor
- Structural macromodeling for the custom hardware extensions
- Regression macromodeling for energy characterization
- Energy consumption can be determined simply by instruction set simulation
Contributions
- Combines the efficiency of instruction-level approaches with the accuracy of structural approaches
- Only needs the custom instruction descriptions
- Doesn't require the custom processor to be synthesized
- This is the only work that evaluates energy/performance tradeoffs among candidate custom instructions for extensible processors early in the design cycle
- Xtensa's ISA consists of a basic set of instructions plus a set of configurable and extensible options
- Extensibility is achieved by specifying application-specific functionality through custom instructions
- The behavior of a custom instruction is described using the TIE (Tensilica Instruction Extension) language
- TIE is independent of the processor's pipeline
  - Only need to describe the semantics of the instructions as if they consisted only of combinational logic
- The hardware implementation of custom instructions
- Corresponding software development kit for the configuration
  - ANSI C/C++ compiler, linker, assembler, debugger
  - Cycle-accurate instruction set simulator (ISS)
- Specify the custom state register and indices
- Define a new instruction class with one or multiple custom instructions
- Describe the behavior of the instruction class
- iclass statement
- semantic statement
- schedule statement (used for multiple-cycle instructions)
[Figure: custom hardware block diagram; visible labels: temp1, temp2, ACCU]
- Augmented with custom hardware to implement three custom instructions: MULT, MAC and CUS
- MULT and MAC perform their functionality using shared custom hardware (which depends on the base processor operand buses)
- The top horizontal bar lists the sequence of processor events dictated by an instruction's execution
- The bottom bar depicts the side effects in either the base processor or the custom hardware
- Execution of the base processor instruction add activates custom hardware (X, MUX1, +1) in the second cycle
- Execution of the custom instructions (I2 and I3) activates base processor hardware (ALU) in the second cycle
- Side effects occur because the custom hardware and the ALU of the base processor share the same operand buses
- Energy consumed by base processor instructions on the base processor core
- Energy dependency on inter-instruction correlation and other nonideal features (such as stalls, cache misses, etc.)
- Energy consumed by custom instructions on the custom hardware
  - The bottom bar of instructions I2 and I3
  - RdReg, Wait, WrReg, WrCR events in the top bar of instructions I2, I3, I4
- The test program suite incorporates custom instructions to cover all the custom HW library components
- Regression analysis requires knowledge of both the dependent variable and the independent variables
- Regression analysis finds the estimates of the energy coefficients (energy macromodel construction complete)
- Parameter values are fed to the energy macromodel to yield the energy estimate
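- Reflects the usage of the base processor core due to either base processor or custom instructions

$$
\begin{aligned}
E_{\mathrm{ins}} ={} & E_{\mathrm{arith}}\,\mathit{Cyc}_{\mathrm{arith}} + E_{\mathrm{ld}}\,\mathit{Cyc}_{\mathrm{ld}} + E_{\mathrm{st}}\,\mathit{Cyc}_{\mathrm{st}} + E_{\mathrm{j}}\,\mathit{Cyc}_{\mathrm{j}} \\
& + E_{\mathrm{br\_tk}}\,\mathit{Cyc}_{\mathrm{br\_tk}} + E_{\mathrm{br\_utk}}\,\mathit{Cyc}_{\mathrm{br\_utk}} \\
& + E_{\mathrm{i}}\,\mathit{Num}_{\mathrm{i}} + E_{\mathrm{d}}\,\mathit{Num}_{\mathrm{d}} + E_{\mathrm{uncache}}\,\mathit{Num}_{\mathrm{uncache}} + E_{\mathrm{interlock}}\,\mathit{Num}_{\mathrm{interlock}} \\
& + E_{\mathrm{side\_tie}}\,\mathit{Cyc}_{\mathrm{side\_tie}}
\end{aligned}
$$

- $E_{\mathrm{arith}}, \ldots, E_{\mathrm{br\_utk}}$ represent the average energy consumption of each instruction class
- $\mathit{Cyc}_{\mathrm{arith}}, \ldots, \mathit{Cyc}_{\mathrm{br\_utk}}$ represent the number of cycles taken by each instruction class
- Macromodel parameters $\mathit{Num}_{\mathrm{i}}, \ldots, \mathit{Num}_{\mathrm{interlock}}$ denote the number of times each nonideal case occurs
- Macromodel parameter $\mathit{Cyc}_{\mathrm{side\_tie}}$ accounts for the number of cycles taken by all custom instructions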
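As a rough illustration of how the instruction-level equation above is evaluated once the coefficients are known, the sketch below sums coefficient times count over the eleven terms. The key names mirror the subscripts on the slide; the function name and every numeric value are made up for the example, and reading the `i` and `d` terms as instruction- and data-cache misses is an assumption.

```python
# Terms weighted by cycle counts (Cyc_*) and by event counts (Num_*).
CYCLE_TERMS = ["arith", "ld", "st", "j", "br_tk", "br_utk", "side_tie"]
COUNT_TERMS = ["i", "d", "uncache", "interlock"]  # "i"/"d": assumed cache-miss terms


def instruction_level_energy(coeff, cyc, num):
    """Evaluate E_ins = sum(E_x * Cyc_x) + sum(E_y * Num_y); absent counts are 0."""
    cycle_part = sum(coeff[t] * cyc.get(t, 0) for t in CYCLE_TERMS)
    event_part = sum(coeff[t] * num.get(t, 0) for t in COUNT_TERMS)
    return cycle_part + event_part


# Hypothetical coefficients (arbitrary energy units) and ISS-derived counts.
coeff = {"arith": 1.0, "ld": 2.0, "st": 2.0, "j": 1.5, "br_tk": 1.5,
         "br_utk": 1.0, "side_tie": 0.5, "i": 10.0, "d": 12.0,
         "uncache": 20.0, "interlock": 0.5}
cyc = {"arith": 100, "ld": 20, "st": 10, "side_tie": 8}
num = {"d": 2, "interlock": 4}

e_ins = instruction_level_energy(coeff, cyc, num)
print(e_ins)  # 190.0
```

Because every term is a plain product, the evaluation cost is independent of program length once the counts have been collected by the instruction set simulator.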
- Reflects the usage of custom hardware extensions due to either base processor or custom instructions
- Custom hardware energy consumption is expressed as:

$$
E_{\mathrm{struc}} = E_1\,\mathit{Cyc}_1 + E_2\,\mathit{Cyc}_2 + E_3\,\mathit{Cyc}_3 + \cdots + E_{10}\,\mathit{Cyc}_{10}
$$

- Note: the structural macromodel parameters should cover all the components present in the custom hardware library (10 component categories in this paper)
  - Macromodel parameters $\mathit{Cyc}_1, \ldots, \mathit{Cyc}_{10}$ denote the number of cycles in which each custom hardware component category is active
  - Energy coefficients $E_1, \ldots, E_{10}$ represent the average energy consumption of each custom hardware component category
- Custom functional blocks are activated when any custom instruction is executing
- Custom functional blocks can also be activated when base processor instructions are running
  - Side effects due to the sharing of the same operand buses still affect the custom hardware
- Dynamic resource usage analysis of the execution trace identifies the activated custom functional blocks (HW components) for each instruction
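The resource usage analysis described above can be pictured as a pass over the execution trace that accumulates per-category active cycles and then applies the structural equation. The sketch below is a simplification under stated assumptions: the component category names, the `ACTIVATES` table, and the coefficients are all hypothetical, and a component is treated as active for every cycle of an instruction rather than for specific cycles.

```python
from collections import Counter

# Hypothetical map: instruction -> custom-hardware component categories it
# activates, including side-effect activation by a base processor instruction.
ACTIVATES = {
    "add": ["multiplier", "mux", "adder"],  # side effect via shared operand buses
    "MULT": ["multiplier", "mux"],
    "MAC": ["multiplier", "mux", "adder", "register"],
    "ld": [],
}


def active_cycles(trace):
    """trace: list of (instruction, cycles) pairs taken from an ISS run."""
    cyc = Counter()
    for instr, cycles in trace:
        for component in ACTIVATES.get(instr, []):
            cyc[component] += cycles
    return cyc


def structural_energy(cyc, coeff):
    """Evaluate E_struc = sum over categories of E_k * Cyc_k."""
    return sum(coeff[component] * n for component, n in cyc.items())


trace = [("add", 1), ("MULT", 2), ("ld", 1), ("MAC", 2)]
coeff = {"multiplier": 4.0, "mux": 0.5, "adder": 1.0, "register": 0.25}

cyc = active_cycles(trace)
e_struc = structural_energy(cyc, coeff)
print(dict(cyc), e_struc)  # multiplier and mux: 5 cycles each, adder: 3, register: 2; total 26.0
```

Keeping the activation table separate from the trace walk means a new custom instruction only adds one table entry, which fits the slide's point that only the custom instruction descriptions are needed.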
- $E$ denotes an $n \times 1$ column vector holding the energy consumption data of the $n$ test programs
- $M$ denotes an $n \times 21$ matrix holding the values of the macromodel parameters for each test program
- $C$ is the energy coefficient vector corresponding to
  $\{E_{\mathrm{arith}}, E_{\mathrm{ld}}, E_{\mathrm{st}}, E_{\mathrm{j}}, E_{\mathrm{br\_tk}}, E_{\mathrm{br\_utk}}, E_{\mathrm{i}}, E_{\mathrm{d}}, E_{\mathrm{uncache}}, E_{\mathrm{interlock}}, E_{\mathrm{side\_tie}}, E_1, E_2, E_3, E_4, E_5, E_6, E_7, E_8, E_9, E_{10}\}$
- $\hat{C}$ denotes the estimate of the energy coefficient vector $C$, and $\hat{E}$ denotes the estimate of the total energy consumption $E$
- Regression yields the energy coefficient vector $\hat{C}$ such that the mean square error is minimized
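For reference, the minimization described above has a standard closed form if one assumes an ordinary least-squares fit. Here $E$ is the $n \times 1$ energy vector, $M$ the $n \times 21$ parameter matrix, and $C$ the coefficient vector; the residual $\varepsilon$ and the requirement that $M^{\mathsf{T}} M$ be invertible (at least 21 test programs that exercise the parameters independently) are assumptions of this sketch, not statements from the slides.

$$
E = M C + \varepsilon, \qquad
\hat{C} = \arg\min_{C} \lVert E - M C \rVert_2^2 = (M^{\mathsf{T}} M)^{-1} M^{\mathsf{T}} E, \qquad
\hat{E} = M \hat{C}
$$

Minimizing the sum of squared residuals is equivalent to minimizing the mean square error, since the two differ only by the constant factor $1/n$.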
- Energy consumption for each base processor instruction category per cycle
- Energy consumption for side effects per cycle
- Energy consumption for execution-time effects per miss/per interlock
- Energy consumption for different custom hardware components per cycle
- The maximum estimation error is 8.5%
- The average absolute error is only 3.3%
- The proposed energy estimation methodology is very fast
- WattWatcher needs several more hours for energy estimation (RTL description generation + RTL simulation + power estimation using WattWatcher)
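The two error figures quoted above are summary statistics over per-benchmark percentage errors. A minimal sketch of how such figures are typically computed follows; the energy values are invented for illustration and are not the paper's measurements, and the helper names are hypothetical.

```python
def percent_errors(estimated, reference):
    """Signed percentage error of each macromodel estimate against the reference tool."""
    return [100.0 * (est - ref) / ref for est, ref in zip(estimated, reference)]


def mean_absolute_error(errors):
    return sum(abs(e) for e in errors) / len(errors)


def max_absolute_error(errors):
    return max(abs(e) for e in errors)


# Made-up energy values for four benchmarks (arbitrary units).
reference = [100.0, 200.0, 400.0, 50.0]  # e.g. from an RTL-level estimator
estimated = [104.0, 194.0, 400.0, 51.0]  # e.g. from the macromodel

errors = percent_errors(estimated, reference)
mae = mean_absolute_error(errors)
worst = max_absolute_error(errors)
print(errors, mae, worst)  # errors of about 4%, -3%, 0%, 2%; mean 2.25%, max 4%
```

Taking absolute values before averaging matters: signed errors of opposite sign would otherwise cancel and understate the typical deviation.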
- The accuracy of the macromodel is high for both the base processor and the custom hardware
- Good relative accuracy of our macromodel
- The proposed energy estimation methodology has high relative accuracy and low effort (no custom processor generation, no RTL simulation)
- Therefore, it is highly suitable for energy optimization studies
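One way to read "relative accuracy" is that the macromodel ranks candidate configurations in the same order as the reference tool, even when the absolute values differ. The sketch below checks that property under this interpretation, which is an assumption of the example; the configuration names and energy values are made up.

```python
def ranking(energies):
    """Configuration names ordered from lowest to highest energy."""
    return sorted(energies, key=energies.get)


# Made-up energies for three candidate custom-instruction configurations.
reference = {"cfg_a": 120.0, "cfg_b": 95.0, "cfg_c": 140.0}
estimated = {"cfg_a": 117.0, "cfg_b": 99.0, "cfg_c": 133.0}

same_order = ranking(reference) == ranking(estimated)
best = ranking(estimated)[0]
print(same_order, best)  # True cfg_b
```

When the ordering is preserved, a designer choosing the lowest-energy candidate from the fast estimates picks the same configuration the slower reference flow would have selected.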
Conclusions
- Presented an efficient and accurate energy estimation methodology for extensible processors
  - High efficiency comes from requiring only instruction-set-simulation-based analysis of the application
  - High accuracy comes from dynamic analysis of the custom hardware usage pattern
- Although it speeds up energy estimation, it retains good absolute accuracy (average absolute error of only 3.3%) and also achieves high relative accuracy