
Journal of Microcomputer Applications (1991) 14, 13-26

A look-up table and source code generator

R. J. Chance

School of Electronic and Electrical Engineering, Birmingham University, Birmingham, UK

A software system is described for the compression of a large look-up table to a smaller
one, consistent with a worst-case error predefined by the user. The tables and a suitable
source code for accessing them are automatically generated, with very little user
intervention. The techniques of linear interpolation and the partitioning of one table into
several are shown to be particularly attractive for reducing the table size, especially when
the considerable effort of manual generation to a known accuracy is removed. The use of
linear interpolation incurs only a small speed penalty when executed on a digital signal
processor and the large reductions in table size thus achieved can make the method a
faster and more reliable alternative to either the exact or approximate evaluation of
many functions.

1. Background

The evaluation of mathematical functions is frequently required in real-time microprocessor systems. Some of these functions are precisely defined, such as sine or logarithm, and some are quite arbitrary, such as those produced from experimental measurements. The two common indirect methods of evaluating both types of function are to use either look-up tables or equations, such as polynomials, which approximate the desired function. A look-up table would often be the preferred method in small, real-time systems on the grounds of high speed and low software complexity. The method is particularly attractive for high reliability or verifiable designs. Contact with real-time microprocessor designs over many years has shown that the use of tables is often rejected unnecessarily. It is often considered, without justification, that they would be inconveniently large. The result is that work is put into more complex and less reliable methods of function evaluation. The look-up table schemes that are commonly implemented rarely attempt to approach the optimal design. This may be due to the simplicity of a basic look-up table scheme, which can divert attention from the improved performance of more complex ones. In addition, it is usually difficult to derive a near-optimal implementation without trial-and-error methods, which can be very time-consuming when carried out by hand.

2. Design aims

The aim of this project is to combine several effective methods for the design of microprocessor look-up tables into a package which may be used to compare the performance of various look-up table schemes, and to create the necessary source code and tables. The method is intended to be sufficiently simple to be used, not only at the code generation stage, but also during system design to benchmark different processors and look-up schemes.
0745-7138/91/010013+14 $03.00/0 © 1991 Academic Press Limited

The first design criterion is that the error in the function evaluation should be under the control of the user. Although a simple specification of absolute error has been used here, it has been borne in mind that the scheme may in the future need to handle errors which vary over the range of the function. Two other important design criteria are table size and the speed of the look-up algorithm. Of these, the size of a table is the most difficult to estimate in advance of generating it, if techniques such as interpolation and table partitioning are used. The generation of tables by the scheme described here is so straightforward that trial and error is taken to be an acceptable method of producing a sufficiently small table. The speed of the small number of look-up algorithms can be compared by the user for any given processor and is data-independent.

The functions considered here are functions of a single variable known as X. The function of X, F(X), is known as Y, following normal graphical notation. It is assumed that X is an integer, which can take a value between zero and some convenient maximum positive value. In other words, X and F(X) represent a look-up table, whose size is to be reduced. It has been considered essential to use tabulated X,Y values as the raw input data, in order that experimental results may be used as well as precisely known functions.

3. Look-up techniques

In the simplest microprocessor look-up table schemes, an integer X is used as the table index and a constant value added to obtain the microprocessor address where the value of F(X) is to be found. Where it is possible to scale X to reduce the table size, division of X by an integral power of 2, carried out by an efficient right shift operation, is usually used. Interpolation techniques seem to be rather rarely used to reduce table sizes in real-time microprocessor systems. In microprocessor implementations of table look-up, the capabilities of the processor can have a large effect on the speed of the algorithm. The specialized digital signal processor (DSP) can often be particularly suitable for this work because: (a) some (such as the TMS320 family) possess a multiple-bit shift, which can be used to perform a division without any time penalty; (b) all DSPs possess a single-cycle multiply, which may be used for scaling if a multi-bit shift is unavailable; and (c) the fast multiplication allows linear interpolation methods to be used for table size reduction with a much smaller reduction in speed than on most general-purpose processors. For this reason, and because of the local availability of particularly appropriate program development facilities, the Texas Instruments TMS320C25 DSP [1] has been used here as the initial target processor.
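The basic scheme described above can be modelled in a few lines of Python (a sketch for illustration only; the paper's implementations are in TMS320C25 assembler, and the shift value, table contents and names here are assumed, not taken from the paper):

```python
# Simplest look-up scheme: scale the integer input X by a right shift
# (division by 2**N) and use the result as a table index.
N = 4                                        # index divisor of 2**N = 16
TABLE = [x * x for x in range(0, 257, 16)]   # e.g. F(X) = X**2, sampled every 16

def lookup(x: int) -> int:
    """Return the tabulated value for x (no interpolation)."""
    return TABLE[x >> N]
```

On a processor with a multi-bit shift, the `x >> N` step corresponds to the free scaling operation mentioned in point (a).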
Two refinements of the basic look-up table system are used here to reduce table size. The first is linear interpolation. An arbitrary continuous function is shown graphically in Fig. 1. The black circles mark points which represent the minimum number of X,Y coordinates required to specify the function to the user's satisfaction, when intermediate values are obtained by linear interpolation. The table index may be obtained by appropriately scaling X. Note that the scaling factor is determined by the parts of the function with the largest curvature (assuming a constant error requirement). For an interpolated table where the index is determined by dividing X by a single scaling factor, additional points marked as white circles in Fig. 1 must be stored. In the case shown, 33 table entries are needed for a single, interpolated table.
Figure 1. Values of Y (a function of X) are shown graphically. Coordinates marked by (●) describe the function adequately if linear interpolation is used; coordinates marked by (○) must be added if X values must be tabulated at equal intervals.

In order to improve on this, a table may be split into several parts with different index scaling factors. Thus, ranges of the function which have different properties may be dealt with separately. Those which are required to high accuracy or which are not very linear
may store values at frequent intervals of X, without imposing this on the rest of the
function. In Fig. 1, the function has been split into eight subtables A to H. Thus subtable
A only requires two points to be stored, whereas D and E need five. In fact, if
coordinates on the junction between areas can be shared, a further reduction can be
made. This technique will be known as table partitioning and in Fig. 1 would give rise to
the storage of 17 coordinates, compared with the 33 of a single table. Table partitioning
may be used in simple look-up schemes as well as those using linear interpolation.
However, in this case, the best improvement would be in separating parts of the function
with a shallow gradient from those of a steep gradient. A particular advantage of
partitioning the table in a suboptimal table generator is that the effect of poor generator
performance, e.g. due to a discontinuity, can often be limited to one subtable and
therefore still give a usable result.
A look-up table has been used in the present design to select one of a number of
subtables. This will be referred to as the subtable selection table (SST). An advantage of
this method, which is important in a mechanized design system, is that the execution
time can be made constant, irrespective of the number of subtables used. A second
advantage is that, in the usual case of a continuous function, particular function
properties are often associated with a contiguous range of X values. The index of the

SST may therefore be obtained by a division implemented as a shift operation. As it is envisaged that the size of the SST will normally be much smaller than the main tables, there has been no attempt to minimize its size by using index divisors other than powers of 2, or by mechanizing the number of partitions chosen.
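The constant-time subtable selection described above can be sketched as follows (a hypothetical Python model; the partitioning, function, divisors and names are illustrative assumptions, not the paper's generated tables):

```python
def f(x):
    return x * x      # the tabulated function (illustrative)

SST_SHIFT = 5         # X >> 5 selects an SST entry (one per range of 32)

# One SST entry per X range: (subtable shift N, subtable base X, subtable).
# A dense subtable (divisor 4) for 0..31, a sparse one (divisor 16) for 32..63.
SST = [
    (2, 0,  [f(x) for x in range(0, 32, 4)]),
    (4, 32, [f(x) for x in range(32, 64, 16)]),
]

def lookup(x: int) -> int:
    shift, base, table = SST[x >> SST_SHIFT]   # constant-time selection
    return table[(x - base) >> shift]
```

However many subtables are used, the look-up always costs one SST access plus one subtable access, which is the constant-execution-time property noted above.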

4. The linear interpolation algorithm

The linear interpolation used assumes that a binary shift of X has been used to obtain the table index. If the value X is shifted right by N, a table index i and a fractional part r can be defined such that

X = 2^N i + r

and it can be shown (Appendix 1) that

F(X) = Y_i + 2^(-N) r (Y_{i+1} - Y_i)

where F(X) is the desired function of X. Table indexes i and i+1 are adjacent, giving the function values Y_i and Y_{i+1} on either side of X. The implementation of linear interpolation may therefore be carried out with the basic overhead of an addition, multiply and subtract operation, compared with a simple look-up scheme. Note that no non-binary division operation or extra storage for slope values is necessary, although the latter can give a small decrease in execution time [2]. Linear interpolation can give dramatic reductions in the size of a table for a given error. It is a particularly attractive proposition when using a processor with very fast multiplication such as a digital signal processor (DSP) [3,4]. The use of a table generator is particularly likely to be appreciated in the implementation of interpolated tables of known accuracy. Otherwise, a quantitative assessment of table size and construction may require considerable insight.

5. Software specification

The following types of table, with the appropriate look-up algorithm, can be generated by the software: (a) single table without interpolation; (b) single table with linear interpolation; (c) partitioned table without interpolation; and (d) partitioned table with linear interpolation. The system requires a list of X,Y coordinates as the basic input data source. These data are regarded as a correct table which may be reduced in size.

It was decided that the user should be able to specify maximum errors absolutely, so that the final look-up table system would always give results within the error specified. This makes methods of equation fitting, such as least squares, which make statistical assumptions, rather unsuitable. Because only linear interpolation is required and because a best fit is not required (any fit meeting the error specification will do), the standard numerical methods of interpolation [5] are not appropriate. The need to handle very large quantities of data makes it inconvenient to compute with a method which requires all data to be held in main memory. This effectively eliminates multiple scans of the raw data, if processing time is to be kept reasonably fast. The fact that integral powers of 2 must be used as the index divisors for efficient microprocessor implementation is an additional difficulty in conceiving an optimal method. Although it is a part of the current scheme that the user may specify the maximum absolute error of the required function, it is possible to foresee occasions when the absolute error need not be constant across the range of the table. The relative error in F(X) may be required to be constant, or there may be an area of operation where accuracy can be degraded for some practical reason. The need to handle such situations is an additional reason why a heuristic, rather than an analytical, solution to the problem of table generation has been sought.

6. The table generating algorithm

6.1 Generating a linear approximation

The following method is used to generate an adequate linear approximation in a single scan of the raw data.

The user must supply an error value. This is currently a fixed amount by which any tabulated result may differ from the true value. However, the method does not preclude different errors from being associated with particular ranges of X or Y if desired. The user enters the precision which can be represented in the table of the target system. This will often be one if, as in the examples used here, values are stored as integers in the target processor system.
A set of data samples is shown in Fig. 2, with their associated permissible and representable error bounds.

Figure 2. Linear interpolation between samples X_k and X_{k+p}. ×, data coordinates with error bounds; —, final interpolation; ----, interpolation limits at sample X_{k+p+q-1}.

An initial table entry is formed from the nearest representable value of F(X), shown in Fig. 2 as data sample X_k. Successive samples X_{k+1}, X_{k+2}, etc., are examined and can all be adequately represented by a linear interpolation. Eventually X_{k+p+q} is reached, which cannot be so represented. It is desirable that a table index should be obtainable from any function input value in the interpolated range by a binary right shift. The shift should be as large as possible. Although the interpolation computed at sample X_{k+p+q-1} would be acceptable, it is probable that it would not have a large power of 2 as a factor. Thus the point X_{k+p} is used. X_{k+p} can be factorized:

X_{k+p} = I 2^N

where N is the largest possible integer and I a small integer (the table index). 2^N is the highest such factor in the samples (X_i, k < i ≤ k+p) and is not a factor of any of the samples (X_i, k+p < i < k+p+q).

In the next iteration, X_k is replaced by X_{k+p}. In the computation process, it is merely necessary to keep a record of the best X_{k+p} sample so far. This is updated when a better sample is encountered. This method does not minimize the number of points. However, it means that there is a good probability that groups of adjacent points will have a high common factor which is a power of 2.

Obviously, this scheme is not optimal. One might expect, for example, that the fortuitous presence of an input value which could be factorized to a large power of 2 would curtail the interpolation prematurely. While this is true, it is hardly ever catastrophic. If one assumes that the change in the range of adjacent interpolations is only likely to be a factor of 2, the range of the interpolation can at worst only be halved by this effect. If a sample is divisible by 2, the distance to the next sample that has 2 as a factor is never less than 2 samples.
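The single-scan idea can be sketched as follows. This is a loose Python model of the greedy segmentation, not the paper's TGEN1 code: it extends a candidate line from the current anchor while every sample fits within the error bound, and remembers the acceptable sample whose X value has the largest power-of-2 factor as the break point. All names are illustrative.

```python
def pow2_factor(x: int) -> int:
    return x & -x if x else 0        # largest power of 2 dividing x

def fits(xs, ys, k, j, err) -> bool:
    """Can samples k..j be represented by one line from k to j within err?"""
    x0, y0, x1, y1 = xs[k], ys[k], xs[j], ys[j]
    slope = (y1 - y0) / (x1 - x0)
    return all(abs(y0 + slope * (xs[m] - x0) - ys[m]) <= err
               for m in range(k, j + 1))

def breakpoints(xs, ys, err):
    """Greedy single pass: indexes of the chosen segment end points."""
    k, out = 0, [0]
    while k < len(xs) - 1:
        best = k + 1                 # always make progress, even if forced
        j = k + 1
        while j < len(xs) and fits(xs, ys, k, j, err):
            if pow2_factor(xs[j]) >= pow2_factor(xs[best]):
                best = j             # prefer the larger power-of-2 factor
            j += 1
        k = best
        out.append(k)
    return out
```

For a perfectly linear data set the whole range collapses to one segment ending on the sample with the highest power-of-2 factor, which mirrors the behaviour described above.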
6.2 Simple table look-up

The method used for the production of a simple, i.e. non-interpolated, table is very similar to the method used for linear interpolation, except that the maximum and minimum Y values of the input data stream are used rather than the Y/X slopes.
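For the non-interpolated case, one way to picture the use of the maximum and minimum Y values is the following hedged sketch (an assumption-laden Python model, not the paper's code): within each bucket of 2^N consecutive X values, a single stored entry halfway between the bucket's extreme Y values keeps the worst-case error at half the bucket's Y range.

```python
N = 2   # bucket size 2**N = 4 (illustrative)

def simple_table(ys):
    """One entry per bucket: midpoint of the bucket's min and max Y."""
    size = 1 << N
    return [(min(b) + max(b)) / 2
            for b in (ys[i:i + size] for i in range(0, len(ys), size))]
```

The midpoint choice is what makes the max/min tracking sufficient: no slope information is needed for a simple table.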

7. The software implementation

The table generation software is divided into two parts, as shown in Fig. 3. The first [Fig. 3(a)] is the processor-independent section and is a suite of three programs. It reads the raw data and produces the desired tables. This section is independent of the target processor and operates on real values of Y, represented in floating point form.

The second part is a code generator [Fig. 3(b)] which constructs assembler source code for the chosen target processor, including the appropriate look-up tables. The prototype design has been aimed at the TMS320C25 DSP using 16-bit integer representation of values. Note that the design is not, in principle, limited to such data. The precision with which data may be represented in the table is supplied by the user, so that no assumption is made by the processor-independent section about the final method of data representation, apart from that specified by the user.

Figure 3. The table generation software, comprising (a) the processor-independent section and (b) the code generator section.

7.1 The look-up table generator

The first stage in table generation (TGEN1 in Fig. 3) is used to reduce the raw data to as short a list of coordinates as possible, together with other essential parameters for table generation. The user supplies a maximum tolerable error and the precision of the Y values in the target system, and states whether linear interpolated or simple tables are to be created. TGEN1 produces a file of X,Y values representing the minimum number of coordinates which can be used to generate the required tables. In addition to these mandatory table values, each shift value is saved (N, where 2^N I = X_{k+p} in Fig. 2), together with the maximum and minimum acceptable values of Y (simple) or Y/X slopes (interpolated) for each coordinate pair. The latter are stored to full precision and may be used to calculate intermediate table values if necessary. One of the main purposes of TGEN1 is to reduce the raw data file, which may be very large, to a smaller size at the earliest opportunity. The sizes of working files have a large effect on the execution time of the table generator, and a simple table normally takes much longer to create than one using interpolation because of the larger working files. Although the raw data is basically only read once, note that the system has to backtrack to the last acceptable coordinates at the end of each interpolated section, i.e. from point X_{k+p+q} to X_{k+p} in Fig. 2. A large input data buffer is used in TGEN1 to avoid unnecessary file handling. TGEN1 also records in a file the smallest shift value, i.e. the maximum X divisor that may be used for a single table.
The second program, TGEN2, generates a single look-up table by scanning the coordinates from TGEN1. As the X shift value is known, intermediate table entries which are not output by TGEN1 (equivalent to points marked with white circles in Fig. 1) are generated by TGEN2. This is done either by linear interpolation from the Y/X slopes or as a mean of the maximum and minimum Y values, for interpolated and simple tables respectively. In addition, TGEN2 creates another working file to be used in generating partitioned tables. This consists of the X value at the boundary of each subtable and the shift value appropriate to that subtable.

The third program, TGEN3, uses the output of TGEN2 to generate a partitioned table. For each subtable, the X,Y entries are accompanied by the appropriate X divisor used to obtain the table index in that subtable.
7.2 The code generator

The target processor used to demonstrate the present system is the Texas Instruments TMS320C25 DSP [1]. The program TGENC25 converts the tables produced by the processor-independent section into DSP source code. The user has only to select whether a partitioned or single table is to be generated and to supply a symbol name. The latter is used to generate symbolic addresses for the assembler, so that many different tables may be used in one program and so that they may have some mnemonic significance to the programmer.

One of the advantages of a look-up table system is that the necessary programs are very simple. In the case of a single, rather than partitioned, table, the code generator has only to append the table values to a standard source code template in a format suitable for the assembler (or compiler) concerned. In the case of the TMS320, this is a series of data statements. The only other information required is the value of the binary shift used to obtain the table index from the input value. In the TMS320C25 code generator, this value has been inserted as a comment, the shifting being carried out by the programmer. In this processor, this usually allows the shift to be done in parallel with another operation with zero time penalty.
The partitioned tables may also conveniently be appended contiguously to the appropriate source code template. This means that the last value of one section may be used as the first value of the next section. The input value is divided by a constant (shifted) to obtain an index to the subtable selection table, which contains the index divisor and base address information for the appropriate subtable.

To obtain F(X) from a subtable located at address A_0, starting at value X_0 and using an index divisor d, F(X) is stored at

(X - X_0)/d + A_0 = X/d + (A_0 - X_0/d).

Thus the SST requires two entries, the shift value corresponding to a division by d and the address (A_0 - X_0/d), if the real-time arithmetic is to be minimized. TGENC25 generates such a table in TMS320C25 source code.
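The address arithmetic above can be checked directly: storing the precomputed offset A_0 - X_0/d in the SST replaces the run-time form (X - X_0)/d + A_0 with a single divide-and-add, X/d + offset. A small Python sketch (names illustrative; integer equality assumes X_0 is a multiple of d, as when subtables start on index boundaries):

```python
def address_direct(x, x0, d, a0):
    # run-time subtraction, then division, then base address
    return (x - x0) // d + a0

def address_precomputed(x, x0, d, a0):
    offset = a0 - x0 // d        # computed once, stored in the SST entry
    return x // d + offset       # only a divide (shift) and add at run time
```

This is why each SST entry need only hold the shift value and the combined address, minimizing the real-time arithmetic.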

8. Results

Table 1 shows the execution times and code sizes for the four look-up table schemes used in the table generator.

Two functions are shown here to demonstrate the operation of the table generator. The first is a sine function between the values of 0 and 90 degrees. This is much used in DSP work and is a function with a fairly gradual change of slope, which is rather kind to a system using linear interpolation. The second function is X/(1 + X²). As may be seen from Fig. 4, this changes, rather suddenly, from a steep positive gradient to a shallow negative one. This can be used to demonstrate how the different index divisors possible with a partitioned table can reduce memory usage. Many functions derived from practical measurements may be expected to show these inconvenient characteristics. The functions are scaled for implementation in an integer system.

The sine table results in Table 2 show that the best choice between the different look-up techniques might be quite hard to predict by casual observation. With a maximum error of 2/30,000 in sine(X), the partitioned table without interpolation is smaller than the single, interpolated table. If larger errors can be tolerated, the interpolated table is smaller. The partitioned, interpolated table is more compact when a small error is required but, with this particular function, is comparable to a single, interpolated table at larger errors, due to the subtable selection table overhead. Interpolation is most effective if table entries use a precision greater than the required error. When a greater precision can be used, even more impressive results can be obtained. For example, a sine table design for the 24-bit word of the Motorola DSP56001 produced a partitioned, interpolated table size of 135 words for a precision of 1 part in 65,536.

Table 1. TMS320C25 execution times for four methods of table look-up

                                           Code size    Execution time, µs
                                           (words)      (at 40 MHz)
Single table, no interpolation                 5            0.6
Partitioned table, no interpolation           14            2.4
Single table, linear interpolation            25            2.5
Partitioned table, linear interpolation       35            4.1

Figure 4. The function X/(1 + X²) from X = 0 to X = 16, showing the sizes of the 16 subtables generated for a maximum error in Y of 8/30,000.

Table 2. Sizes (words) of partitioned (16) and single tables. The subtable selection table adds 32 further words

                                               Specified error
Partitioned table (16) without interpolation   11,905   5931   2977
Single table with interpolation                16,385   1025     65
Partitioned table (16) with interpolation        1201    113     31

Y = 30,000 sine(360X/65,536).

Table 3 shows the results of tabulating the X/(1 + X²) function (appropriately scaled) shown in Fig. 4. There is a satisfactory reduction in table size produced by interpolation and partitioning, but one can see more clearly what is happening from the histograms of the partitioned tables shown in Fig. 5. The non-interpolated table, shown in Fig. 5(b), has the largest number of points in the first subtable, where the gradient is highest. The partitioning brings dividends where the gradient is smallest, when X is much larger than 1. Contrast the interpolated and partitioned table of Fig. 5(a). In this case the steep gradient, when X is small, is quite efficiently described by the linear interpolation and the first subtable is no longer the largest. It can be clearly seen from Fig. 5(a) why a single table, using linear interpolation, is not very effective for this particular function; the area represented by the second subtable, with 64 points, determines the single index divisor.

9. Testing the algorithms

Table 3. Sizes of partitioned (32) and single tables

Single table without interpolation             49,000
Partitioned table (32) without interpolation     2849
Single table with interpolation                  2049
Partitioned table (32) with interpolation         130

Y = k1 k2 X / (1 + (k2 X)²), from X = 0 to X = 65,536; maximum error in Y = 8.
Scaling factors: k1 = 30,000, k2 = 13/65,536.

Figure 5. Partitioned tables: subtable sizes for X/(1 + X²); (a) using linear interpolation and (b) using non-interpolated tables.

It is particularly desirable to test the products of a scheme which is suboptimal and based
on heuristic methods. The look-up table algorithms produced by the above program generators have been tested with the aid of a DSP simulator. This simulator may be extended [6] by the linking of software modules written by the user. These may represent hardware DSP peripherals or, alternatively, algorithms which are not realizable in hardware but can be used for test purposes [7]. In the present case, the TMS320C25 simulator was extended as shown in Fig. 6. The method has been used for testing the accuracy of many DSP algorithms.

Figure 6. Simulator-based test bed comparing the DSP table look-up result with the raw data values.

The basic scheme uses a C function to compare the results of a DSP algorithm with a perfect solution. In this case, the X values of the original file of X,Y coordinates are used as an input to the DSP look-up algorithm. The Y values of the look-up scheme and the original data are compared in a simulated peripheral, and excessive errors noted.
This method was used to confirm the accuracy of the look-up tables and algorithms
over the entire input function range in a large number of cases. The error output from
this scheme produced convincing demonstrations of interpolations taking advantage of
the value set by the user. The error normally rose to the maximum allowed quite quickly
and then remained unaltered.

10. Conclusions

A look-up table and microprocessor source code generator have been demonstrated. These use linear interpolation and the partitioning of one table into several to reduce table size, while achieving a worst-case error demanded by the user.
These methods are particularly efficient when implemented on DSP systems. The
TMS320C25 DSP has been used to demonstrate the performance of the system but the
tables are generated in a form independent of the target processor and others could
easily be used. The philosophy behind the system is that tables and code are so simple to
generate once the desired function has been defined that they may be generated
experimentally while in the design stage of a project. This has already proved to be of
value in specifying the memory for prototype hardware, even when a code generator had
not been written for the intended processor.
The most noteworthy reductions in table sizes have been where linear interpolation is used. The speed penalty for such interpolation is often acceptable when using a processor with a fast multiply operation. The table generator is particularly helpful here, as it is not always simple to guess the quantitative performance of such systems. This becomes even more difficult if function errors can be allowed to vary over the range of interest.

References
1. Texas Instruments 1989. Second Generation TMS320 User's Guide.
2. D. Garcia 1986. Precision digital sine-wave generation with the TMS32010. Digital Signal Processing Applications with the TMS320 Family. Texas Instruments, pp. 269-289.
3. A. Bateman & W. Yates 1988. Digital Signal Processing Design. Pitman Publishing.
4. Motorola 1988. Digital sine-wave synthesis using the DSP56001. Motorola Semiconductor Products Group Document APR1/D.
5. G. de Vahl Davis 1986. Numerical Methods in Engineering and Science. Allen & Unwin, London.
6. R. J. Chance & B. S. Jones 1987. A combined software/hardware development tool for the TMS32020 digital signal processor. Journal of Microcomputer Applications, 10, 179-197.
7. R. J. Chance 1988. A system for the verification of DSP simulation by comparison with the hardware. Microprocessors and Microsystems, 12, 497-503.

Appendix 1. Linear interpolation

A value x is divided by 2^N, where N is an integer, to obtain a table index i and a fractional part r:

x = r + 2^N i.     (A1)

Two adjacent table entries at indexes i and i+1 are shown graphically in Fig. A1, giving the desired function values for x_j [= 2^N i] and x_{j+1} [= 2^N (i+1)], which are y_j and y_{j+1} respectively. Assuming linear interpolation between these points,

(2^(-N) x - i) / ((i+1) - i) = (F(x) - y_j) / (y_{j+1} - y_j).

Figure A1. Linear interpolation between table entries i and i+1.

From equation (A1), 2^(-N) x - i = 2^(-N) r, therefore

F(x) = y_j + 2^(-N) r (y_{j+1} - y_j).

Jim Chance is currently a lecturer in the School of Electronic and Electrical Engineering at the University of Birmingham. He first became involved in computing in 1961 and has specialized in the design of microprocessor applications and development tools since 1976, in the Biochemistry Department, the Microprocessor Systems Laboratory and his present department at Birmingham. Since 1984 the use of digital signal processors in teaching, research and industrial contracts has been a major interest. This work includes the use of simulation in program development, testing and validation with DSPs, and the use of DSPs in PWM waveform generation for high-power locomotive traction applications. Jim Chance is a member of the British Computer Society.
