
Implementation of edge detection on FPGA

Abin Thomas Mathew
Intelligent Transportation System
Alpen-Adria University, Klagenfurt
Austria
athomasm@edu.uni-klu.ac.at

Abstract— Edge detection is a computer vision algorithm that is very processor intensive. The speed of the algorithm can be increased by exploiting hardware parallelism. This paper presents an implementation of edge detection on an FPGA, using an Altera development kit. The paper focuses on providing the often missing link between algorithm development and FPGA implementation. The implementation is a co-operative effort between multiple software tools. A Canny filter with the Prewitt kernel acts as a co-processor to a Nios II based system. In addition, the testing software and firmware development are described. The results show how a highly parallel algorithm can run faster on a 50 MHz FPGA than on a modern PC clocked in the GHz range.

Keywords—Edge detection, Canny filter, Prewitt kernel, SOPC Builder, Nios II processor, DSP Builder, Altera Cyclone II FPGA

I. INTRODUCTION

Edge detection is a fundamental process in image processing applications which acts as a precursor to extracting information from frames, later used for feature extraction and object segmentation. It is also used to remove blur in noisy images, and it appears in most applications related to image processing. The algorithm has been simulated in most of the common mathematical tools and programmed in various languages, and hardware implementations have been realized on GPPs, DSPs and microcontrollers. The results from these experiments have proven to be quite fast compared to simulation, but still not fast enough.

The FPGA, a reprogrammable computation device, is a boon to engineers in various fields, and image processing is one of them. A project is often much easier to implement on an FPGA than on various contemporary computational devices, and a specific task can run faster on an FPGA with a 50 MHz clock than on a PC clocked in the GHz range [1]. Here, edge detection is implemented on an FPGA with the help of Simulink¹ for modeling the Prewitt kernel co-processor and of SOPC Builder², which creates a soft-core processor.

The first question that might arise is: why an FPGA, when dedicated DSPs are available to do the same work? The answer lies in the time taken to perform the same task on a DSP and on an FPGA. The foremost advantage of the FPGA is that all processes and instructions execute in parallel, whereas on a DSP they execute in sequence; some DSP processes can be assigned to run in parallel, but not the entire stretch of code.

The main objectives of this internship are:

• Learn and work with software tools such as DSP Builder³ in MATLAB, SOPC Builder and the Nios II processor in the Quartus II development suite, and the Nios II embedded development suite
• Implement the two-dimensional filter with the Prewitt kernel in Simulink using DSP Builder
• Create a system using the Nios II/s processor for a standard application
• Merge the Prewitt co-processor into the system
• Execute the software code for the system using the Nios II embedded development suite

Real-time video and image processing is used in a wide variety of applications, from video surveillance and traffic management to medical imaging. These operations typically require very high computation power. Standard-definition NTSC video is digitized at 720x480, or full D1, resolution at 30 frames per second, which results in a 31 MHz pixel rate. With multiple adaptive convolution stages to detect or eliminate different features within the image, the filtering operation receives input data at a rate of over one giga-sample per second. Coupled with new high-resolution standards and multi-channel environments, processing requirements can be even higher. Achieving this level of processing power using a programmable DSP requires multiple processors, whereas a single FPGA with an embedded soft processor⁴ can deliver the requisite level of computing power more cost-effectively, while simplifying board complexity.

¹ Simulink is an environment for multidomain simulation and Model-Based Design for dynamic and embedded systems. It is a product of The MathWorks family.
² SOPC Builder is a product of Altera Corp., a pioneer in programmable logic solutions. It is a system development tool used to design and generate a System On a Programmable Chip.
³ DSP Builder, SOPC Builder, Quartus II and the Nios II development suites are software tools from the Altera Corporation.
⁴ An embedded soft processor is a reconfigurable processor which lies on the FPGA fabric.
Real-time image processing requires high computation power. For example, the NTSC video standard requires 30 frames per second with approximately 0.25 megapixels per frame, resulting in a total of 7.5 megapixels per second. The PAL video standard has a similar processing load with 25 frames per second, but a larger frame size. The amount of processing required per pixel depends on the image processing algorithm. In the case of the Canny edge-detection algorithm the processing requirement is a combination of four stages: Gaussian smoothing for noise reduction, finding zero crossings using the derivative of the Gaussian, non-maximal suppression to thin the edges, and finally hysteresis thresholding.

Depending on the size of the kernels, the processing requirements for the first two convolution stages can change. Assuming a kernel size of 7x7 for the Gaussian and 5x5 for the derivative of the Gaussian, the two stages require approximately 62 operations per pixel. This is less complex than most typical 2-D convolutions because of the symmetric and separable characteristics of the Gaussian masks. The non-maximal suppression stage requires 27 operations per pixel, and due to the recursive nature of the hysteresis thresholding process, that stage requires an additional 40 operations per pixel. In total, the Canny algorithm requires approximately 130 operations per pixel. Coupled with running at multiple resolutions, a serial implementation of the algorithm on a programmable DSP with a clock speed of 600 MHz can only process 4.6 megapixels per second, which is not sufficient to support real-time video streams at high resolution. As high-resolution video standards become more prevalent, the processing requirements will increase even further: the high-resolution standards typically have 4 times more pixels per frame, so the computation load is approximately 4 times higher. Image processing applications with such high computation loads can be implemented with multiple DSPs or a single, very expensive high-end DSP. In this scenario, FPGAs offer an alternative real-time image processing platform: an FPGA efficiently supports highly parallel data-flow structures, which are important for efficient implementation of image processing algorithms.

II. HARDWARE COMPONENTS

The previous section provided a brief insight into the reasons why an FPGA is used over a DSP for edge detection. This chapter starts with an introduction to both hardware components, which are widely used in most fields of engineering for their ease of use and computation speed. Both components have their own trade-offs, some of which are mentioned in the forthcoming sections.

A. Digital Signal Processor (DSP):

A digital signal processor, or DSP, is similar to a general-purpose processor (GPP) in many aspects. It has fixed logic, that is, the connections between logic gates cannot be changed. It provides a fixed instruction set (ISA) to the programmer, and it expects a program in this ISA that it will then execute in a sequential manner (as opposed to dataflow-driven) (Kisacanin, Bhattacharyya, & Chai, 2008). Most DSP ISAs exhibit a similar structure to GPP ISAs, complete with arithmetic and logic instructions, memory access, registers, control flow, and so on. Distinguishing it from general-purpose CPUs, a DSP's instruction set is optimized for matrix operations, particularly multiplication and accumulation (MAC), traditionally in fixed-point arithmetic, but increasingly also for double-precision floating-point arithmetic. DSPs exhibit deep pipelining and thus expect a very linear program flow with infrequent conditional jumps. They provide SIMD (single instruction, multiple data) instructions, assuming a large amount of data that has to be processed by the same, relatively simple, mathematical program. SIMD programs exploit instruction-level parallelism, executing the exact same instruction simultaneously on multiple data. VLIW (very long instruction word) relaxes this constraint by allowing different instructions (op-codes) to be packed together in a VLIW, and every instruction therein processes different data concurrently. Many DSPs are VLIW architectures. The types of instructions that are allowed together within one VLIW (and thus will be executed in parallel) depend on the function units that can operate in parallel. For example, if a DSP has two fixed-point MAC units and two floating-point MAC units, then at most two fixed-point MAC operations can be placed into the same VLIW. This constraint is relaxed even further in so-called MIMD machines (multiple instruction, multiple data), where multiple identical processors can independently execute arbitrary instructions on non-dependent data. You might note that modern CPUs and their multiple-dispatch (superscalar) pipelines do exactly that: they schedule multiple instructions concurrently. With DSPs, however, there is no such intelligent pipeline. Instead, the burden of scheduling is on the compiler: it has to co-schedule instructions for independent data operations and optimize the packing of instructions in width (e.g., four instructions per word) and in sequence (control flow). DSPs do not perform such complex CPU operations as branch prediction or instruction reordering; here, too, the compiler has to perform the optimizations. DSP programs are relatively small (tens or hundreds of lines of code), with few branch and control instructions, as opposed to entire operating systems running on general-purpose CPUs. Frequently, a single, tight, heavily optimized loop is executed once for every data element or set thereof. Fig. 1.6 shows how the serialized pixel data is streamed through a DSP with a simple one-dimensional smoothing filter.

Since DSPs usually execute small programs on huge amounts or endless streams of data, these two pieces of information are stored in separate memory blocks, often accessible through separate buses. This is called the Harvard architecture, as opposed to the GPP's von Neumann architecture, in which both program and data are stored in the same memory. Because the program does not change (firmware!), many DSPs provide on-chip ROM (typically on the order of 10 kB) for program storage, and a small but efficient RAM hierarchy for data storage. Frequently, an embedded system also includes a separate non-volatile memory chip such as an EEPROM or flash memory. There are several high-performance DSPs available, including several that have a multi-core DSP-with-general-purpose-CPU system-on-a-chip structure. Of particular interest are the DSPs with specialization toward video and image processing, also known as media processors. These tend to include multiple (perhaps as many as 64) enhanced DMA units and multiple dedicated I/O streams adapted toward the movement of pixels onto and off the chip. Media processors are a common choice for video applications owing to characteristics that make them equally attractive for embedded vision systems: programmability, direct memory access (DMA) architectures, some level of parallelism (VLIW or SIMD), low power and low cost. Example vision systems using DSPs are discussed in Chapters 4 and 9. Manufacturers of DSPs include Agere Systems, Analog Devices, Infineon, Lucent Technologies, Equator Technologies, Freescale Semiconductor (formerly part of Motorola), NXP (formerly part of Philips Electronics), Texas Instruments, and Zilog. Most of these manufacturers also offer DSP development boards specific to image processing and computer vision, complete with the required auxiliary chips, cameras and software. Those are an ideal starting point for an experimental embedded vision system. Bruno Paillard wrote a good introduction to DSPs that can be found at [12]. A good textbook resource is Lynn and Fuerst's Introductory Digital Signal Processing with Computer Applications. The USENET group comp.dsp might also be of interest to the reader.

B. Field Programmable Gate Array (FPGA):

A field-programmable gate array, or FPGA, is a semiconductor in which the actual logic circuit can be modified to the application builder's needs. The chip is a relatively inexpensive, off-the-shelf device that can be programmed in the "field" and not at the semiconductor fabrication plant. It is important to note the difference between software programming and logic programming, or logic design as it is usually called: a software program always needs to run on some microcontroller with an appropriate instruction set architecture (ISA), whereas a logic program is the microcontroller. In fact, this logic program can specify a controller that accepts as input a particular ISA, for example the ISA of an ARM CPU, effectively turning the FPGA into an ARM CPU. This is a so-called soft core, built from general-purpose logic blocks. These soft cores, or rather the right to use the intellectual property, can be purchased from companies such as Xilinx, Inc., and Altera Corporation. They are then "downloaded" to the FPGA, where they implement the desired functionality. Some modern FPGAs integrate platform- or hard multipurpose processors on the logic, such as a PowerPC, ARM, or DSP architecture. Other common hard and soft modules include multipliers, interface logic, and memory blocks. The logic design determines the FPGA's functionality. This configuration is written to the device and is retained until it is erased. To be precise, there are three types of FPGAs: antifuse, SRAM, and FLASH. Antifuse chips are not reprogrammable. FLASH (EPROM) is also nonvolatile, meaning that the logic design stays on the chip through power cycles; it can be erased and reprogrammed many times. SRAM programming, on the other hand, is volatile: it has to be programmed at power-on. The huge benefit of an FPGA is the great flexibility in logic, offering extreme parallelism in data flow and processing to vision applications. One can, for example, create 320 parallel accumulation buffers and ALUs, summing up an entire 320x240 image in 240 clock cycles. Another example would be to place a region of interest in the FPGA and then perform pixel operations on the entire region simultaneously (see Fig. 1.7). FPGAs can achieve speeds close to DSPs and ASICs, require a bit more power than an ASIC, and have much lower non-recurring engineering (NRE) costs but higher volume prices than ASICs.

Algorithm developers and software engineers are usually trained on a sequential model. That, and the flexibility of logic circuitry, makes parallel designs on FPGAs a challenging task, particularly because the best implementation is often not intuitively apparent. One of the first difficulties is dividing the responsibilities between the FPGA and a GPP, and between the FPGA's core CPU and possible other chips on the platform. Common "hardware description languages" for logic design are Verilog and VHDL, the topic of many engineering courses and books. FPGAs are a great resource for parallelism and offer tremendous flexibility in an embedded vision processing system. On the other hand, large FPGAs are quite power hungry and their clock rates are lower than a typical DSP's clock rate. A wide variety of field-programmable gate array chips are available on the commercial market today. The optimal choice for an embedded vision system is a combination of a single FPGA with sufficient general-purpose I/O resources to handle the imager's incoming and outgoing interfaces, plus a 64-bit interface to/from the DSP. An equally important selection criterion is the amount of embedded FPGA memory, as well as the package. Most FPGAs have an abundance of I/O capability compared with internal logic, so it is probable that an optional 32-bit SDRAM interface may also be possible. Such an interface in an embedded vision system would provide the FPGA with private access to its own storage area at the cost of the access time and added FPGA complexity. The plentiful I/O resources are also used to let FPGAs control the input gates of other onboard chips. Example vision systems using FPGAs are discussed in Chapters 6 and 7. FPGA manufacturers include Achronix Semiconductor, Actel, Altera, AMI Semiconductor, Atmel, Cypress Semiconductor, Lattice Semiconductor, QuickLogic, and Xilinx. Most of these manufacturers also offer FPGA development boards specific to image processing and computer vision, usually with at least one DSP onboard, and complete with cameras and software. Those are an ideal starting point for experimentation.
Both devices are good for specific functions, and when they are used practically for a particular application some trade-offs become visible. Some of them are mentioned below:

• The choice between the two depends strongly on the context of usage.
• If power is a constraint, the DSP is preferred despite the extra programming overhead.
• Apart from their more limited I/O, DSPs are otherwise quite capable.
• If power is not a major constraint, for example an FFT on images from a security camera plugged into a mains socket, then the FPGA is by all means a wise choice.
• Place-and-route on an FPGA is stochastic: the same design can map differently from run to run. This is a drawback in a real-time system, where every build of the design must be ensured to behave the same way.
• For embedded systems we also need low power and very low overhead for reconfiguration (in the case of an FPGA) or for loading a new program (in the case of a DSP). This gives the FPGA an advantage market-wise.

III. EDGE DETECTION

An edge, in layman's terms, is a change of altitude or height, or an angle, like the edge of a table. In images, an edge is defined in terms of intensity: a change in the intensity of light from one pixel to the next constitutes an edge. The intensity of light in a pixel is represented by a value in the range 0-255, where 0 represents "black" and 255 represents "white". Hence edges are seen at locations where there is an intensity contrast in an image, and an edge signifies the boundary of an object.

Fig: Illustration of an edge in an image 1

Intensity:    5    7    6    4    152    142    146    147
Difference:      2    1    2    148     10      4      1

Fig: Illustration of an edge in an image 2

The illustration shows the change of intensity from pixel to pixel. The first row of the table gives the intensity level at each pixel. Between the first and second pixels there is a change in intensity, but it is hardly noticeable; this becomes a criterion when setting the threshold for detecting an edge. The intensities of the first four pixels are quite similar, with only minute changes, whereas the change of intensity between the fourth and fifth pixels is clearly visible; the differences in the second row show it too. This contrast gives us an edge in the image.

Edge detection is the fundamental tool used in most image processing applications to extract information from the gray-level image, which is later used for feature extraction or object segmentation. Edge detection algorithms were classically implemented in software; advancements in VLSI technology have made hardware an alternative. The Canny edge detector is the standard detector used for edge detection because it provides sharp and thin edges. It is a multiple-step detector, and the steps are:

Filtering
Differentiation
Detection

A. Filter

The filtering stage is used to remove the noise in the image. The noise is generated by the undesirable effects of sampling, quantization, blurring and defocusing of the camera, and by the irregularities of the surface of the object. A linear filter with a Gaussian kernel is used to remove the noise. The filter operation averages the noise in the neighborhood, and this operation is a convolution of the input image with the filter. The equations representing the filter operation are given below:

g(x, y) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²)

h(x, y) = f(x, y) * g(x, y) = Σi Σj f(i, j) g(x − i, y − j)

In the equations above, 'h' represents the filtered image, 'f' the input image and 'g' the filter; the * represents the convolution operation, which is written out in the last equation.
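The filtering equation h = f * g can be sketched in C. This is an illustrative sketch only: the paper does not fix a kernel size, so the 3x3 integer approximation of a Gaussian used below (binomial weights summing to 16) is our assumption.

```c
#include <assert.h>

/* Illustrative C sketch of the filter stage h = f * g: a 3x3
 * convolution of the input image f with a small smoothing kernel g.
 * The kernel is a common integer approximation of a Gaussian
 * (weights 1-2-1 in each direction, sum 16); this choice is an
 * assumption made for illustration, not the paper's exact kernel. */
void smooth3x3(const unsigned char *f, unsigned char *h, int w, int ht)
{
    static const int g[3][3] = { {1, 2, 1},
                                 {2, 4, 2},
                                 {1, 2, 1} };   /* sums to 16 */

    for (int y = 0; y < ht; y++) {
        for (int x = 0; x < w; x++) {
            int acc = 0;
            for (int j = -1; j <= 1; j++) {
                for (int i = -1; i <= 1; i++) {
                    /* replicate border pixels at the image edge */
                    int xs = x + i, ys = y + j;
                    if (xs < 0) xs = 0;
                    if (xs >= w) xs = w - 1;
                    if (ys < 0) ys = 0;
                    if (ys >= ht) ys = ht - 1;
                    acc += g[j + 1][i + 1] * f[ys * w + xs];
                }
            }
            h[y * w + x] = (unsigned char)(acc / 16);  /* normalize */
        }
    }
}
```

Dividing by the kernel sum keeps flat regions unchanged while the weighted neighborhood average suppresses isolated noise pixels.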
B. Differentiation

The general equation for differentiation is

df/dx = lim(Δx → 0) [f(x + Δx) − f(x)] / Δx

Images are discrete, so the minimum value of Δx is 1, and the derivative becomes

df/dx ≈ f(x + 1) − f(x)

The task I had been working on uses gradient operators for this differentiation. The differentiation process can also be worked out using a mask, similar to the one used for filtering. Some of the gradient operators are Roberts, Prewitt and Sobel, and these operators have been used extensively in the past. The Roberts and Prewitt operators assign equal weights to the pixels, whereas Sobel uses different weights, as shown below.

Prewitt operator:
Gx = [ −1 0 +1 ; −1 0 +1 ; −1 0 +1 ]    Gy = [ −1 −1 −1 ; 0 0 0 ; +1 +1 +1 ]

Sobel operator:
Gx = [ −1 0 +1 ; −2 0 +2 ; −1 0 +1 ]    Gy = [ −1 −2 −1 ; 0 0 0 ; +1 +2 +1 ]

Roberts operator:
G1 = [ +1 0 ; 0 −1 ]    G2 = [ 0 +1 ; −1 0 ]

As mentioned in the introductory chapter, the Prewitt operator is used for the co-processor. Though Prewitt is not the best operator available for the problem, we still use it at this stage of the work due to its simplicity: most of the weights are equal and unity, which makes it easier to compute the pixel values. In the next section, we will see how the Prewitt kernel for the Canny filter is implemented in Simulink with the DSP Builder toolbox.

IV. IMPLEMENTATION OF THE ALGORITHM

The implementation of the edge detection algorithm using the Prewitt kernel involves both software and hardware tasks. These tasks are divided into three phases, namely:

The Prewitt kernel co-processor
The system built with Nios II⁵, working synchronously with the co-processor
System model execution code written in C for both the hardware and the software

The edge detection design will finally be in the format shown below. Except for the co-processor design, all components included in the design are part of the Nios system, which works along with the co-processor to perform the task of edge detection. SOPC Builder, a tool in Quartus II for rapid prototyping, is used to create the system.

Block diagram of edge detection system design

The input image is a 640 x 480 image which is fed into the system through a compact flash. The data is passed to a buffer which translates the pixels into arrays and feeds them to the co-processor for processing; the result is transmitted back to the buffer and stored in the SRAM. The edge-detected image stored in the SRAM is then pushed to the video lines for display. Each of these components is explained in detail in the sections below.

A. Prewitt kernel co-processor

The Prewitt kernel, as mentioned in the chapter on edge detection, is implemented in Simulink with the help of the DSP Builder⁶ toolbox. Since this is a prototype, the measures taken to clear the noise of the image are omitted, and hence we start directly with the matrix gradient kernel of the Canny detector. The kernel is a 3 x 3 matrix, and this is the mask that is put on the image to detect an edge in the pixels. The hardware implementation is shown in fig (): line buffers create a delay in the lines for the incoming pixels, the array lines are fed with data simultaneously with a delay, and the pixel values are multiplied with the respective coefficients at each node, as seen in the figure. All the video lines are then fed to an adder which produces the middle-point resultant pixel.

Implementation of 2D non-symmetrical filter

This filter is then used for the implementation of the vertical and horizontal filters. The filter designs for the two are the same except for the data that flows into the array. The block diagram in fig () shows how the vertical and horizontal filters are merged to obtain the combined threshold values, producing the binary image with the detected edges.

Block diagram for the implementation of the Canny filter

The interface between the compact flash and the co-processor takes place through the Avalon⁷ interface component, which is used in Simulink as a black box. The Avalon interface component can either be added using the DSP Builder toolbox, or its HDL code can be imported using the black-box toolbox in Simulink. The model built in Simulink is shown in fig ().

Prewitt Kernel Co-processor

The Prewitt block is the small block seen in the figure above; it models the co-processor. The Prewitt edge detection model is shown in the figure below.

Prewitt edge detection

The input and output seen in this model are reflected back to the edge-in and edge-out scripts that run in MATLAB.

⁵ NIOS II is a soft-core embedded processor from Altera, which can be used to emulate a processor for a logical system based on the requirements of an application.
⁶ DSP Builder is a product of the Altera Corporation which is used as a toolbox with MATLAB for modeling systems meant to be implemented on Altera FPGA development kits.
⁷ The Avalon interface is a standard interface seen in SOPC Builder, used for high-speed streaming and memory-mapped applications.
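What the Prewitt co-processor computes can be sketched in plain C. This is a host-side illustration, not the hardware design: the real co-processor works on streamed video lines through line buffers, whereas the sketch below indexes a whole frame; the |Gx| + |Gy| combination and the threshold value are our assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch of the co-processor's arithmetic: the 3x3
 * Prewitt masks are applied to each interior pixel, the vertical
 * and horizontal responses are combined as |Gx| + |Gy| (a cheap
 * stand-in for the Euclidean gradient magnitude), and the result
 * is thresholded into a binary edge image. */
void prewitt_edges(const unsigned char *f, unsigned char *out,
                   int w, int h, int threshold)
{
    static const int gx[3][3] = { {-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1} };
    static const int gy[3][3] = { {-1,-1,-1}, { 0, 0, 0}, { 1, 1, 1} };

    for (int i = 0; i < w * h; i++)
        out[i] = 0;                       /* border stays non-edge */

    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            int sx = 0, sy = 0;
            for (int j = -1; j <= 1; j++) {
                for (int i = -1; i <= 1; i++) {
                    int p = f[(y + j) * w + (x + i)];
                    sx += gx[j + 1][i + 1] * p;   /* vertical edges   */
                    sy += gy[j + 1][i + 1] * p;   /* horizontal edges */
                }
            }
            int mag = abs(sx) + abs(sy);
            out[y * w + x] = (mag >= threshold) ? 255 : 0;
        }
    }
}
```

The two inner accumulations correspond to the vertical and horizontal filters of the design; the final comparison is the thresholding stage that yields the binary edge image.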
The edge-in script is used to convert a real image into gray scale and to convert the two-dimensional image into a linear array of data; the edge-out script does the reverse. These two scripts were used during the simulation stage to check whether the algorithm was working. In the co-processor model, however, the input data comes from the compact flash and is buffered in the DDR2 SDRAM; at the output it is again buffered and stored in the SRAM, from which the data is sent out to the VGA video channels.

B. Building the system

Once the co-processor is built, it is the turn of the system. The system is built around the Nios II RISC core processor; we used the economy⁸ version of the processor as the CPU of the system. From the block diagram seen in fig (), we see that the DMA helps the system transfer data from the flash memory and also store data to the SRAM. This is done to reduce the burden on the processor itself, which is the very definition of DMA. The step-wise process of building and configuring a system is shown below.

1) System

Any system will have the following basic components:

A microprocessor for a processor-based system (Nios II)
Memory to store data or instructions (SRAM or SDRAM)
JTAG to transfer data from the host machine into the target device (development workstation to DE2 development board)
Other devices based on the application (VGA)

a) NIOS 2 Processor:

The Nios II processor is a 32-bit soft-core processor provided by Altera. It has a Reduced Instruction Set Computer (RISC) architecture which uses separate instruction and data buses, often referred to as the Harvard architecture. The Nios II architecture supports a flat register file, consisting of thirty-two 32-bit general-purpose integer registers and six 32-bit control registers. Nios II processor cores can address a 31-bit address span; the user must assign base addresses between 0x00000000 and 0x7FFFFFFF. The figure below shows the block diagram of the Nios II processor; some of the functional units seen in the block diagram are:

Register file
Arithmetic logic unit (ALU)
Interface to custom instruction logic
Exception controller
Interrupt controller
Instruction bus
Data bus
Memory management unit (MMU)
Memory protection unit (MPU)
Instruction and data cache memories
Tightly-coupled memory interfaces for instructions and data
JTAG debug module

Block diagram of NIOS 2 processor core

Each of the functional units mentioned above can be used based on the objective to be achieved. The Nios II processor has a custom instruction set which can be used for specific implementations, and it also supports user-defined instructions which connect the Nios II ALU to user-defined logic to perform the implementation.

C. Instantiating a NIOS II processor and building a system in SOPC builder

System On a Programmable Chip is the current technique used for prototyping applications in every field of engineering. This technique is faster, cheaper and the most efficient form of testing a system during the development phase. The SOPC Builder development tool is used to specify, configure and generate the embedded system through a graphical user interface. The software development tasks can be accomplished using the Eclipse-based development suite called the Nios II embedded development suite, or Nios II Studio.

A SOPC file is built within a Quartus II project file. This opens a window with all the components which can be included in a system.

⁸ The NIOS II processor comes in three different formats: NIOS II/e (economy), NIOS II/f (fast) and NIOS II/s (standard).
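The edge-in transformation described above can be transcribed to C for illustration (the actual scripts run in MATLAB, so everything here, including the luma weights, is an assumption on our part): an RGB image is converted to gray scale and the two-dimensional image is flattened into the linear pixel stream that the co-processor consumes.

```c
#include <assert.h>

/* Sketch of what the edge-in script does: convert an interleaved
 * RGB image to gray scale and flatten it row-major into a linear
 * array. The integer luma weights below are the common Rec. 601
 * approximation (0.299 R + 0.587 G + 0.114 B), our assumption;
 * the MATLAB script may use a different conversion. */
void edge_in(const unsigned char *rgb,   /* w*h*3 interleaved RGB */
             unsigned char *stream,      /* w*h linear gray array */
             int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const unsigned char *p = &rgb[(y * w + x) * 3];
            int gray = (299 * p[0] + 587 * p[1] + 114 * p[2]) / 1000;
            stream[y * w + x] = (unsigned char)gray;  /* row-major */
        }
    }
}

/* edge-out is the inverse indexing step: pixel k of the stream
 * maps back to row k / w, column k % w of the 2-D image. */
```

Row-major flattening is what lets the hardware side reconstruct 3x3 neighborhoods with simple line buffers: pixels one row apart are exactly w positions apart in the stream.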
a system. The Nios 2 processor being the heart of the system generating a PLL and clock based on the user needs can be
is considered the CPU for the system. Representative names can be given to the components based on the application that is being developed.
Nios 2 processors come in three variants, chosen according to the kind of system they are used in:
 NIOS II/s (standard)
 NIOS II/e (economy)
 NIOS II/f (fast)

Nios II/f Fast CPU (Adler) – Optimized for maximum performance, the Nios II/f processor delivers up to 220 DMIPS in the Stratix II family of high-performance FPGAs, placing it squarely in the ARM 9 class of processors. While this core is 4 times faster than the original Nios CPU, it is 40% smaller as well.

Nios II/e Economy CPU – Optimized for lowest cost, the /e core achieves a small FPGA footprint (less than 600 LEs), consuming as little as $0.35 worth of logic in a Cyclone II device. It is half the size of the smallest original Nios core, with 4 times the performance.

Nios II/s Standard CPU – This core balances processing performance and logic element usage. It is faster than the fastest original Nios CPU and smaller than the smallest one, achieving over 120 DMIPS while consuming only 930 LEs (Stratix II).

A detailed table with the features of, and differences among, these processors can be seen in the appendix []. For the task we are working on, the standard processor was the suitable choice. Once the CPU has been chosen, the next step is to choose the memory. The available memories are the on-chip memory (RAM/ROM, of variable size), SRAM (512 KB) and SDRAM (8 MB). The on-chip memory, as mentioned in the chapter on hardware components, consists of the flip-flops that can be used as memory units; these memory blocks are also called L2K memory blocks. When defining these blocks, the size and the width of the data stored in them can also be set, which makes this a flexible memory. The SRAM and SDRAM are fixed in total size, but the width of the data we store in them can still vary. We use the IP library files to instantiate the memory.
The next component to be added is the system timer, the clock that synchronizes the movement of data. We use the component called "interval timer" for this purpose. Each component used in the system can be renamed according to the specific function it performs. There is another clock used in the system: the clock from an external source. The board we use has 50 MHz and 27 MHz oscillators which can be used to generate the clocks. The other possibility is […] generated with this.
The final component, which connects the host PC to the target board, also has to be activated. It is a serial communication port using JTAG; this component can be found under the Interfaces menu. The other components, the peripherals, can be called either by using PIO components or, if the University Program DE2 board support is installed, through the built-in library for all the peripherals on the board. This includes the drivers for the VGA output, LEDs, seven-segment displays and Ethernet port.
Direct Memory Access (DMA) is used to give complete control of the data bus to the DMA unit for bulk data transfers from or to the memory. DMA can be activated by calling the required functions or the component from the library.

D. Co-processor in the system
The Prewitt edge-detection design shown in the section above is converted into HDL code using the signal compiler, which is a component of DSP Builder. This tool converts the Simulink model into HDL for the specified target device. The generated HDL code is then brought into the system as a component using SOPC builder. In general no parameters are available for this component, though that depends on the designer.

E. Nios 2 embedded development suite
The embedded development suite, also called Nios 2 Studio, is used to write the software for the hardware that has been generated. The development suite is built on Eclipse, and hence a couple of parameters have to be set before we start working with this tool. Some of the features of Nios 2 EDS are:
 2x faster compile/build times as compared to the Nios II IDE
 Simple file-system-based project management
o Supports relative paths for easy copying and moving of sources
o Version-control compatible
o Easy-to-read/manage makefiles
 GUI-based generation of Board Support Packages (BSP), featuring
o A menu of HAL-based operating systems
o Software packages (network stack, file systems, graphics libraries, etc.)
o Support for custom device drivers
o Support for multiple versions of a device driver
o Full control over build options and OS settings
o Linker memory map management
 Full compatibility with command line tools for scripting and automated regression testing

The .sopc or .ptf file generated by SOPC builder is used here as the CPU. An alternative is to call Nios 2 EDS from within SOPC builder; this uses the current .sopc file as the CPU, and the program that has to run on this CPU can then be executed. As mentioned above, scripts can be run according to our requirements. It is also possible to test the code by simulating it on the CPU on the host PC, which enables the designer to debug the code even without porting it onto the FPGA.
Programmers who prefer the command prompt can use the Nios command shell to build and execute code from the command line. For beginners it is advisable to use the GUI, as it is quite user friendly; and since the tool is open source, the chances of finding help and solutions for problems regarding its configuration are better.

V. RESULTS
The simulation of the edge detector in Simulink gave good results before the code was used in the system; the results show that the edges are detected with the code that was created. For the edge detection, any real image is converted into grayscale and brought to a resolution of 600*480 before being fed to the co-processor for further processing. This system was meant to be a prototype, and hence not much importance was given to accuracy or to optimization methods. With real images we do have to care about noise, and hence the results were not as good as the software simulation results.
Due to a number of undetermined reasons, the hardware test failed. Some of the problems that we faced during the course of our work were solved, but a few did not get apt solutions in time for a final result. Among the problems we faced:
 The HDL code that was converted from Simulink with the help of DSP Builder could not be analyzed, and hence it had to be used in the system just as it was.
 Intellectual Property (IP) library functions were not available for all the components, and hence some of the components could not be used to build the system.
 One of the major reasons was the lack of subscription licenses for the software tools that we were working with, which hindered the work to a large extent.
 The Nios 2 embedded development suite, running on Eclipse, had configuration issues which seemed very odd: every CPU that was generated with the custom co-processor could not be processed, and the resulting errors made the system fail at implementation.
 The lack of updated documentation for the workflow of the software tools was a drawback of its own kind.

ACKNOWLEDGMENT
I would like to thank Univ.-Prof. Dr.-Ing. Kyandoghere Kyamakya for furnishing us with all the resources that we needed for the research task. I was very happy to be part of a team working with a big goal in mind. I would like to thank Mr. Koteswara Rao Anne for constant technical and moral help throughout the various phases of my work; a great amount of stress passes away when we have support of this sort. I would like to thank Mr. Singanamallu Gupta and Mrs. Deepthi Rao for their support and help during the course of my work. I am extremely happy to be part of the Intelligent Transportation Systems research group. My colleagues Mr. Sreeram Kumar Bhagavatula, Mr. Vamsi Prakash Makkapati and Mr. Venkata Sai Dattu Potapragada were a great support, and it was great fun to work with them; their innovative talks and working style, especially during late nights, gave me the energy to work a lot better. Last but not least, I thank all my family, whose constant prayers and wishes helped me throughout my work.

REFERENCES
[1] M. Shah, Fundamentals of Computer Vision, University of Central Florida, 1997.
[2] A. Mahmoodi, J. Kraut, M. Jahns and S. Mahmood, "Hardware edge detection using an Altera Stratix Nios 2 development kit", Digital ASIC design, December 2005.
[3] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Prentice Hall, 2002.
[4] L. Shapiro and G. Stockman, Computer Vision. Prentice Hall, 2001.
[5] Altera Corporation, SOPC Builder datasheet, September 2002.
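As a sanity check for the co-processor, the Prewitt stage can be expressed as a short software reference model. The sketch below is our own Python rendering, not the generated HDL: the two 3x3 kernels are the standard Prewitt pair, while the |gx|+|gy| magnitude approximation and the threshold of 128 are illustrative choices.

```python
# Software reference model of the Prewitt edge-detection stage
# (a sketch; the real design is the DSP Builder-generated HDL).
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]   # horizontal gradient
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]   # vertical gradient

def prewitt_edges(img, threshold=128):
    """Return a binary edge map for a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += PREWITT_X[j][i] * p
                    gy += PREWITT_Y[j][i] * p
            # |gx| + |gy| approximates the gradient magnitude, a common
            # simplification in fixed-point hardware pipelines
            out[y][x] = 1 if abs(gx) + abs(gy) >= threshold else 0
    return out
```

A model like this is useful for comparing the hardware output pixel-for-pixel against a known-good result during simulation.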
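A quick back-of-the-envelope check (our own arithmetic, using the 600*480 frame size mentioned in the Results section) shows why the external memories matter for frame buffering:

```python
# Sizing sketch: can one 600x480 8-bit grayscale frame be buffered in
# each of the available memories? (Illustrative arithmetic only; the
# memory sizes are the ones quoted above.)
FRAME_BYTES = 600 * 480            # 1 byte/pixel -> 288,000 bytes (~281 KB)
SRAM_BYTES = 512 * 1024            # 512 KB SRAM
SDRAM_BYTES = 8 * 1024 * 1024      # 8 MB SDRAM

fits_in_sram = FRAME_BYTES <= SRAM_BYTES       # one frame fits in SRAM
frames_in_sdram = SDRAM_BYTES // FRAME_BYTES   # whole frames in SDRAM
```

The on-chip block RAM of a Cyclone II, by contrast, amounts to tens of kilobytes at most, so it is better suited to line buffers for the 3x3 window than to whole frames.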
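The preprocessing described above (grayscale conversion, then scaling to 600*480) can be sketched as follows. This is a minimal Python model of the idea; the integer luminance weights and nearest-neighbour scaling are our own assumptions, not necessarily what the original MATLAB flow used.

```python
# Preprocessing sketch: RGB frame -> grayscale -> 600x480, before the
# frame is fed to the edge-detection co-processor. Illustrative only.
def to_gray(rgb):
    """rgb: list of rows of (r, g, b) tuples -> list of rows of ints."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb]

def resize_nn(img, out_w=600, out_h=480):
    """Nearest-neighbour resize of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```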