
Real Time Distance Determination for an Automobile Environment using Inverse Perspective Mapping in Open CV
Shane Tuohy
B.E. in Electronic and Computer Engineering
Supervisor: Dr. Martin Glavin
Co-Supervisor: Dr. Fearghal Morgan

24 March 2010


Abstract
This project aims to develop a real time distance determination algorithm for use in an automobile environment. Modern cars are increasingly being fitted with image sensors, which can be used to obtain large amounts of information about the surrounding area. From a single front facing image, it is difficult to determine distances to objects in front of the vehicle with any degree of certainty, because there is a non-linear relationship between the height of an object in a front facing image and its distance from the camera. This project aims to use Inverse Perspective Mapping to overcome this problem. Using Inverse Perspective Mapping, we can transform the front facing image to a top down bird's eye view, in which there is a linear relationship between distances in the image and in the real world. The aim of the project is to implement the algorithm in the C language using the OpenCV libraries. Working in this way provides a high performance, low overhead system that will be possible to implement and run on a low power embedded device in an automobile environment.


Acknowledgements
I would like to acknowledge the help and support I received throughout the project from my project supervisor Dr. Martin Glavin and the postgraduate researchers in the CAR lab, NUIG. In particular I would like to thank Diarmaid O'Cualain for his constant support and patience. This project would not have been possible without the debugging help, discussion and encouragement received from my fellow 4th EE/ECE classmates. Finally, I'd like to thank my parents for their continued support over the last 4 years.


Declaration of Originality

I declare that this thesis is my original work except where stated.

Date: ___________________________________

Signature: ___________________________________

Contents

Abstract
Acknowledgements
Declaration of Originality
Table of Figures
1 Glossary
2 System Overview
3 Background Technologies
   3.1 Computer Vision
   3.2 OpenCV
      3.2.1 Useful OpenCV Functions
      3.2.2 cvSetImageROI
      3.2.3 cvWarpPerspective
      3.2.4 Drawing Functions
   3.3 Inverse Perspective Mapping
4 Project Structure
   4.1 Overall System Flowchart
   4.2 First Frame Operations Flowchart
   4.3 Process Specifics
      4.3.1 Capture Video Frame
      4.3.2 First Frame Operations
      4.3.3 Threshold Image
      4.3.4 Warp Perspective
      4.3.5 Distance Determination
      4.3.6 Provide Graphical Overlay
      4.3.7 Processing Real Time Images (from camera)
5 Optimisation and Testing
   5.1 Generate lookup array
   5.2 Sampling Rate
   5.3 Finding Threshold Range
   5.4 Using Performance Primitives
   5.5 Level to Trigger Detection
   5.6 Memory Management
   5.7 Calibration
6 Results
   6.1 Selection of Sampling Rate
   6.2 Performance of Algorithms
7 Further Work
   7.1.1 Processing Time
   7.1.2 Environmental Conditions
   7.1.3 Tracking
   7.1.4 Embedded Implementation
8 Conclusion
9 References
10 Appendix A - On the CD

Table of Figures

Figure 1 - Overview of proposed system
Figure 2 - Images illustrating differences between vertical distance on camera image, and real world distance
Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance now corresponds linearly to real distance
Figure 4 - Illustration of camera position and coordinate systems in use
Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in red box at top of image
Figure 6 - Overall System Flowchart
Figure 7 - First Frame Operations Flowchart
Figure 8 - Original image before thresholding to remove road pixels
Figure 9 - Thresholded image of same scene as figure above with road pixels removed
Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted
Figure 11 - Sample road scene image before perspective is warped using transformation matrix
Figure 12 - Previous figure after perspective has been warped using transformation matrix
Figure 13 - Transformed image of sample road scene, ready for object detection
Figure 14 - Source image with overlay of rectangular box and distance value
Figure 15 - Illustrates mapping of points in source top down image to points in front facing image
Figure 16 - Original Image - No Thresholding Applied
Figure 17 - Small threshold value applied to scene; large threshold value applied to scene
Figure 18 - Thresholding with range of 35
Figure 19 - Example of distance detection performed with small trigger value
Figure 20 - Graph illustrating memory use for several video samples
Figure 21 - Example of known square method of calibration
Figure 22 - Samples of successful distance determination
Figure 23 - Plot of computation times for different sampling rates
Figure 24 - Measured computation time - comparison between sampling rate of 1 and 10
Figure 25 - Comparison of computation times in seconds
Figure 26 - Graph of processing time for each of 3 major constituents of algorithm

1 Glossary
IPM - Inverse Perspective Mapping
OpenCV - Open Computer Vision
Thresholding - Process by which pixels above or below a certain intensity are removed from an image
C - Low level, compiled programming language
gcc - Open source C compiler
ROI - Region of Interest

2 System Overview
In 2005, 396 people, more than one per day, were killed in road traffic accidents [1]. For this reason, collision avoidance or prevention systems are in the best interests of car safety. Safety is a primary concern for all car manufacturers. In recent years, ABS, stability controls, airbags, ESP etc. have become standard on many car models. Using computer vision techniques and optical cameras, safety systems can be vastly improved. Cars in the near future will be able to intelligently analyze their environment and react accordingly to improve driver safety.

Computer vision is fundamentally the process by which we allow machines to see the world and react to it. Its importance cannot be overstated in fields such as manufacturing, surveillance and environment detection. Using the techniques of computer vision, we can create powerful and helpful real world applications which incorporate real world conditions. An increasingly common application of computer vision systems is in the field of safety. Machines can be programmed to detect and respond to dangerous conditions automatically, based on their interpretation of the world around them. Computer vision can be used to provide accurate, useful information to machine operators or users.

One such machine, where computer vision can be leveraged to provide useful, potentially lifesaving information, is the car. Current systems on the market from manufacturers such as Mercedes [2] pre-charge brakes and tighten slack on seatbelts if an imminent collision is detected. It is becoming increasingly common for modern automobiles to be fitted with camera systems to aid in driver awareness and safety. Systems such as those found in the Opel Insignia are becoming more and more popular. The Opel Insignia uses a front mounted camera to detect road signs and monitor lane departures, providing increased levels of information to drivers.

Distance determination in an automobile environment is understandably a worthwhile undertaking. With an effective distance determination algorithm, steps can be taken to alert drivers to potential hazards and unsafe driving. Distance data from a system similar to the one proposed could be applied to an adaptive cruise control system, which senses upcoming obstacles and adjusts the speed of the vehicle accordingly. In fact, combined with lane detection algorithms, it is entirely possible to envision a car that could, in theory, drive itself.

Currently available systems on the market from manufacturers such as Audi, Mercedes-Benz and Nissan use RADAR or LIDAR sensors to implement collision detection. These can work well when the RADAR signals reflect from a metal object; they do not, however, detect pedestrians or animals on the road. These systems are also expensive to implement, and are therefore a sub-optimal solution. Current research in the area of collision detection is focused on forward facing cameras, which provide more information about a scene and are cheap and reliable.

The proposed system consists of a single front facing video camera mounted on a vehicle capturing video images at 30 frames per second. This setup distinguishes the system from similar systems which use either a multi camera setup or, alternatively, active devices such as RADAR or LIDAR. A single camera system is more reliable and simpler than any of these methods. A dual (or more) camera setup, as employed by Toyota in some Lexus models, provides more data to process and, therefore, more accurate results. However, it also carries severe processing and configuration overheads, which render it unsuitable for use in the low power, low resource, embedded devices typically found in automobiles. It is also a much more expensive system to implement, for obvious reasons, than a single camera system. Active systems such as RADAR or LIDAR require signals to be reflected from targets; this leaves them susceptible to interference, possibly from other identical systems approaching them. Mounting these active systems to cars can be difficult and, most importantly, carries a significant financial expense. Often, a front facing optical camera fitted to a car can have several uses: the same camera can be used for lane detection and road sign detection as well as distance determination, providing a comprehensive security package using a single camera.

Previously, computer vision algorithms were much too computationally heavy to implement in real time on low power devices. Devices such as the Intel Atom processor, which are very capable yet consume very little power, can make implementation of these types of algorithm in real time a reality. In conclusion, the passive nature, flexibility and simplicity of a single camera setup makes it well suited to implementation in an automobile environment. The proposed system is capable of providing life saving information to drivers.

Figure 1 - Overview of proposed system

The figures below illustrate the problem explicitly. As the vehicle in front approaches the camera, vertical distance in the image, as indicated by the white lines, does not vary in a linear fashion. The third image (Figure 3) has been transformed to a top down view; distance in this image is now linearly related to distance in the real world.

Figure 2 - Images illustrating differences between vertical distance on camera image, and real world distance

Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance now corresponds linearly to real distance

3 Background Technologies
This chapter illustrates the technology background of the project. It provides a brief overview of the field of computer vision and the technologies being used to implement the system.

3.1 Computer Vision


Computer vision is a rapidly growing field. Modern, more powerful microprocessors are increasingly able to handle the computationally expensive overhead of working with images. This opens up exciting possibilities for innovative computer vision implementations in everyday life. A recently popular example is the field of augmented reality, which uses computer vision techniques on a portable device to understand a scene and overlay useful information. It is often found in mobile phones, which prior to now would not have been capable of handling the processing overhead of such a system. One area in which computer vision is an integral part is the DARPA Urban Challenge¹, in which students attempt to create a vehicle that will drive itself. Vehicles are required to merge into two way traffic and carry out other complex driving manoeuvres autonomously. Without advanced computer vision algorithms, these challenges would be impossible.

3.2 OpenCV²
OpenCV stands for Open Computer Vision. OpenCV is a library of functions for image processing. Originally developed by Intel in 1999, it is now an Open Source project released under the BSD license. The purpose of OpenCV is to provide open, optimized code to support the advancement of computer vision applications. The functions themselves are mostly written in C, and provide robust, quick routines for standard image processing techniques. There are many commonly used techniques in image processing for computer vision applications, and OpenCV provides implementations of many of these, allowing for rapid algorithm implementation. While implementing algorithms, the programmer doesn't have to continually reinvent the wheel. Although OpenCV itself is an open source project, Intel provides a product named Integrated Performance Primitives, a commercial package of highly optimized routines which OpenCV can use in place of its own routines to speed up computation.

¹ http://www.darpa.mil/grandchallenge/index.asp
² http://opencv.willowgarage.com/wiki/

For this particular project, OpenCV greatly accelerated development by providing routines to threshold images, generate homography matrices and sample regions of images. Although wrappers for the OpenCV libraries have been developed for high level languages such as C# and Python, code for this system was written solely in C and compiled using the standard gcc compiler. Using a lower level language like C increases performance, leading to a real time, or close to real time, implementation.

3.2.1 Useful OpenCV Functions

3.2.1.1 cvThreshold

Performing a threshold of an image is a fundamental image processing technique. It involves examining the intensity value of each pixel in an image and performing a particular operation depending on this value. OpenCV provides extensive thresholding options via the cvThreshold function. Several different types of thresholding are available:

- CV_THRESH_BINARY
- CV_THRESH_BINARY_INV
- CV_THRESH_TRUNC
- CV_THRESH_TOZERO_INV
- CV_THRESH_TOZERO

Most pertinent to this project are CV_THRESH_TOZERO, CV_THRESH_TOZERO_INV and CV_THRESH_BINARY.

CV_THRESH_TOZERO - If a pixel is below the threshold value, it is given an intensity value of 0. Otherwise it is not affected.
CV_THRESH_TOZERO_INV - The opposite of the above: if a pixel is above the threshold value, it is given an intensity of 0. Otherwise it is not affected.
CV_THRESH_BINARY - As the name suggests, depending on which side of the threshold value the pixel intensity lies, it is assigned a value of 0 or 255.
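As a brief illustrative sketch of how the function is called (the file name and the threshold value of 90 are assumptions, not values used in the project):

    #include <cv.h>
    #include <highgui.h>

    int main(void)
    {
        /* Load a frame as a single channel (greyscale) image */
        IplImage *src = cvLoadImage("frame.png", CV_LOAD_IMAGE_GRAYSCALE);
        IplImage *dst = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 1);

        /* CV_THRESH_TOZERO: pixels at or below 90 become 0, the rest are untouched */
        cvThreshold(src, dst, 90, 255, CV_THRESH_TOZERO);

        /* CV_THRESH_BINARY: pixels above 90 become 255, all others become 0 */
        cvThreshold(src, dst, 90, 255, CV_THRESH_BINARY);

        cvReleaseImage(&src);
        cvReleaseImage(&dst);
        return 0;
    }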

3.2.2 cvSetImageROI

cvSetImageROI allows the programmer to perform operations on specified areas of an image. This allows functions such as thresholding, averaging and smoothing to be applied to one part of an image while leaving the rest of the image unchanged. This function has proven very useful throughout the project.
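For illustration, a short sketch of how a region of interest might be used to sample a patch of road in front of the vehicle (the rectangle coordinates are assumptions, and frame is assumed to be the current video frame):

    /* Restrict processing to a small patch of road directly in front of the car */
    cvSetImageROI(frame, cvRect(280, 400, 80, 40));

    /* cvAvg now only sees the ROI, giving the average road colour per channel */
    CvScalar roadColour = cvAvg(frame, NULL);

    /* Subsequent operations see the full image again */
    cvResetImageROI(frame);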

3.2.3 cvWarpPerspective

cvWarpPerspective, as the name suggests, allows the perspective of an image to be warped based on a transformation matrix. This has been the most important function provided by OpenCV for this project. It allows points to be mapped from one perspective to another through the use of a single function, greatly reducing programmer overhead.
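A minimal sketch of generating a homography from four point correspondences and warping a frame with it (all point values are illustrative assumptions; the project's own matrix generation is described in Section 4.3.2):

    /* Four user-selected points in the source frame, and the square they map to */
    CvPoint2D32f srcPts[4] = { {260, 300}, {380, 300}, {420, 420}, {220, 420} };
    CvPoint2D32f dstPts[4] = { {200, 100}, {440, 100}, {440, 340}, {200, 340} };

    CvMat *homography = cvCreateMat(3, 3, CV_32FC1);
    cvGetPerspectiveTransform(srcPts, dstPts, homography);

    /* Warp the whole frame into the top down view */
    IplImage *topDown = cvCreateImage(cvGetSize(frame), frame->depth, frame->nChannels);
    cvWarpPerspective(frame, topDown, homography,
                      CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS, cvScalarAll(0));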

3.2.4 Drawing Functions

OpenCV provides numerous drawing functions which allow feedback to be given to the user easily by overlaying shapes and text onto images.

3.3 Inverse Perspective Mapping


A front facing image from a car is useful for many applications; it is, however, useless for distance determination. Vehicles that are far away from the camera appear high in the image, and they appear progressively lower as they approach the camera. The problem is that this does not occur in a linear fashion, so there is no simple way to discern distance information based on the position of a vehicle in a front facing image. To overcome this issue, we use Inverse Perspective Mapping [3][4]. Inverse Perspective Mapping (IPM) uses a 3 by 3 homography matrix to translate points from the image plane to those found in the real world. Using this homography matrix, we can transform our image so that we are looking directly at the true road plane. We can use this to measure the distance between objects in the image with a degree of certainty, since the relative position of the objects will change in a linear fashion. The figure below illustrates the different coordinate systems that are employed, and their relation to one another.

Figure 4 - Illustration of camera position and coordinate systems in use

As can be seen from the figure above, mapping a point from I(u, v) involves a rotation about the camera tilt angle and a translation along the line of sight of the camera. The matrix mathematics are shown below in Equation 1 [5].
Equation 1 - Matrix mathematics for transforming a point from image coordinates to world coordinates

Simplifying results in Equation 2:


Equation 2 - Illustrating the transformation matrix
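In general form (the exact entries depend on the camera height, tilt angle and focal length, so this is a generic statement rather than the specific matrix used in the project), a planar homography of this kind maps an image point (u, v) to a point on the road plane as:

    \begin{equation}
    \begin{bmatrix} x' \\ y' \\ w \end{bmatrix}
    =
    \begin{bmatrix}
    h_{11} & h_{12} & h_{13} \\
    h_{21} & h_{22} & h_{23} \\
    h_{31} & h_{32} & h_{33}
    \end{bmatrix}
    \begin{bmatrix} u \\ v \\ 1 \end{bmatrix},
    \qquad
    X = \frac{x'}{w}, \quad Y = \frac{y'}{w}
    \end{equation}

The division by the scale factor w is what makes homogeneous coordinates necessary, as discussed next.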

We use homogeneous coordinate systems as they allow us to represent points at infinity, e.g. vanishing points. They also allow us to include division, as well as constants, in linear equations. [6]
Equation 3 - Demonstration of benefits of homogeneous coordinates

Using the Transformation Matrix above, we can map any point in the plane I(u,v) to its corresponding point in the real world plane W(X, Y, Z) with Z = 0. The transformation results in images in which any point with a Z coordinate greater than zero is incorrectly mapped, resulting in distortion of the image, as can be seen in the image below. For our purposes, an image similar to the one in the figure below is perfectly sufficient to determine distances.


Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in red box at top of image.


4 Project Structure
This chapter describes the structure of work carried out during the course of the project and the development of the system. It describes an overall flow for the system before explaining in detail the logic and implementation behind each step. This project was split into 5 distinct goals to measure progress. Those goals were:

Goal 1
Commission the OpenCV system to load frames of video into memory. Sample the image pixels of the road directly in front of the vehicle. Threshold based on these sample pixels to remove the road surface from the image. The only remaining pixels are those of the sky, the edges of the road (such as trees and buildings), road markings and other vehicles. Generate the IPM characteristic matrix, which is developed from the height of the camera from the ground and the angle at which the camera is mounted (with respect to the ground plane).

Goal 2
Transform the image from the original input view to the IPM transformed view. This is done by applying the IPM matrix developed in the previous milestone.

Goal 3
Determine the distance to the vehicle in front by looking for the first pixels of non-road directly in front of the vehicle. Once the position is determined, it is then possible to calculate the distance.

Goal 4
Display this information to the driver by overlaying graphics on the original image to clearly indicate the distance to the vehicle in front.

Goal 5
Modify the system to run in real time with a video stream. If possible, this could be achieved in real time using a video camera and a laptop (or other available embedded signal processing hardware).

4.1 Overall System Flowchart


The following flowchart illustrates general overall system operation.

Figure 6 - Overall System Flowchart


4.2 First Frame Operations Flowchart


The following flowchart illustrates operations carried out on receipt of the first frame of video.

Figure 7 - First Frame Operations Flowchart


4.3 Process Specifics


4.3.1 Capture Video Frame

Capturing of video frames is done using OpenCV's file capture or camera capture functions. Using these functions we create a CvCapture object, which we can query for frames from either the video file or the camera connected to the laptop.
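A minimal sketch of this behaviour, mirroring the command line handling described in Section 4.3.7 (variable names are assumptions):

    /* Open a video file if a path was supplied, otherwise use the first camera */
    CvCapture *capture = (argc > 1) ? cvCreateFileCapture(argv[1])
                                    : cvCreateCameraCapture(0);

    /* The returned frame is owned by the capture structure - do not release it */
    IplImage *frame = cvQueryFrame(capture);

    /* When finished with all frames */
    cvReleaseCapture(&capture);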

4.3.2 First Frame Operations

This section of the system consists of several operations which need only be carried out once. The data generated through these operations can then be used over and over for each frame of video that is sampled. Many of the methods used in this project are computationally heavy to carry out on the fly, so as much work as possible is done at the beginning of the program. This way the stored values can be used repeatedly for each frame, saving on computation time.

4.3.2.1 Capture source points

Firstly, 4 source points are captured from the user. These points are stored in an array and used for the next step, generation of the transformation matrix. This is done using a mouse handler to return the position of user-selected points in the image.

4.3.2.2 Generate transformation matrix

The transformation matrix is the key to the Inverse Perspective Mapping algorithm. Helpfully, OpenCV provides a function to generate this matrix without needing to manually carry out the mathematical operations listed in the Inverse Perspective Mapping section above. The function cvGetPerspectiveTransform takes an array of points from the source image and generates the transformation matrix that maps them to an array of points in the destination (top down) image. The simplified matrix mathematics is illustrated by Equation 2 in Section 3.3.

The source points, which are chosen by the user by clicking points in the image, map to a square in the destination image. If we lay a square shaped object on the road in front of the camera, we can use the corners of the square to generate the appropriate transformation matrix for the current environmental conditions.

Importantly, applying this operation in the other direction gives us the inverse transformation matrix. This allows us to map points from the destination image back to the source image.
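As a sketch, reusing the homography matrix and point arrays from the example in Section 3.2.3 (which are themselves assumptions), the inverse mapping can be obtained in either of two ways:

    /* Either invert the forward homography ... */
    CvMat *inverseHomography = cvCreateMat(3, 3, CV_32FC1);
    cvInvert(homography, inverseHomography, CV_LU);

    /* ... or generate it directly by swapping source and destination points */
    cvGetPerspectiveTransform(dstPts, srcPts, inverseHomography);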

4.3.3 Threshold Image

In order to detect a vehicle on the road in front of us, we need to be able to discern what part of an image is the road, and what part is a vehicle. A solution to this problem is to threshold the image to remove road pixels. That way, anything in front of the car with an RGB intensity value greater than zero is an object in front of the car.

The major difficulty that this presents is: given a certain image, how can one detect what is a road pixel and what is not? Roads vary in shade depending on the time of day, weather etc. To obtain a value for the particular scene we are working on, we can sample the value of pixels directly in front of the vehicle. We take a small patch of road slightly in front of the car and obtain an average value for the pixels across that patch. This gives a good value for the RGB characteristics of the road surface. This process is somewhat rough, and could be improved by instead taking the median value of all pixels, thus eliminating noise values generated by the system. Adaptive thresholding, that is, using different thresholding values for different areas of the image, could be implemented to improve the process further.

We threshold based on these values. Thresholding works as follows (a short code sketch of these steps is given at the end of this subsection):

- Split the image into its constituent channels (R, G, B)
- Use the built in OpenCV threshold function (CV_THRESH_TOZERO) to remove pixels above and below the threshold value
- This leaves us with images containing just the road surface in the R, G and B planes
- Subtract the original R, G and B images from the images containing just the road surface and merge the result
- Finally, perform a binary threshold to leave any non road values with a high value to aid in distance determination

An example of the thresholding algorithm at work can be seen in the following three figures. Fig. 8 shows a sample source image prior to application of the algorithm. Fig. 9 shows the result of

removal of road pixels from the image. Finally, Fig. 10 shows how non road objects are highlighted to allow for easier detection.

Figure 8 - Original image before thresholding to remove road pixels

Figure 9 - Thresholded image of same scene as figure above with road pixels removed


Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted

As can be seen from the example above, the thresholding algorithm is very effective. Most of the road surface has been removed, leaving the object in front clearly detectable due to its bright colouring.
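A compact sketch of the steps listed above (frame is the current BGR video frame; the road bounds low and high are illustrative stand-ins for the sampled average plus or minus the range discussed in Section 5.3):

    CvSize sz = cvGetSize(frame);
    IplImage *chan[3], *road[3];
    double low[3]  = {60, 60, 60};     /* illustrative road bounds only */
    double high[3] = {130, 130, 130};

    for (int i = 0; i < 3; i++) {
        chan[i] = cvCreateImage(sz, IPL_DEPTH_8U, 1);
        road[i] = cvCreateImage(sz, IPL_DEPTH_8U, 1);
    }
    cvSplit(frame, chan[0], chan[1], chan[2], NULL);   /* OpenCV stores channels as B, G, R */

    for (int i = 0; i < 3; i++) {
        /* Keep only pixels inside the road range (low..high) in road[i] */
        cvThreshold(chan[i], road[i], high[i], 255, CV_THRESH_TOZERO_INV);
        cvThreshold(road[i], road[i], low[i],  255, CV_THRESH_TOZERO);

        /* Original minus "road only" leaves just the non-road objects */
        cvSub(chan[i], road[i], chan[i], NULL);
    }

    IplImage *merged = cvCreateImage(sz, IPL_DEPTH_8U, 3);
    IplImage *result = cvCreateImage(sz, IPL_DEPTH_8U, 1);
    cvMerge(chan[0], chan[1], chan[2], NULL, merged);

    /* Highlight anything that survived, to make detection easier */
    cvCvtColor(merged, result, CV_BGR2GRAY);
    cvThreshold(result, result, 1, 255, CV_THRESH_BINARY);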

4.3.4 Warp Perspective

As part of the first frame operations, we generated the transformation matrix based on 4 source points in the image mapped to 4 destination points in a transformed image. Now we need to transform the video frame based on this transformation matrix. OpenCV provides a function to transform an image based on a 3x3 homography matrix. This is the matrix we generated earlier. Application of this function to an image results in a transformation similar to the one shown in the figures below. Warping the perspective of an image involves considerable computational overhead. It is therefore pertinent to use the operation as sparingly as possible.


Figure 11 - Sample road scene image before perspective is warped using transformation matrix

Figure 12 - Previous figure after perspective has been warped using transformation matrix

Note: the image samples above are not thresholded as they would be during real operation of the system. They are given to illustrate clearly the effects of the perspective transform.

4.3.5 Distance Determination

Now that we have warped the image to a top down view, we can measure the distance to the vehicle in front linearly. This consists of several steps.

Figure 13 - Transformed image of sample road scene, ready for object detection.

We know the coordinates of the front of our car, and so we loop vertically upwards through the image, working on a small rectangle in the area in front of the car. For each small rectangle, we average the pixel values across the rectangle. We know that all road pixels are zero, thanks to the threshold applied earlier, so we increment the position of the rectangle until the average function returns a non-zero

value. This value is stored in a global variable that is accessible from the main body of code and corresponds to the distance between our vehicle and the object directly in front. Now we have the position of the object directly in front of the car in its transformed image coordinates. We transform this point back to the original coordinate system in order to overlay feedback to the user.
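A minimal sketch of this upward scan (the rectangle geometry and the trigger level are assumptions; topDown is the warped, thresholded, single channel frame):

    const double TRIGGER_LEVEL = 40;          /* assumed value, see Section 5.5 */
    int rectW = 80, rectH = 8;                /* small rectangle in front of us */
    int carX = topDown->width / 2 - rectW / 2;
    int distancePixels = 0;

    /* Scan upwards from the bottom of the image (the front of our vehicle) */
    for (int y = topDown->height - rectH; y >= 0; y -= rectH) {
        cvSetImageROI(topDown, cvRect(carX, y, rectW, rectH));
        CvScalar mean = cvAvg(topDown, NULL);
        cvResetImageROI(topDown);

        if (mean.val[0] > TRIGGER_LEVEL) {    /* first evidence of a non-road object */
            distancePixels = (topDown->height - 1) - y;
            break;
        }
    }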

4.3.6 Provide Graphical Overlay

Now that we have a value for the position of the object in coordinates relevant to the original image, we can overlay graphics and text for display to the user. The value that we have calculated for distance is in pixels. It is the distance in pixels from the base of the image (the front of our vehicle) to the next object in front of our vehicle. This value will change in a linear fashion as the real world distance changes, e.g. a value of 700 pixels will equate to twice the distance that 350 pixels equates to. It is this linearity that is the strength of the Inverse Perspective Mapping algorithm.

In order to display an accurate value for distance, we need to know the scaling factor. This is the value in pixels that corresponds to one meter. This value can be calibrated for a particular camera configuration by placing a meter stick in front of the car, flat on the road surface, and measuring the number of pixels that this corresponds to in the top down view. This value will stay the same as long as the characteristics of the camera (its height from the ground, focal length etc.) remain the same. For the purposes of testing using videos from several different sources, an approximate scaling factor was chosen based on analysis of several sample top down images. The value was calibrated based on the fact that the standard for Irish lane markings is approximately 1.5 m.

Graphical information provided to the user is in the form of the distance figure being displayed in the upper left quadrant of the image, typically where very little activity takes place. A rectangular box is also drawn around the detected area to verify that the correct object has been detected. The figure below shows a sample overlay of information to the user.


Figure 14 - Source image with overlay of rectangular box and distance value
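A sketch of how such an overlay might be drawn with OpenCV's drawing functions (display, detectedPoint and PIXELS_PER_METRE are assumed names, not the project's own identifiers):

    char label[32];
    sprintf(label, "%.1f m", distancePixels / PIXELS_PER_METRE);

    CvFont font;
    cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, 1.0, 1.0, 0, 2, CV_AA);

    /* Distance readout in the upper left quadrant of the image */
    cvPutText(display, label, cvPoint(20, 40), &font, CV_RGB(255, 255, 0));

    /* Rectangle around the detected object to confirm the detection */
    cvRectangle(display,
                cvPoint(detectedPoint.x - 60, detectedPoint.y - 30),
                cvPoint(detectedPoint.x + 60, detectedPoint.y + 10),
                CV_RGB(255, 0, 0), 2, 8, 0);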

4.3.7 Processing Real Time Images (from camera)

While most design, testing and processing was carried out on pre recorded video, a very important goal also had to be realised: for the algorithm to function in a real time system, it had to be able to work not only on pre recorded video, but also on live real time data from a camera. This involved implementing a mechanism by which the program could read frames from a camera rather than a file. If the system is invoked with one command line argument, it attempts to load the video at the path specified by the argument. If the system is started without a command line argument, the program will attempt to query frames from a camera attached to the system.


5 Optimisation and Testing


Computer vision is inherently computationally heavy. Algorithms which employ many computer vision techniques can take prohibitively long to compute for real time video. This section explores some ways in which the system was optimised for maximum accuracy with minimum computation overhead.

5.1 Generate lookup array


One way of optimising an algorithm is to generate a lookup table. In order to translate a single point from the top down IPM view back to the source image, we must perform a perspective transform on this point. If we are running the algorithm on a real time system, this overhead may be unacceptable for real time operation. As was explored in the section on Inverse Perspective Mapping, in order to transform an image from the IPM coordinate system to the image coordinate system, we must perform non trivial matrix multiplication. The solution is to use the inverse transformation matrix to map all vertical points back to their equivalent points in the source image in one single operation, and store the values in an array. This way we can map a point to its equivalent vertical coordinate by simply referencing the lookup array at that point. So, when we need to map a point to the original image from the transformed view, we simply reference a value in a pre computed array, instead of performing the actual transformation. The figure below illustrates how points are not linearly mapped back to the original image. Using the lookup table, we can map any point's vertical coordinate in the top down image to its equivalent in the front facing image. This allows us to map the point where the object is detected back to the original image with ease.
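A sketch of how such a lookup array might be pre-computed, by pushing one point per row through the inverse homography (inverseHomography is carried over from the earlier sketches; all names, and the use of the image centre column, are assumptions):

    /* height and width are the top down image dimensions (assumed) */
    int *rowLookup = malloc(sizeof(int) * height);
    float centreX = width / 2.0f;

    for (int y = 0; y < height; y++) {
        /* Homogeneous point on the centre line of the top down image */
        float in[3]  = { centreX, (float)y, 1.0f };
        float out[3];
        CvMat src = cvMat(3, 1, CV_32FC1, in);
        CvMat dst = cvMat(3, 1, CV_32FC1, out);

        cvMatMul(inverseHomography, &src, &dst);     /* p' = H^-1 * p        */
        rowLookup[y] = (int)(out[1] / out[2]);       /* divide out the scale */
    }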


Figure 15 - Illustrates mapping of points in source top down image to points in front facing image

5.2 Sampling Rate


Image processing operations are computationally heavy operations. A video file contains, in general, 30 frames per second. Performing all operations on each frame equates to 30 sets of operations per second. This is much too frequent for our needs. So, instead we perform the operations once every x frames. This project is targeted to function in an automobile environment using a front mounted camera. Most of the time, the rate of change of distance between our object and the object in front of us is relatively small, usually cars on the same road are travelling at roughly similar speeds. Sampling and computing distance information every 33 milliseconds provides very little extra information. Over 33ms the difference in distance between our vehicle and the vehicle in front of us will be negligible. For a car travelling 5km/hr faster than another car, the rate of change of distance between the two cars is 1.38m/s. In 33ms, the difference in distance is 0.046m. This value is small enough as to

be imperceptible to a user. Therefore we can reduce the sampling rate to calculate distance less frequently, saving on resources with little to no visible difference during operation.

5.3 Finding Threshold Range


In order to remove the road surface from the input image, we must know something about the characteristics of the road we are travelling on. This involves sampling an area of the road to determine appropriate RGB values for the colour of the road surface. This is done by sampling a small box in front of the car and extracting average colour data from this area. A larger sample area gives us a more reliable and accurate reading of the road surface, but there is a trade off in computation time and levels of noise in the image. The larger the sampling box used, the more likely that road markings or other objects will be included in the averaging process. This results in a less accurate thresholding value, which when used in thresholding, decreases reliability and performance. Choosing an appropriate level above and below the threshold value is very important. Enough allowance must be given to allow all road pixels to be removed, while not removing much of other objects. If the value is too small, not all of the road surface will be removed, leading to detection of the road as an object. If the value is too large, too much of the destination object will be removed, leading to difficulty in distance determination. An example of incorrect values in use can be seen below.


Figure 16 - Original Image - No Thresholding Applied

Figure 17 - Small threshold value applied to scene (left); large threshold value applied to scene (right)

As can be seen from the image above on the left, in which a range of only 1 around the threshold value was applied, we need a relatively large range to accurately remove the entire road surface. The image on the right applied a threshold range of 100, which is clearly too large, as we have removed much of the vehicle in front along with the road.

A range of 35 gives satisfactory results, which can be seen below.

Figure 18 - Thresholding with range of 35

In the figure above some of the detail in the bottom part of the vehicle in front of ours has been removed along with the road. This is not a problem as we will now apply a binary threshold to highlight any values that are above zero.

5.4 Using Performance Primitives


Performance is a major concern in this project. OpenCV provides somewhat optimized routines for image processing, but more highly optimized implementations of some core functions are available. These are found in a commercial package called Intel Performance Primitives [7]. When deployed on an Intel processor, OpenCV is able to take advantage of these performance routines to greatly accelerate the execution of code. Since an intended target platform for this system is the Intel Atom [8] processor, use of these performance primitives could greatly accelerate execution of the algorithm. This package is a commercial product, and for cost reasons its use was not explored during the course of the project.


5.5 Level to Trigger Detection


While carrying out distance detection, we loop vertically through the frame, averaging a small rectangle directly in front of the vehicle. The frame that we are scanning has been thresholded, but that does not guarantee that all road pixels have successfully been removed; there may be noise or artifacts left in the image, so we cannot simply check for the first non zero pixel value. Testing needed to be done to select a value to trigger object detection which would only be reached by an actual object, filtering out any noise. The figure below shows the effect of noise on distance detection.

Figure 19 - Example of distance detection performed with small trigger value

Using a very low value, as shown above, results in very small levels of noise triggering object detection. Conversely, using a very high value leads to no object detection at all. An appropriate value was found through measuring average values across the detection rectangle and discerning a threshold value from these measurements.


5.6 Memory Management


Given that the majority of the project was completed using the C programming language, which does not include automatic memory management, it was of paramount importance to de-allocate any memory that was allocated. Memory is not in ample supply in embedded systems and must therefore be strictly monitored. During early testing of the finished algorithm, memory use became a big concern: while processing a 30 second video, system memory use peaked at over 1 GB. Clearly this was unacceptable. There are several ways to ensure that memory used by an OpenCV program is kept in check. Primary among these is that, when allocating memory structures, one must be vigilant in de-allocating them when finished. The following table illustrates common memory allocation methods and their equivalent de-allocation methods.

cvCreateImage          cvReleaseImage
cvCreateImageHeader    cvReleaseImageHeader
cvCreateMat            cvReleaseMat
cvCreateMatND          cvReleaseMatND
cvCreateData           cvReleaseData
cvCreateSparseMat      cvReleaseSparseMat
cvCreateMemStorage     cvReleaseMemStorage
cvCreateGraphScanner   cvReleaseGraphScanner
cvOpenFileStorage      cvReleaseFileStorage
cvAlloc                cvFree
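As a small sketch of the pattern followed (names and sizes are illustrative): allocate once before the main loop, reuse per frame, and release exactly once with the matching call.

    /* Allocate working structures once, before the main loop */
    IplImage *topDown    = cvCreateImage(cvSize(640, 480), IPL_DEPTH_8U, 3);
    CvMat    *homography = cvCreateMat(3, 3, CV_32FC1);

    /* ... main processing loop reuses topDown and homography for every frame ... */

    /* Release each structure exactly once, with its matching release call */
    cvReleaseImage(&topDown);
    cvReleaseMat(&homography);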

Once these rules were observed strictly, memory usage declined drastically. Below is a chart illustrating measurements of memory use for several sample videos after memory use was cut.


[Bar chart: memory usage, in MB, for five sample videos]

Figure 20 - Graph illustrating memory use for several video samples

5.7 Calibration
In order to obtain accurate distances from the system, calibration for a particular environment needs to be carried out. There are several ways in which this can be done. When the system is installed in a vehicle, the position and angle of the camera will not change, which means that instead of the approximate method employed for testing in the current implementation, we can employ a more accurate method of calibration, which will give more accurate distance values.

One such method is to calibrate the camera by placing a square object of known size in front of the camera on the road plane. Using simple mouse clicks, the transformation matrix for that environment can be obtained. Due to the wide variety of samples from different environments being tested as part of this project, this is the method that has been implemented. This method provides a somewhat rough value, but is satisfactory for testing purposes. The figure below shows a scene in which a rectangular shape has been overlain; by clicking the corners of this rectangle, the transformation matrix for this environment can be found.


Figure 21 - Example of known square method of calibration

A second method that can be employed to calibrate the camera and generate the transformation matrix is through the use of the camera's intrinsic and extrinsic characteristics. A camera's focal length, height above the ground, and viewing angle can be used. Exploration of this method of calibration is outside the scope of this project.

Finally, it is possible to perform automatic calibration of the camera using a checkerboard patterned square placed in front of the car. This is the preferred option, providing simple and reliable calibration. Firstly, the system detects the checkerboard pattern of known size [11]. From this information the transformation matrix can be generated. This technique has not been implemented as part of this system.
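A sketch of what such automatic detection could look like using OpenCV's built-in checkerboard detector (this is not part of the implemented system; the board size and the frame variable are assumptions):

    CvSize boardSize = cvSize(7, 7);             /* number of inner corners */
    CvPoint2D32f corners[49];
    int cornerCount = 0;

    int found = cvFindChessboardCorners(frame, boardSize, corners, &cornerCount,
                                        CV_CALIB_CB_ADAPTIVE_THRESH);
    /* If found, the outer corners of the board give the source points for
       cvGetPerspectiveTransform, exactly as in the manual method above */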


6 Results
This section illustrates results obtained by the system as implemented. Insight is given into situations in which the algorithm is effective and where it can be improved. Naturally the easiest way to evaluate the effectiveness of a system is to test it under a variety of different conditions. It was found that the algorithm functioned as expected for almost all sample videos. Below are screenshots of several videos in which the algorithm functioned as expected.

Figure 22 - Samples of successful distance determination

Areas in which the algorithm did not perform as expected will be explored in the Further Work chapter.

6.1 Selection of Sampling Rate


Computer vision algorithms are, as a general rule, very resource intensive. Since the goal for this algorithm is to implement it in real time on a video stream, performance is a very important concern.

A standard camera captures frames at a rate of 30 frames per second. Each frame is displayed for 33 milliseconds. It is not feasible to run the entire algorithm on each frame, 30 times per second. To improve performance and lessen the load on the system, we run the algorithm less than 30 times per second, every x frames. Below is a chart of computation times for the algorithm on a 386 frame video stream, at different sampling rates.

[Plot: total computation time (s) against sampling rate]

Figure 23 - Plot of computation times for different sampling rates

A sampling rate of 10 frames was chosen as a good tradeoff between accuracy and updating frequency.


[Bar chart: average total computation time (s) for sampling rates of 1 and 10]

Figure 24 - Measured computation time - comparison between sampling rate of 1 and 10

6.2 Performance of Algorithms


Analysis was carried out on the time taken to carry out each step of the system as described in the overall flowchart; the results of this analysis can be seen below. Firstly, measurements were taken of the total time to perform all 3 of the major operations that must be repeatedly carried out, namely:

- Thresholding the image to remove the road surface
- Warping perspective to create the top down view
- Distance determination

The chart below shows a comparison of the time taken by each step to process all frames of a 386 frame video.


[Chart: processing time in seconds; series: Distance Determination, Thresholding, Warp Perspective]

Figure 25 - Comparison of computation times in seconds

Carrying out all operations on each frame was found to take a total of 0.053 seconds, or 53 milliseconds. This figure is not inherently very useful to know, as it is relative to the processor on which the test is carried out and is greatly affected by the presence of other processes running on the system. Testing the algorithm on a dedicated microprocessor would give more quantitative benchmarks. What can be inferred from the figures obtained is the percentage of total time taken up by each of the 3 major parts of the algorithm; these percentages are illustrated in the chart below. To generate these values, the algorithm was modified to run only one of the 3 major operations listed above. Timing was then carried out using the built-in Linux command time [9], which measures real time taken as well as user and kernel time taken to execute programs. The results were corroborated with a second timing method, using the built-in clock functionality of the C language [10].
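An illustrative sketch of that clock-based measurement (the operation being timed is a placeholder):

    #include <time.h>

    clock_t start = clock();
    /* ... run one of the three major operations on a frame here ... */
    clock_t finish = clock();

    double elapsedSeconds = (double)(finish - start) / CLOCKS_PER_SEC;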


[Bar chart: percentage of total execution time - Warp Perspective, Thresholding, Distance Determination]

Figure 26 - Graph of processing time for each of 3 major constituents of algorithm

As can be seen from the figure above, as predicted, warping perspective of the image in order to generate the top down view of the scene is by far the most time consuming part of the algorithm. The second most computationally expensive operation is thresholding of the image, again as expected. Thresholding an image works on the whole image, altering each pixel based on a rule. In this system, this is done several times, resulting in significant processing time.


7 Further Work
While the system is very successful in determining distances and detecting objects in front of a vehicle, it stands to be improved in several areas.

7.1.1 Processing Time

Currently each processed frame requires, on average, 0.05 seconds of processor time. This figure can be improved upon in a number of ways:

- Reduce the number of channels in the image to be transformed to one. This reduces the computation required to transform the image from three channels to one and should provide a drastic increase in performance.
- When thresholding the image, only threshold the portion required by the algorithm. Currently thresholding is applied to the whole image; this is not required, as some parts of the image, e.g. the horizon and the bonnet of the car, are irrelevant. Cropping these areas out will increase efficiency.
- Change the thresholding algorithm to use less memory. In the current system, several extra data structures are allocated and de-allocated as part of the thresholding operation. This slows down computation and increases the amount of memory used. A more efficient algorithm using fewer resources would improve overall processing time.
- Implement a tracking algorithm. The sample rate could be further reduced from 3 times per second with the help of a tracking algorithm.

7.1.2 Environmental Conditions

The algorithm in its current form is quite susceptible to changes in environment, e.g. going from bright areas to dim areas. This aspect of the system could be improved using adaptive thresholding.

Secondly, the system detects road markings in the middle of the road as objects, which interferes with distance detection. The system could be improved to intelligently filter out these markings and improve the reliability of the algorithm.

7.1.3 Tracking

Implementing tracking as part of the system would greatly improve the algorithm in several ways. By the nature of the environment where the system operates, there is little change in the location of the detected object from one frame to another. A tracking algorithm could assist in situations where the algorithm has lost the object or has been compromised by noise conditions on the road.

7.1.4 Embedded Implementation

It is very much hoped that the system will be ported to an embedded processor in the near future, where it can be properly tested and benchmarked for use in an actual vehicle. Manufacturer-specific, high performance C libraries such as the Intel Performance Primitives could be employed to greatly increase performance.


8 Conclusion
As can be seen from the successful implementation of this algorithm in the C language, a real time distance determination system using OpenCV is clearly achievable. The system as it stands is functional and complete. Refinements are needed before the system can be deployed with confidence to an actual embedded device, but indications are positive that this will be possible. OpenCV has proven a powerful and lightweight computer vision framework and greatly assisted in the development of the project.

A real time, single camera, passive distance determination algorithm as implemented here could have a positive effect on road safety and the avoidance of road collisions. The use of a single optical camera, which can serve many purposes in a single installation, makes it an attractive proposition for car manufacturers due to its low cost and simple configuration. This system offers benefits over similar active systems in terms of both cost and functionality, in that its object detection is not limited to metal, reflective objects.

For normal road conditions the algorithm was found to function very well, providing useful information to the user. This information could be integrated into the vehicle's operation in several ways: by alerting a user of imminent danger; by alerting a user that they are not maintaining a safe following distance in relation to the car in front; and by performing pre crash safety procedures if an impending collision is detected. All of these benefits combine to make a vehicle which implements this system a safer one, which ought to lead to fewer road accidents and fewer injuries or fatalities.


9 References
1. Road Safety Authority, Road Collision Facts 2005 (http://www.rsa.ie/publication/publication/upload/2005%20Road%20Collision%20Facts.pdf)
2. Mercedes Pre-Safe (http://www2.mercedesbenz.co.uk/content/unitedkingdom/mpc/mpc_unitedkingdom_website/en/home_mpc/passengercars/home/new_cars/models/cls-class/c219/overview/safety.html)
3. Maud, Hussain, Samad et al., 2004. Implementation of Inverse Perspective Mapping Algorithm For The Development Of An Automatic Lane Tracking System
4. Mallot et al., 1991. Inverse perspective mapping simplifies optical flow computation and obstacle detection
5. D. O'Cualain, C. H., 2009. Lane Departure Detection Using Subtractive Clustering in the Hough Domain
6. Paul Smith, NUIG Guest Lecture. Applications of Linear Algebra: Computer Vision in Sports
7. Intel Performance Primitives (http://software.intel.com/en-us/intel-ipp/)
8. Intel Atom processor (http://www.intel.com/technology/atom/)
9. time command (http://linux.about.com/library/cmd/blcmdl1_time.htm)
10. Timing in C (http://beige.ucs.indiana.edu/B673/node104.html)
11. Gary Bradski, Adrian Kaehler, 2008. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media.


10 Appendix A - On the CD
Included on the submitted CD is the entirety of the Subversion repository of code developed throughout the course of the project. The code is split into various folders with snippets to carry out different parts of the algorithm. The final implementation, which incorporates many of the separate parts, can be found in the Final Implementation folder. Some sample images and videos are included for testing purposes.
