Real Time Distance Determination for an Automobile Environment using Inverse Perspective Mapping in Open CV
Shane Tuohy, B.E. in Electronic and Computer Engineering. Supervisor: Dr. Martin Glavin. Co-Supervisor: Dr. Fearghal Morgan
24 March 2010
Abstract
This project aims to develop a real time distance determination algorithm for use in an automobile environment. Modern cars are increasingly being fitted with image sensors, which can be used to obtain large amounts of information about the surrounding area. From a single front facing image, it is difficult to determine distances to objects in front of the vehicle with any degree of certainty, because there is a non-linear relationship between the height of an object in a front facing image and its distance from the camera. This project uses Inverse Perspective Mapping to overcome this problem: the front facing image is transformed to a top down, bird's eye view, in which there is a linear relationship between distances in the image and in the real world. The aim of the project is to implement the algorithm in the C language using the OpenCV libraries. Working in this way provides a high performance, low overhead system that can be implemented and run on a low power embedded device in an automobile environment.
Acknowledgements
I would like to acknowledge the help and support I received throughout the project from my project supervisor Dr. Martin Glavin and the postgraduate researchers in the CAR lab, NUIG. In particular I would like to thank Diarmaid O'Cualain for his constant support and patience. This project would not have been possible without the debugging help, discussion and encouragement received from my fellow 4th EE/ECE classmates. Finally, I'd like to thank my parents for their continued support over the last 4 years.
Declaration of Originality
I declare that this thesis is my original work except where stated. Date: ___________________________________ Signature: ___________________________________
Contents

Abstract
Acknowledgements
Declaration of Originality
Table of Figures
1 Glossary
2 System Overview
3 Background Technologies
   3.1 Computer Vision
   3.2 OpenCV
      3.2.1 Useful OpenCV Functions
      3.2.2 cvSetImageROI
      3.2.3 cvWarpPerspective
      3.2.4 Drawing Functions
4 Project Structure
   4.1 Overall System Flowchart
   4.2 First Frame Operations Flowchart
   4.3 Process Specifics
      4.3.1 Capture Video Frame
      4.3.2 First Frame Operations
      4.3.3 Threshold Image
      4.3.4 Warp Perspective
      4.3.5 Distance Determination
      4.3.6 Provide Graphical Overlay
      4.3.7 Processing Real Time Images (from camera)
5 Optimisation and Testing
   5.1 Generate lookup array
   5.2 Sampling Rate
   5.3 Finding Threshold Range
   5.4 Using Performance Primitives
   5.5 Level to Trigger Detection
6 Results
   6.1 Selection of Sampling Rate
   6.2 Performance of Algorithms
7 Further Work
   7.1.1 Processing Time
   7.1.2 Environmental Conditions
   7.1.3 Tracking
   7.1.4 Embedded Implementation
Table of Figures
Figure 1 - Overview of proposed system
Figure 2 - Images illustrating differences between vertical distance on camera image, and real world distance
Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance now corresponds linearly to real distance
Figure 4 - Illustration of camera position and coordinate systems in use
Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in red box at top of image
Figure 6 - Overall System Flowchart
Figure 7 - First Frame Operations Flowchart
Figure 8 - Original image before thresholding to remove road pixels
Figure 9 - Thresholded image of same scene as figure above with road pixels removed
Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted
Figure 11 - Sample road scene image before perspective is warped using transformation matrix
Figure 12 - Previous figure after perspective has been warped using transformation matrix
Figure 13 - Transformed image of sample road scene, ready for object detection
Figure 14 - Source image with overlay of rectangular box and distance value
Figure 15 - Illustrates mapping of points in source top down image to points in front facing image
Figure 16 - Original Image - No Thresholding Applied
Figure 17 - Small threshold value applied to scene / Large threshold value applied to scene
Figure 18 - Thresholding with range of 35
Figure 19 - Example of distance detection performed with small trigger value
Figure 20 - Graph illustrating memory use for several video samples
Figure 21 - Example of known square method of calibration
Figure 22 - Samples of successful distance determination
Figure 23 - Plot of computation times for different sampling rates
Figure 24 - Measured computation time - comparison between sampling rate of 1 and 10
Figure 25 - Comparison of computation times in seconds
Figure 26 - Graph of processing time for each of 3 major constituents of algorithm
1 Glossary
IPM - Inverse Perspective Mapping
OpenCV - Open Computer Vision
Thresholding - Process by which pixels above or below a certain intensity are removed from an image
C - Low level, compiled programming language
gcc - Open source C compiler
ROI - Region of Interest
2 System Overview
In 2005, 396 people, more than one per day, were killed in road traffic accidents [1]. For this reason, collision avoidance and prevention systems are in the best interests of car safety. Safety is a primary concern for all car manufacturers, and in recent years ABS, stability control, airbags, ESP and similar systems have become standard on many car models. Using computer vision techniques and optical cameras, safety systems can be improved considerably. Cars in the near future will be able to intelligently analyze their environment and react accordingly to improve driver safety.

Computer vision is fundamentally the process by which we allow machines to see the world and react to it. Its importance cannot be overstated in fields such as manufacturing, surveillance and environment detection. Using the techniques of computer vision, we can create powerful, helpful applications which incorporate real world conditions. An increasingly common application of computer vision is safety: machines can be programmed to detect and respond to dangerous conditions automatically, based on their interpretation of the world around them, and to provide accurate, useful information to machine operators or users. One such machine, where computer vision can be leveraged to provide useful, potentially lifesaving information, is the car.

Current systems on the market from manufacturers such as Mercedes [2] pre-charge the brakes and tighten slack on seatbelts if an imminent collision is detected. It is becoming increasingly common for modern automobiles to be fitted with camera systems to aid driver awareness and safety. Systems such as that found in the Opel Insignia, which uses a front mounted camera to detect road signs and monitor lane departures, are becoming more and more popular, providing increased levels of information to drivers. Distance determination in an automobile environment is therefore a worthwhile undertaking.
With an effective distance determination algorithm, steps can be taken to alert drivers to potential hazards and unsafe driving. Distance data from a system similar to the one proposed could be applied to an adaptive cruise control system, which senses upcoming obstacles and adjusts the speed of the vehicle accordingly. In fact, combined with lane detection algorithms, it is entirely possible to envision a car that could, in theory, drive itself.
Currently available systems on the market from manufacturers such as Audi, Mercedes-Benz and Nissan use RADAR or LIDAR sensors to implement collision detection. These work well when the RADAR signal reflects from a metal object; they do not, however, detect pedestrians or animals on the road. These systems are also expensive to implement, and are therefore a sub-optimal solution. Current research in collision detection focuses on forward facing cameras, which provide more information about a scene and are cheap and reliable.

The proposed system consists of a single front facing video camera mounted on a vehicle, capturing video images at 30 frames per second. This setup distinguishes the system from similar systems which use either a multi camera setup or active devices such as RADAR or LIDAR. A single camera system is simpler and more reliable than any of these methods. A dual (or more) camera setup, as employed by Toyota in some Lexus models, provides more data to process and therefore more accurate results. However, it also carries severe processing and configuration overheads, which render it unsuitable for use in the low power, low resource embedded devices typically found in automobiles. It is also, for obvious reasons, a much more expensive system to implement than a single camera system. Active systems such as RADAR or LIDAR require signals to be reflected from targets, which leaves them susceptible to interference, possibly from identical systems on approaching vehicles. Mounting these active systems on cars can be difficult and, most importantly, carries a significant financial expense. A front facing optical camera fitted to a car can often serve several purposes: the same camera can be used for lane detection and road sign detection as well as distance determination, providing a comprehensive safety package using a single camera.
Previously, computer vision algorithms were much too computationally heavy to implement in real time on low power devices. Devices such as the Intel Atom processor, which are very capable yet consume very little power, can make real time implementation of these types of algorithm a reality. In conclusion, the passive nature, flexibility and simplicity of a single camera setup make it well suited to implementation in an automobile environment. The proposed system is capable of providing lifesaving information to drivers.
Figure 1 - Overview of proposed system
The figures below illustrate the problem explicitly. As the vehicle in front approaches the camera, vertical distance in the image, as indicated by the white lines, does not vary in a linear fashion. The third image has been transformed to a top down view, in which distance in the image is linearly related to distance in the real world.
Figure 2 - Images illustrating differences between vertical distance on camera image, and real world distance
Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance now corresponds linearly to real distance
3 Background Technologies
This chapter illustrates the technology background of the project. It provides a brief overview of the field of computer vision and the technologies being used to implement the system.
3.2 OpenCV
OpenCV stands for Open Computer Vision. OpenCV is a library of functions for image processing. Originally developed by Intel in 1999, it is now an open source project released under the BSD license. The functions themselves are mostly written in C. The purpose of OpenCV is to provide robust, optimised routines for standard image processing techniques. There are many commonly used techniques in image processing for computer vision applications; OpenCV provides implementations of many of these, allowing for rapid algorithm development, so that while implementing algorithms the programmer doesn't have to continually reinvent the wheel. Although OpenCV itself is an open source project, Intel provides a product named Integrated Performance Primitives, a commercial package of highly optimised routines which OpenCV can use in place of its own to speed up computation.
1 http://www.darpa.mil/grandchallenge/index.asp
2 http://opencv.willowgarage.com/wiki/
For this particular project, OpenCV greatly accelerated development by providing routines to threshold images, generate homography matrices and sample regions of images. Although wrappers for the OpenCV libraries have been developed for high level languages such as C# and Python, code for this system was written solely in C and compiled using the standard gcc compiler. Using a lower level language like C increases performance, allowing a real time, or close to real time, implementation.

3.2.1 Useful OpenCV Functions
3.2.1.1 cvThreshold

Performing a threshold of an image is a fundamental image processing technique. It involves examining the intensity value of each pixel in an image and performing a particular operation depending on this value. OpenCV provides extensive thresholding options via the cvThreshold function. Several different types of thresholding are available:

CV_THRESH_BINARY
CV_THRESH_BINARY_INV
CV_THRESH_TRUNC
CV_THRESH_TOZERO
CV_THRESH_TOZERO_INV
Most pertinent to this project are CV_THRESH_TOZERO, CV_THRESH_TOZERO_INV and CV_THRESH_BINARY.

CV_THRESH_TOZERO - If a pixel is below the threshold value, it is given an intensity value of 0. Otherwise it is not affected.
CV_THRESH_TOZERO_INV - The opposite of the above: if a pixel is above the threshold value, it is given an intensity of 0. Otherwise it is not affected.
CV_THRESH_BINARY - As the name suggests, depending on which side of the threshold value the pixel intensity lies, it is assigned a value of 0 or 255.
3.2.2 cvSetImageROI
cvSetImageROI allows the programmer to perform operations on a specified area of an image. This allows functions such as thresholding, averaging and smoothing to be applied to one part of an image while leaving the rest of the image unchanged. This function has proven very useful throughout the project.

3.2.3 cvWarpPerspective
cvWarpPerspective, as the name suggests, allows the perspective of an image to be warped based on a transformation matrix. This has been the most important function provided by OpenCV for this project. It allows points to be mapped from one perspective to another through the use of a single function, greatly reducing programmer overhead.

3.2.4 Drawing Functions
OpenCV provides numerous drawing functions which allow feedback to be given to the user easily by overlaying shapes and text onto images.
As can be seen from the figure above, mapping a point from the image plane I(u,v) to the world involves a rotation through the camera's mounting angle and a translation along the line of sight of the camera. The matrix mathematics are shown below in Equation 1 [5].
Equation 1 Matrix Mathematics for transforming point from image coordinates to world coordinates
We use homogeneous coordinate systems as they allow us to represent points at infinity, e.g. vanishing points. They also allow division and constants to be incorporated into linear equations [6].
Equation 3 - Demonstration of benefits of homogenous coordinates
Using the transformation matrix above, we can map any point in the image plane I(u,v) to its corresponding point in the real world plane W(X, Y, Z) with Z = 0. The transformation produces images in which any point with a Z coordinate greater than zero is incorrectly mapped, resulting in distortion of the image, as can be seen in the figure below. For our purposes, an image similar to the one in the figure below is perfectly sufficient to determine distances.
Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in red box at top of image.
4 Project Structure
This chapter describes the structure of work carried out during the course of the project and the development of the system. It describes the overall flow of the system before explaining in detail the logic and implementation behind each step. The project was split into 5 distinct goals to measure progress. Those goals were:
Goal 1 - Commission the OpenCV system to load frames of video into memory. Sample the image pixels of the road directly in front of the vehicle. Threshold based on these sample pixels to remove the road surface from the image; the only remaining pixels are those of the sky, the edges of the road (such as trees and buildings), road markings and other vehicles. Generate the IPM characteristic matrix, which is derived from the height of the camera from the ground and the angle at which the camera is mounted (with respect to the ground plane).

Goal 2 - Transform the image from the original input view to the IPM transformed view. This is done by applying the IPM matrix developed in the previous milestone.

Goal 3 - Determine the distance to the vehicle in front by looking for the first non-road pixels directly in front of the vehicle. Once the position is determined, it is then possible to calculate the distance.

Goal 4 - Display this information to the driver by overlaying graphics on the original image to clearly indicate the distance to the vehicle in front.

Goal 5 - Modify the system to run in real time with a video stream. If possible, this could be achieved in real time using a video camera and a laptop (or other available embedded signal processing hardware).
4.1 Overall System Flowchart

Figure 6 - Overall System Flowchart

4.2 First Frame Operations Flowchart

Figure 7 - First Frame Operations Flowchart

4.3 Process Specifics
4.3.1 Capture Video Frame

Capturing of video frames is done using OpenCV's file capture or camera capture functions. Using these functions we create a CvCapture object which we can query for frames from either the video file or a camera connected to the laptop.

4.3.2 First Frame Operations
This section of the system consists of several operations which need only be carried out once. The data generated by these operations can then be reused for each frame of video that is sampled. Many of the methods used in this project are too computationally heavy to perform on the fly, so as much work as possible is done at the beginning of the program. This way the stored values can be used repeatedly for each frame, saving on computation time.

4.3.2.1 Capture source points

Firstly, 4 source points are captured from the user. These points are stored in an array and used for the next step, generation of the transformation matrix. This is done using a mouse handler to return the position of user selected points in the image.

4.3.2.2 Generate transformation matrix

The transformation matrix is the key to the Inverse Perspective Mapping algorithm. Helpfully, OpenCV provides a function to generate this matrix without needing to manually carry out the mathematical operations listed in the Inverse Perspective Mapping section above. The function cvGetPerspectiveTransform takes an array of points from the source image and generates the transformation matrix that maps them to an array of points in the destination (top down) image. The simplified matrix mathematics is illustrated in the equation below.
The source points, chosen by the user by clicking points in the image, map to a square in the destination image. If we lay a square shaped object on the road in front of the camera, we can use its corners to generate the appropriate transformation matrix for the current environmental conditions.
Importantly, applying this operation in the other direction gives us the inverse transformation matrix. This allows us to map points from the destination image back to the source image.

4.3.3 Threshold Image
In order to detect a vehicle on the road in front of us, we need to be able to discern what part of an image is road and what part is a vehicle. A solution to this problem is to threshold the image to remove road pixels. That way, anything in front of the car with an RGB intensity value greater than zero is an object in front of the car. The major difficulty this presents is: given a certain image, how can one detect what is a road pixel and what is not? Roads vary in shade depending on the time of day, weather, etc. To obtain a value for the particular scene we are working on, we sample the value of pixels directly in front of the vehicle: we take a small patch of road slightly in front of the car and obtain an average value for the pixels across that patch. This gives a good value for the RGB characteristics of the road surface. This process is somewhat rough, and could be improved by instead taking the median value of all pixels, thus eliminating noise values generated by the system. Adaptive thresholding, that is, using different threshold values for different areas of the image, could be implemented to improve the process further.

We threshold based on these values. Thresholding works as follows:

- Split the image into its constituent channels (R, G, B)
- Use the built in OpenCV threshold function with CV_THRESH_TOZERO to remove pixels above and below the threshold value; this leaves us with images containing just the road surface in the R, G and B planes
- Subtract the original R, G and B images from the images containing just the road surface and merge the result
- Finally, perform a binary threshold to leave any non road values with a high value to aid in distance determination

An example of the thresholding algorithm at work can be seen in the following three figures. Fig. 8 shows a sample source image prior to application of the algorithm. Fig. 9 shows the result of removal of road pixels from the image. Finally, Fig. 10 shows how non road objects are highlighted to allow for easier detection.
Figure 9 - Thresholded image of same scene as figure above with road pixels removed
Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted
As can be seen from the example above, the thresholding algorithm is very effective. Most of the road surface has been removed, leaving the object in front clearly detectable due to its bright colouring.

4.3.4 Warp Perspective
As part of the first frame operations, we generated the transformation matrix that maps 4 source points in the image to 4 destination points in a transformed image. Now we need to transform each video frame using this matrix. OpenCV provides a function to transform an image based on a 3x3 homography matrix; this is the matrix we generated earlier. Applying this function to an image results in a transformation similar to the one shown in the figures below. Warping the perspective of an image involves considerable computational overhead, so it is prudent to use the operation as sparingly as possible.
Figure 11 Sample road scene image before perspective is warped using transformation matrix
Figure 12 Previous figure after perspective has been warped using transformation matrix
Note - The image samples above are not thresholded as they would be during real operation of the system. They are given to illustrate clearly the effect of the perspective transform.
4.3.5 Distance Determination

Now that we have warped the image to a top down view, we can measure the distance to the vehicle in front linearly. This consists of several steps.
Figure 13 Transformed image of sample road scene, ready for object detection.
We know the coordinates of the front of our car, so we loop vertically upwards through the image, working on a small rectangle in the area in front of the car. For each small rectangle, we average the pixel values across the rectangle. We know that all road pixels are zero, thanks to the threshold applied earlier, so we increment the position of the rectangle until the average function returns a non-zero value. This value is stored in a global variable that is accessible from the main body of code and corresponds to the distance between our vehicle and the object directly in front. Now that we have the position of the object directly in front of the car in its transformed image coordinates, we transform this point back to the original coordinate system in order to overlay feedback for the user.

4.3.6 Provide Graphical Overlay
Now that we have a value for the position of the object in coordinates relative to the original image, we can overlay graphics and text for display to the user. The value we have calculated for distance is in pixels: the distance from the base of the image (the front of our vehicle) to the next object in front of our vehicle. This value changes linearly as the real world distance changes, e.g. a value of 700 pixels equates to twice the distance that 350 pixels equates to. It is this linearity that is the strength of the Inverse Perspective Mapping algorithm. In order to display an accurate value for distance, we need to know the scaling factor: the number of pixels that corresponds to one meter. This value can be calibrated for a particular camera configuration by placing a meter stick flat on the road surface in front of the car and measuring the number of pixels it spans in the top down view. This value stays the same as long as the characteristics of the camera (its height from the ground, focal length, etc.) remain the same. For the purposes of testing using videos from several different sources, an approximate scaling factor was chosen based on analysis of several sample top down images. The value was calibrated based on the fact that the standard Irish lane marking is approximately 1.5 m long.

Graphical information is provided to the user in the form of the distance figure displayed in the upper left quadrant of the image, where typically very little activity takes place. A rectangular box is also drawn around the detected area to verify that the correct object has been detected. The figure below shows a sample overlay of information to the user.
Figure 14 Source image with overlay of rectangular box and distance value
4.3.7 Processing Real Time Images (from camera)
While most design, testing and processing was carried out on pre recorded video, a very important goal also had to be realised: for the algorithm to function in a real time system, it had to be able to work not only on pre recorded video, but on live real time data from a camera as well. This involved implementing a mechanism by which the program would read frames from a camera rather than a file. If the system is invoked with one command line argument, it attempts to load the video at the path specified by the argument. If the system is started without a command line argument, the program attempts to query frames from a camera attached to the system.
Figure 15 - Illustrates mapping of points in source top down image to points in front facing image
Changes in the calculated distance from one frame to the next are small enough to be imperceptible to a user. Therefore we can alter the sampling rate to calculate distance less frequently, and thereby save on resources with little to no visible difference during operation.
As can be seen from the image above on the left, in which a threshold range of 1 was applied, we need a relatively large range to accurately remove the entire road surface. The image on the right had a threshold range of 100 applied, which is clearly too large, as much of the vehicle in front has been removed along with the road.
In the figure above some of the detail in the bottom part of the vehicle in front of ours has been removed along with the road. This is not a problem as we will now apply a binary threshold to highlight any values that are above zero.
Using a very low value, as shown above, results in very small levels of noise triggering object detection. Conversely, using a very high value leads to no object detection at all. An appropriate value was found through measuring average values across the detection rectangle and discerning a threshold value from these measurements.
Allocation function      Matching release function
cvCreateImage            cvReleaseImage
cvCreateImageHeader      cvReleaseImageHeader
cvCreateMat              cvReleaseMat
cvCreateMatND            cvReleaseMatND
cvCreateData             cvReleaseData
cvCreateSparseMat        cvReleaseSparseMat
cvCreateMemStorage       cvReleaseMemStorage
cvCreateGraphScanner     cvReleaseGraphScanner
cvOpenFileStorage        cvReleaseFileStorage
cvAlloc                  cvFree
Once these rules were observed strictly, memory usage declined drastically. Below is a chart of measured memory use for several sample videos after memory use was cut.
[Chart: Memory Usage - memory (MB, 0 to 16 scale) for five sample videos]
5.7 Calibration
In order to obtain accurate distances from the system, calibration for a particular environment needs to be carried out. There are several ways in which this can be done. When the system is installed in a vehicle, the position and angle of the camera will not change, which means that instead of the approximate method employed for testing in the current implementation, a more accurate method of calibration can be employed, giving more accurate distance values. One such method is to place a square object of known size in front of the camera on the road plane. Using simple mouse clicks on its corners, the transformation matrix for that environment can be obtained. Due to the wide variety of samples from different environments being tested as part of this project, this is the method that has been implemented. It provides a somewhat rough value, but is satisfactory for testing purposes. The figure below shows a scene over which a rectangular shape has been overlain; by clicking the corners of this rectangle, the transformation matrix for this environment can be found.
A second method that can be employed to calibrate the camera and generate the transformation matrix is through the use of the camera's intrinsic and extrinsic characteristics: its focal length, height above the ground, and viewing angle. Exploration of this method of calibration is outside the scope of this project. Finally, it is possible to perform automatic calibration of the camera using a checkerboard patterned square placed in front of the car. This is the preferred option, providing simple and reliable calibration: the system detects the checkerboard pattern of known size [11], and from this information the transformation matrix can be generated. This technique has not been implemented as part of this system.
6 Results
This section illustrates results obtained by the system as implemented, giving insight into the situations in which the algorithm is effective and those in which it can be improved. The easiest way to evaluate the effectiveness of the system is to test it under a variety of different conditions. The algorithm was found to function as expected for almost all sample videos. Below are screenshots of several videos in which the algorithm functioned as expected.
Areas in which the algorithm did not perform as expected will be explored in the Further Work chapter.
A standard camera captures frames at a rate of 30 frames per second, so each frame is displayed for approximately 33 milliseconds. It is not feasible to run the entire algorithm on each frame, 30 times per second. To improve performance and lessen the load on the system, we run the algorithm less often, once every x frames. Below is a chart of computation times for the algorithm on a 386 frame video stream, at different sampling rates.
A sampling rate of one full pass every 10 frames was chosen as a good tradeoff between accuracy and update frequency.
The chart below shows a comparison of the time taken by each step to process all frames of a 386 frame video.
Carrying out all operations on each frame was found to take a total of 0.053 seconds, or 53 milliseconds. This figure is not inherently very useful on its own, as it is relative to the processor on which the test is carried out, and is greatly affected by other processes running on the system. Testing the algorithm on a dedicated microprocessor would give more meaningful benchmarks. What can be inferred from the figures obtained is the percentage of total time taken up by each part of the system. To generate these values, the algorithm was modified to run only one of the 3 major operations listed above at a time. Timing was then carried out using the built in Linux command time [9], which measures the real, user and kernel time taken to execute a program. The results were corroborated with a second timing method, using the built in clock functionality of the C language [10]. Below is an illustration of the percentage of total execution time taken up by each of the 3 major parts of the algorithm.
As can be seen from the figure above, and as predicted, warping the perspective of the image to generate the top down view of the scene is by far the most time consuming part of the algorithm. The second most computationally expensive operation is thresholding, again as expected: thresholding operates on the whole image, altering each pixel based on a rule, and in this system it is applied several times, resulting in significant processing time.
7 Further Work
While the system is very successful in determining distances and detecting objects in front of a vehicle, it stands to be improved in several areas.

7.1.1 Processing Time

Currently each processed frame requires, on average, 0.05 seconds of processor time. This figure can be improved upon in a number of ways:

- Reduce the number of channels in the image to be transformed to one. Transforming a single channel image rather than a 3 channel image should provide a drastic increase in performance.
- When thresholding the image, only threshold the portion required by the algorithm. Currently thresholding is applied to the whole image; this is not required, as some parts of the image, e.g. the horizon and the bonnet of the car, are irrelevant. Cropping these areas out will increase efficiency.
- Change the thresholding algorithm to use less memory. In the current system, several extra data structures are allocated and de allocated as part of the thresholding operation. This slows down computation and increases the amount of memory used. A more efficient algorithm using fewer resources would improve overall processing time.
- Implement a tracking algorithm. The sample rate could be further reduced from 3 times per second with the help of a tracking algorithm.

7.1.2 Environmental Conditions
The algorithm in its current form is quite susceptible to changes in environment, e.g., going from bright areas to dim areas. This aspect of the system could be improved using adaptive thresholding.
Secondly, the system detects road markings in the middle of the road as objects, which interferes with distance detection. The system could be improved to intelligently filter out these markings and improve the reliability of the algorithm.

7.1.3 Tracking
Implementing tracking as part of the system would greatly improve the algorithm in several ways. By the nature of the environment in which the system operates, there is little change in the location of the detected object from one frame to the next. A tracking algorithm could assist in situations where the algorithm has lost the object or has been compromised by noise conditions on the road.

7.1.4 Embedded Implementation
It is very much hoped that the system will be ported to an embedded processor in the near future where it can be properly tested and benchmarked for use in an actual vehicle. Manufacturer specific high performance C libraries such as the Intel Performance Primitives could be employed to greatly increase performance.
8 Conclusion
As can be seen from the successful implementation of this algorithm in the C language, a real time distance determination system using OpenCV is clearly achievable. The system as it stands is functional and complete. Refinements are needed before the system can be deployed with confidence on an actual embedded device, but indications are positive that this will be possible. OpenCV has proven a powerful and lightweight computer vision framework and greatly assisted in the development of the project. A real time, single camera, passive distance determination algorithm as implemented here could have a positive effect on road safety and the avoidance of road collisions. The use of a single optical camera, which can serve many purposes in a single installation, makes it an attractive proposition for car manufacturers due to its low cost and simple configuration. This system offers benefits over similar active systems in terms of both cost and functionality, in that its object detection is not limited solely to metallic, reflective objects. For normal road conditions the algorithm was found to function very well, providing useful information to the user. This information could be integrated into the vehicle's operation in several ways: by alerting the driver of imminent danger; by alerting the driver that they are not maintaining a safe following distance to the car in front; and by performing pre crash safety procedures if an impending collision is detected. All of these benefits combine to make a vehicle which implements this system a safer one, which ought to lead to fewer road accidents and fewer injuries or fatalities.
9 References
1. Road Safety Authority. Road Collision Facts 2005. (http://www.rsa.ie/publication/publication/upload/2005%20Road%20Collision%20Facts.pdf)
2. Mercedes Pre Safe. (http://www2.mercedesbenz.co.uk/content/unitedkingdom/mpc/mpc_unitedkingdom_website/en/home_mpc/passengercars/home/new_cars/models/cls-class/c219/overview/safety.html)
3. Muad, Hussain, Samad et al. 2004. Implementation of Inverse Perspective Mapping Algorithm for the Development of an Automatic Lane Tracking System.
4. Mallot et al. 1991. Inverse Perspective Mapping Simplifies Optical Flow Computation and Obstacle Detection.
5. D. O Cualain, C. H. 2009. Lane Departure Detection Using Subtractive Clustering in the Hough Domain.
6. Paul Smith, NUIG Guest Lecture. Applications of Linear Algebra: Computer Vision in Sports.
7. Intel Performance Primitives. (http://software.intel.com/en-us/intel-ipp/)
8. Intel Atom processor. (http://www.intel.com/technology/atom/)
9. time command. (http://linux.about.com/library/cmd/blcmdl1_time.htm)
10. Timing in C. (http://beige.ucs.indiana.edu/B673/node104.html)
11. Gary Bradski, Adrian Kaehler. 2008. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media.
10 Appendix A - On the CD
Included on the submitted CD is the entirety of the Subversion repository of code developed throughout the course of the project. The code is split into various folders with snippets to carry out different parts of the algorithm. The final implementation, which incorporates many of the separate parts, can be found in the Final Implementation folder. Some sample images and videos are included for testing purposes.