
Eye Blinks

Abstract

This graduation project aims to present an application that is capable of replacing the traditional mouse with the human face as a new way to interact with the computer. Facial features (the nose tip and the eyes) are detected and tracked in real time so that their actions can be used as mouse events. The coordinates and movement of the nose tip in the live video feed are translated into the coordinates and movement of the mouse pointer on the user's screen, and left/right eye blinks fire left/right mouse click events. The only external device the user needs is a webcam that feeds the program with the video stream.

In the past few years, high technology has become more advanced and less expensive. With the availability of high-speed processors and inexpensive webcams, more and more people have become interested in real-time applications that involve image processing. One of the promising fields in artificial intelligence is HCI (Human-Computer Interaction), which aims to use human features (e.g. the face or hands) to interact with the computer. One way to achieve this is to capture the desired feature with a webcam and monitor its actions in order to translate them into events that communicate with the computer. In our work we try to compensate people whose hand disabilities prevent them from using the mouse, by designing an application that uses facial features (the nose tip and eyes) to interact with the computer. The nose tip was selected as the pointing device because of its location and shape: since it is located in the middle of the face, it is comfortable to use as the feature that moves the mouse pointer and defines its coordinates, and since it lies on the axis about which the face rotates, it largely keeps its distinctive convex shape as the face moves, which makes it easier to track. The eyes are used to simulate mouse clicks, so the user can fire their events by blinking.

EXISTING SYSTEM

While different devices have been used in HCI (e.g. infrared cameras, sensors, microphones), we use an off-the-shelf webcam that affords a moderate resolution and frame rate as the capturing device, in order to make the program affordable for all individuals.

PROPOSED SYSTEM

To present an algorithm that distinguishes voluntary eye blinks from involuntary ones, and that detects and tracks the desired facial features precisely and fast enough to be applied in real time.

SYSTEM SPECIFICATION

Operating system: Windows XP SP2 or Windows Server 2003
Processor: Pentium 4 or better
Memory: 1 GB of RAM [Required]

Software: JDK 1.5 or later; JMF 2.x or later


Webcam: supporting 30 frames per second

MODULES

Facial features (the nose tip and eyes) are detected and tracked in real time so that their actions can be used as mouse events. The nose tip's movements in the live video feed are translated into movements of the mouse pointer, and left/right eye blinks fire left/right mouse click events. A sketch of the event-mapping side of this design is given below.
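To make the event mapping concrete, the following is a minimal, hedged Java sketch of how tracked nose-tip coordinates and blink detections could be turned into pointer events with java.awt.Robot, which the JDK targeted by the specification already ships. The tracking code that produces the screen coordinates and blink flags, and the PointerDriver name itself, are assumptions for illustration, not the project's actual classes.

    import java.awt.AWTException;
    import java.awt.Robot;
    import java.awt.event.InputEvent;

    // Drives the system pointer from the tracked facial features.
    public class PointerDriver {
        private final Robot robot;

        public PointerDriver() throws AWTException {
            robot = new Robot();
        }

        // Called whenever the nose-tip tracker reports a new position that
        // has already been mapped from video to screen coordinates.
        public void onNoseMoved(int screenX, int screenY) {
            robot.mouseMove(screenX, screenY);
        }

        // Called when a voluntary blink is detected: the left eye fires a
        // left click, the right eye a right click.
        public void onBlink(boolean leftEye) {
            int mask = leftEye ? InputEvent.BUTTON1_MASK : InputEvent.BUTTON3_MASK;
            robot.mousePress(mask);
            robot.mouseRelease(mask);
        }
    }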

Face Detection

In this module, we propose a real-time face detection algorithm using a Six-Segmented Rectangular (SSR) filter, distance information, and a template matching technique. The Between-the-Eyes point is selected as the face representative in our detection because its characteristics are common to most people and it is easily seen over a wide range of face orientations. First, a rectangle of a certain size, divided into six segments, is scanned across the face image. The bright-dark relations among the segments are then tested to decide whether the rectangle's center can be a Between-the-Eyes candidate. Next, distance information obtained from a stereo camera, together with template matching, is applied to detect the true Between-the-Eyes among the candidates. We implemented this system on a PC with a 2.2 GHz Xeon CPU; it runs at a real-time speed of 30 frames/sec with a detection rate of 92%.

The current evolution of computer technologies has enhanced various applications in human-computer interfaces. Face and gesture recognition is a part of this field, with applications in robotics, security systems, driver monitoring, and video coding. Since the human face is a dynamic object with a high degree of variability, various techniques have been proposed. In his survey, Hjelmas [1] classified face detection techniques into two categories: the feature-based approach and the image-based approach. Techniques in the first category make use of apparent properties of the face such as face geometry, skin color, and motion. Although feature-based techniques can achieve high speed in face detection, they suffer from poor reliability under varying lighting conditions. The image-based approach, in the second category, takes advantage of recent advances in pattern recognition theory. Most image-based approaches apply a window-scanning technique for detecting faces [1], which requires heavy computation, so a purely image-based approach is not well suited to real-time applications. In order to achieve a fast and reliable face detection system, we propose a method that combines the feature-based and image-based approaches to detect the point between the eyes (hereafter called Between-the-Eyes) using a Six-Segmented Rectangular filter (SSR filter). The proposed SSR filter, a rectangle divided into six segments, exploits the bright-dark relations around the Between-the-Eyes area. We select Between-the-Eyes as the face representative because it is common to most people and easy to find over a wide range of face orientations [2]. The Between-the-Eyes point has dark parts (eyes and eyebrows) on both sides, and comparatively bright parts above (the forehead) and below (the nose and cheekbones). This characteristic is stable for any facial expression [2]. In this paper, we use an intermediate representation of the image called the integral image, from Viola and Jones's work [3], to calculate the sums of pixel values in each segment of the SSR filter. First, the SSR filter is scanned over the image and the average gray level of each segment is calculated from the integral image. Then the bright-dark relations between the segments are tested to see whether the filter's center can be a Between-the-Eyes candidate. Next, the stereo camera is used to find the distance information and the suitable Between-the-Eyes template size. The Between-the-Eyes candidates are then evaluated with a template matching technique, using an average Between-the-Eyes template obtained from 400 images of 40 people from the ORL face database [4]. Finally, the true Between-the-Eyes is detected.

Because the proposed technique uses only gray-level information, it is more robust to changes in lighting conditions. Moreover, the method is not affected by beards, mustaches, hair, or nostril visibility, since only the information around the eyes, eyebrows, and nose is required. We implemented this system on a PC with a 2.2 GHz Xeon CPU; it runs at 30 frames/sec with a detection rate of 92%. In Section 2 we describe the concept of the integral image, followed in Section 3 by an explanation of how the SSR filter is used to extract Between-the-Eyes candidates. In Section 4 we explain the candidate selection method using the stereo camera and average Between-the-Eyes template matching. Section 5 presents the whole real-time face detection system, Section 6 gives the experimental results, and Section 7 concludes.

Integral Image

The SSR filter is computed using an intermediate representation of the image called the integral image. For an original image i(x, y), the integral image is defined as [3]

ii(x, y) = Σ i(x', y'), where the sum is taken over all x' ≤ x and y' ≤ y (1)

The integral image can be computed in one pass over the original image by the following pair of recurrences:

s(x, y) = s(x, y - 1) + i(x, y) (2)

ii(x, y) = ii(x - 1, y) + s(x, y) (3)

where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0. Using the integral image, the sum of pixels within a rectangle D (sr) can be computed at high speed with four array references, as shown in Fig. 1:

sr = (ii(x, y) + ii(x - W, y - L)) - (ii(x - W, y) + ii(x, y - L)) (4)

Figure 1. Integral Image
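As a concrete illustration, here is a small Java sketch of equations (2)-(4): building the integral image in one pass, then summing an arbitrary rectangle with four array references. The method names are ours, and boundary handling (x - W or y - L falling outside the image) is omitted for brevity.

    // Build the integral image ii of a grayscale image img (equations (2)
    // and (3)); s is the cumulative sum, ii the running total.
    static long[][] integralImage(int[][] img, int w, int h) {
        long[][] s = new long[h][w];
        long[][] ii = new long[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                s[y][x] = (y > 0 ? s[y - 1][x] : 0) + img[y][x];  // s(x,y) = s(x,y-1) + i(x,y)
                ii[y][x] = (x > 0 ? ii[y][x - 1] : 0) + s[y][x];  // ii(x,y) = ii(x-1,y) + s(x,y)
            }
        }
        return ii;
    }

    // Equation (4): sum of pixels in the W x L rectangle whose bottom-right
    // corner is (x, y), using four array references (assumes x >= W, y >= L).
    static long rectSum(long[][] ii, int x, int y, int w, int l) {
        return ii[y][x] + ii[y - l][x - w] - ii[y][x - w] - ii[y - l][x];
    }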

SSR Filter

1 SSR filter

At the beginning, a rectangle is scanned across the input image. This rectangle is divided into six segments, as shown in Fig. 2(a).

Figure 2. SSR Filter

We denote the total sum of the pixel values in each segment (B1 to B6) as Sb1 to Sb6. The proposed SSR filter is used to detect the Between-the-Eyes based on two characteristics of face geometry.

(1) The nose area (Sn) is brighter than the right and left eye areas (Ser and Sel, respectively), as shown in Fig. 2(b), where

Sn = Sb2 + Sb5
Ser = Sb1 + Sb4
Sel = Sb3 + Sb6

Then,

Sn > Ser (5)
Sn > Sel (6)

(2) The eye area (both eyes and eyebrows) (Se) is relatively darker than the cheekbone area (including the nose) (Sc), as shown in Fig. 2(c), where

Se = Sb1 + Sb2 + Sb3
Sc = Sb4 + Sb5 + Sb6

Then,

Se < Sc (7)

When expressions (5), (6), and (7) are all satisfied, the center of the rectangle can be a candidate for the Between-the-Eyes.
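A small Java sketch of this candidate test, reusing an inclusive integral image as in the sketch above, might look as follows; the segment layout follows Fig. 2, and the helper names are ours.

    // Sum of the w x h segment whose top-left corner is (x0, y0), from an
    // inclusive integral image ii (assumes x0 >= 1 and y0 >= 1).
    static long segSum(long[][] ii, int x0, int y0, int w, int h) {
        return ii[y0 + h - 1][x0 + w - 1] - ii[y0 - 1][x0 + w - 1]
             - ii[y0 + h - 1][x0 - 1] + ii[y0 - 1][x0 - 1];
    }

    // Expressions (5)-(7): does the SSR filter whose top-left corner is
    // (x, y), with segments of size segW x segH laid out as B1 B2 B3 over
    // B4 B5 B6 (Fig. 2), mark its center as a Between-the-Eyes candidate?
    static boolean isCandidate(long[][] ii, int x, int y, int segW, int segH) {
        long sb1 = segSum(ii, x,            y,        segW, segH);
        long sb2 = segSum(ii, x + segW,     y,        segW, segH);
        long sb3 = segSum(ii, x + 2 * segW, y,        segW, segH);
        long sb4 = segSum(ii, x,            y + segH, segW, segH);
        long sb5 = segSum(ii, x + segW,     y + segH, segW, segH);
        long sb6 = segSum(ii, x + 2 * segW, y + segH, segW, segH);
        long sn  = sb2 + sb5;           // nose area
        long ser = sb1 + sb4;           // right eye area
        long sel = sb3 + sb6;           // left eye area
        long se  = sb1 + sb2 + sb3;     // eye and eyebrow row
        long sc  = sb4 + sb5 + sb6;     // cheekbone row (including nose)
        return sn > ser && sn > sel && se < sc;   // (5), (6) and (7)
    }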

Figure 3. Between-the-Eyes candidates from SSR filter

In Fig. 3(b), the Between-the-Eyes candidate areas are displayed in white and the non-candidate areas in black. By performing a labeling process on Fig. 3(b), the result of using the SSR filter to detect Between-the-Eyes candidates is obtained, as shown in Fig. 3(a).

2 Filter Size Estimation

In order to find the most suitable filter size, we use 400 facial images of 40 people (10 per person) from the ORL face database [4]. The images were taken at different times, under various lighting conditions, with different expressions, and with and without eyeglasses. Each image is 92×112 pixels with 256 gray levels. We estimated the filter size manually on all 400 facial images to find a standard filter size that covers the two eyes, the two eyebrows, and the cheekbone area (including the nose); the result is a rectangle of 60×30 pixels. In the experiment, we counted whether a true candidate was included or lay in the vicinity. Varying the standard 60×30 filter size in steps of 20%, the true-candidate detection rate and the number of candidates for each filter size are shown in Table 1. The standard 60×30 filter obtains a 92% detection rate, which shows that this filter size functions effectively. On the other hand, the detection rate becomes worse (52%) with a filter of 84×42 pixels, because a large filter may include unnecessary parts of the face such as hair or a beard. Since the sums of pixel values are used in expressions (5), (6), and (7), the filters of size 24×12 and 36×18 shown in Fig. 4 achieve unexpectedly high Between-the-Eyes detection rates even though they do not completely contain both eye areas, because some parts of the eyes are still darker than the nose area. Fig. 5 shows examples of successful Between-the-Eyes detections, and some failures are shown in Fig. 6. These detection errors may be caused by illumination; the failure on the middle image in Fig. 6 is mainly due to the reflection on the eyeglasses.

Figure 4. Various size of SSR filter

Figure 5. Examples of successful Between-the-Eyes detection

Figure 6. Examples of failures in Between-the-Eyes detection

Moreover, Fig. 7 shows an example of successful Between-the-Eyes detection on an image in which horizontal illumination hits one side of the face. In this case, the SSR filter still functions effectively even though one side of the face is covered by shadow. Therefore the SSR filter can be used to detect the Between-the-Eyes under varying lighting conditions.

Table 1. Detection results of various SSR filter sizes (from 400 face images)

Figure 7. Example of successful Between-the-Eyes detection on a face with illumination hitting one side

According to Table 1, rectangles from 0.6 to 1.2 times the standard size (60×30) can be used to detect Between-the-Eyes candidates. Therefore faces from 0.83 to 1.67 times the standard image size (92×112 pixels) can be detected by our proposed SSR filter.

Candidate Selection

1 Stereo camera

In a real situation, face size varies with the distance from the face to the cameras. We use two cameras to construct a binocular stereo system and obtain distance information, so that a suitable Between-the-Eyes template size can be estimated for the template matching technique discussed in Section 4.2. Since the stereo camera system is a standard technique, its details are omitted in this paper. We performed experiments to find the suitable Between-the-Eyes template size from the difference between the right and left images, based on the principle of a binocular stereo camera system. First, we manually measured the horizontal difference in pixels (the disparity) between the Between-the-Eyes positions in the face images obtained from the right and left cameras. Then the width between the right and left temples was measured manually; this should correspond to the width of the Between-the-Eyes template. The relation between disparities and suitable Between-the-Eyes template sizes is shown in Fig. 8. Based on this relation, we can select an appropriate template size from the disparity measured in an actual scene, which makes the proposed technique applicable to faces at distances between 0.5 and 3.5 m from the cameras. From these experiments and the relation in Fig. 8, we obtain the relations between SSR filter size, disparity, and Between-the-Eyes template size shown in Table 2. Only two filter sizes, 40×20 and 24×12, are used, since they are flexible enough to detect faces within the pre-defined range. For example, for a face with a disparity of 20, the 40×20 SSR filter is used and the Between-the-Eyes template size is 48×24 pixels. The template is then scaled to match the average Between-the-Eyes template size for the template matching technique. A face whose disparity falls outside the range shown in Table 2 is assumed to be undetectable.

Figure 8. The relation between the horizontal differences in pixel (disparity) and the Between-the-Eyes template size
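The following hedged Java sketch illustrates how such a lookup could work. Only the disparity-20 row (40×20 filter, 48×24 template) is quoted in the text, so the disparity range bounds, the cut-off between the two filter sizes, and the linear template scaling below are illustrative assumptions, not the actual contents of Table 2.

    // Returns {filterW, filterH, templateW, templateH} for a measured
    // disparity d, or null when the face is outside the detectable range.
    static int[] sizesForDisparity(int d) {
        if (d < 8 || d > 40) return null;     // assumed limits of Table 2
        int filterW = (d >= 15) ? 40 : 24;    // assumed cut-off between the two filters
        int filterH = filterW / 2;
        int templateW = 48 * d / 20;          // scaled linearly from the d = 20 example
        int templateH = templateW / 2;
        return new int[] { filterW, filterH, templateW, templateH };
    }

The selected template would then be rescaled to the average template size used in the matching step described in Section 4.2.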

Table 2. Filter size, disparity, and related Between-the-Eyes template size

2 Average Between-the-Eyes Template Matching

Because the SSR filter extracts not only the true Between-the-Eyes but also some false candidates, we use an average Between-the-Eyes template matching technique to reject them. The average Between-the-Eyes pattern used in this paper was obtained in the same manner as [2], from 400 face images of 40 people from the ORL face database [4].

Figure 9. Average Between-the-Eyes template and its variance pattern

Fig. 9 shows the average Between-the-Eyes template and its variance pattern, of size 32×16. The gray levels of each sample were normalized to have an average gray level of zero and a variance of one, and an average pattern and its per-pixel variance were calculated. The gray levels were then converted to an average of 128 with a standard deviation of 64, giving the average pattern as an image. To obtain the variance pattern, each value was multiplied by 255. Both the average and variance patterns are symmetric. To avoid the influence of unbalanced illumination, we evaluate the right and left halves of the face separately, since the lighting conditions are likely to differ between the two halves. We also avoid the effect of hair and beard, and reduce the computational load, by discarding the top three rows from the calculation. In the end, a pattern of 16×13 pixels (for each side) is used in template matching. Define the average Between-the-Eyes template and its variance for the left side of the face as tl_ij, vl_ij (i = 0,...,15, j = 3,...,15) and for the right side as tr_ij, vr_ij (i = 0,...,15, j = 3,...,15). tl_ij and tr_ij have an average value of 128 with a standard deviation of 64, while vl and vr are scaled so that the maximum gray level equals 255. To evaluate a candidate, we define its Between-the-Eyes pattern as p_mn (m = 0,...,31, n = 0,...,15). The right and left halves of p_mn are then re-defined separately as pr_ij (i = 0,...,15, j = 3,...,15) and pl_ij (i = 0,...,15, j = 3,...,15), each converted to have an average value of 128 and a standard deviation of 64. The left mismatching value (Dl) and the right mismatching value (Dr) are then calculated for each candidate.
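The mismatch equations themselves do not survive in this copy, but the surrounding description (per-pixel variance patterns, 16×13 half-templates with the top three rows discarded) suggests a variance-weighted sum of squared differences. The Java sketch below assumes that form; it is a plausible reconstruction, not the paper's exact formula.

    // Assumed form of the mismatching value: a variance-weighted sum of
    // squared differences over one half of the pattern. p, t and v hold the
    // candidate half, the template half and its variance; the index ranges
    // follow the text (i = 0..15, j = 3..15, top three rows discarded).
    static double mismatch(double[][] p, double[][] t, double[][] v) {
        double d = 0;
        for (int i = 0; i <= 15; i++) {
            for (int j = 3; j <= 15; j++) {
                double diff = p[i][j] - t[i][j];
                d += diff * diff / v[i][j];   // high-variance pixels count less (assumption)
            }
        }
        return d;
    }

Under this reading, Dl would be mismatch(pl, tl, vl) and Dr would be mismatch(pr, tr, vr).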

Only a candidate with both Dl and Dr less than a pre-defined threshold D is counted as a true candidate. If more than one candidate has both Dl and Dr below the threshold, the one with the smallest mismatch value is judged to be the true Between-the-Eyes candidate.

3 Detection of Eye-Like Points

Since the Between-the-Eyes lies midway between the left and right eyes, we detect both eyes to confirm the location of the true Between-the-Eyes. When the locations of both eyes are extracted from the selected face area, the Between-the-Eyes is re-registered as the midpoint between them. We search for the eye areas within the Between-the-Eyes template obtained in Section 4.1. Eye detection is done in a simple way, following the technique used in [5]. In order to avoid the influence of illumination, the right-eye and left-eye searches are performed independently. First, the rectangular areas on both sides of the Between-the-Eyes candidate, where the eyes should be found, are extracted. In this paper, for the selected Between-the-Eyes area of size 32×16, we avoid the effect of eyebrows, hair, and beard by ignoring one pixel at the border. The two eye areas are then assumed to be 12×14 pixels on each side of the face (neglecting three pixels in the middle of the Between-the-Eyes template, which correspond to the nose area). Next, we find a threshold level for each area to binarize the image. The threshold level is determined when the sum of the number of pixels of all components, excluding the border, exceeds a pre-defined value [6] (10 in this paper). In some cases, the eyebrows have almost the same gray level as the eyes, so we select the component within a certain size range (5 to 25 pixels) with the lowest position. To resolve the similarity in gray level between eyes and eyebrows, a searching process based on the left and right eye alignment is performed; this process focuses on the 3×3 pixels in the middle of each eye area. Then conditions on the distance between the located eyes (De) and the angle (Ae) at the Between-the-Eyes candidate are tested using the following expressions, both obtained from experiments:

15 < De < 21 (10)
115° < Ae < 180° (11)
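A small Java check of conditions (10) and (11) could look like the following. The precise definition of the angle Ae, taken here as the angle subtended at the candidate point by the two eye positions, in degrees, is an assumption.

    // Conditions (10) and (11). (exL, eyL) and (exR, eyR) are the located
    // left and right eye positions, (bx, by) the Between-the-Eyes candidate.
    static boolean eyesConsistent(double exL, double eyL, double exR, double eyR,
                                  double bx, double by) {
        double de = Math.hypot(exR - exL, eyR - eyL);        // eye distance De
        double a1 = Math.atan2(eyL - by, exL - bx);
        double a2 = Math.atan2(eyR - by, exR - bx);
        double ae = Math.toDegrees(Math.abs(a1 - a2));       // angle Ae at the candidate
        if (ae > 180) ae = 360 - ae;
        return de > 15 && de < 21 && ae > 115 && ae < 180;   // (10) and (11)
    }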

Only a candidate whose eye positions satisfy both conditions is re-registered as the true Between-the-Eyes; otherwise, the Between-the-Eyes and eye areas cannot be determined.

Real-Time Face Detection System

The processing flow of the real-time face detection system is shown in Fig. 10.

Figure 10. Processing Flow of Real-Time Face Detection

Experiment

We implemented the system on a PC with a 2.2 GHz Xeon CPU. In the experiment, two commercial NTSC video cameras, a multi-video composer, and a video capture board are used, without any special hardware. The two NTSC cameras are used to construct a binocular stereo system. The multi-video composer combines four NTSC video signals into one NTSC signal, of which we use only two in our experiment. Each video image becomes one half of its original size, so the captured image size for each camera is 320×240. However, to avoid interlacing artifacts on moving objects, we use only the even-line data; consequently, the image size is 320×120 for each camera. The resulting horizontal image resolution is double the vertical one, as shown in the bottom two images in Fig. 11. We keep this non-uniform resolution to obtain as accurate a disparity as possible. On the other hand, a regular image is needed for the Between-the-Eyes template matching, so we reconstruct a smaller image by sub-sampling, as shown in the uppermost-left image of Fig. 11. Fig. 11 shows a face detection result from an experiment performed in the laboratory against an unconstrained background. The uppermost-left image is a monochrome image from the right camera using only the green component; Between-the-Eyes detection is applied to this 160×120 monochrome image. The lower image is obtained from the right camera, and the lowest image from the left camera.

Figure 11. Face Detection Result

The detection result from the SSR filter is shown in the uppermost-right image. In its upper corner is the Between-the-Eyes candidate area after cropping and scaling to match the average matching template; its binarized image of the detected eyes and eyebrows, after the eye detection process, is displayed below it. Since no information about face inclination is used in the SSR filter, this technique cannot detect faces inclined by more than 10°. In cases of strong reflection on eyeglasses, the proposed technique also occasionally fails to detect the true Between-the-Eyes. In the real implementation, the system operates at 30 frames/sec, which achieves real-time processing speed.

In summary, we propose a real-time face detection system consisting of three major components: the SSR filter, the stereo camera system, and the average Between-the-Eyes template matching unit. First, the SSR filter is scanned over the image, and the bright-dark relations among the average gray levels of its segments are tested to decide whether its center can be a Between-the-Eyes candidate; the integral image proposed by Viola [3] is used in the SSR filter calculation to allow real-time scanning of the filter over the whole image. Since only gray-level information is used, the proposed technique is more robust to changes in lighting conditions than skin-color extraction methods. Next, the stereo camera system provides distance information so that a suitable Between-the-Eyes template size can be estimated; this reduces the computational load and allows faces of different sizes to be detected. Then the average Between-the-Eyes template matching selects the true candidate, followed by detection of both eye areas to verify the result. We implemented the system on a PC with a 2.2 GHz Xeon CPU; it ran at 30 frames/sec, satisfying real-time processing speed. However, the proposed technique still has limitations in face orientation, and further development is needed to address this.

Hough Transform

Common Names: Hough transform

Brief Description

The Hough transform is a technique which can be used to isolate features of a particular shape within an image. Because it requires that the desired features be specified in some parametric form, the classical Hough transform is most commonly used for the detection of regular curves such as lines, circles, and ellipses. A generalized Hough transform can be employed in applications where a simple analytic description of the feature(s) is not possible. Due to the computational complexity of the generalized Hough algorithm, we restrict the main focus of this discussion to the classical Hough transform. Despite its domain restrictions, the classical Hough transform (hereafter referred to without the "classical" prefix) retains many applications, as most manufactured parts (and many anatomical parts investigated in medical imagery) contain feature boundaries which can be described by regular curves. The main advantage of the Hough transform technique is that it is tolerant of gaps in feature boundary descriptions and is relatively unaffected by image noise.

How It Works

The Hough technique is particularly useful for computing a global description of a feature(s) (where the number of solution classes need not be known a priori), given (possibly noisy) local measurements. The motivating idea behind the Hough technique for line detection is that each input measurement (e.g. coordinate point) indicates its contribution to a globally consistent solution (e.g. the physical line which gave rise to that image point). As a simple example, consider the common problem of fitting a set of line segments to a set of discrete image points (e.g. pixel locations output from an edge detector). Figure 1 shows some possible solutions to this problem. Here the lack of a priori knowledge about the number of desired line segments (and the ambiguity about what constitutes a line segment) render this problem under-constrained.

Figure 1 a) Coordinate points. b) and c) Possible straight line fittings.

We can analytically describe a line segment in a number of forms. However, a convenient equation for describing a set of lines uses the parametric or normal notation:

r = x·cos θ + y·sin θ

where r is the length of a normal from the origin to this line and θ is the orientation of the normal with respect to the X-axis. (See Figure 2.) For any point (x, y) on this line, r and θ are constant.

Figure 2 Parametric description of a straight line.

In an image analysis context, the coordinates of the edge-segment points (x_i, y_i) in the image are known and therefore serve as constants in the parametric line equation, while r and θ are the unknown variables we seek. If we plot the possible (r, θ) values defined by each (x_i, y_i), points in cartesian image space map to curves (i.e. sinusoids) in the polar Hough parameter space. This point-to-curve transformation is the Hough transformation for straight lines. When viewed in Hough parameter space, points which are collinear in the cartesian image space become readily apparent, as they yield curves which intersect at a common (r, θ) point.

The transform is implemented by quantizing the Hough parameter space into finite intervals or accumulator cells. As the algorithm runs, each (x_i, y_i) is transformed into a discretized (r, θ) curve, and the accumulator cells which lie along this curve are incremented. Resulting peaks in the accumulator array represent strong evidence that a corresponding straight line exists in the image. We can use this same procedure to detect other features with analytical descriptions. For instance, in the case of circles, the parametric equation is

(x - a)² + (y - b)² = r²

where a and b are the coordinates of the center of the circle and r is the radius. In this case, the computational complexity of the algorithm begins to increase, as we now have three parameters in the parameter space and a 3-D accumulator. (In general, the computation and the size of the accumulator array increase polynomially with the number of parameters, so the basic Hough technique described here is only practical for simple curves.)
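To make the voting procedure concrete, here is a minimal Java sketch of the straight-line Hough transform described above: every edge pixel votes for all (θ, r) accumulator cells consistent with it, with θ quantized over [0, π). The names and the choice of quantization are ours.

    // Straight-line Hough transform: each edge pixel votes for every
    // (theta, r) cell whose line x*cos(theta) + y*sin(theta) = r passes
    // through it. Returns the accumulator array.
    static int[][] houghLines(boolean[][] edges, int w, int h, int nTheta, int nR) {
        double rMax = Math.hypot(w, h);          // r ranges over [-rMax, rMax]
        int[][] acc = new int[nTheta][nR];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!edges[y][x]) continue;
                for (int t = 0; t < nTheta; t++) {
                    double theta = Math.PI * t / nTheta;
                    double r = x * Math.cos(theta) + y * Math.sin(theta);
                    int rBin = (int) Math.round((r + rMax) * (nR - 1) / (2 * rMax));
                    acc[t][rBin]++;              // accumulate evidence for this line
                }
            }
        }
        return acc;
    }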

Guidelines for Use

The Hough transform can be used to identify the parameter(s) of a curve which best fits a set of given edge points. This edge description is commonly obtained from a feature-detecting operator such as the Roberts Cross, Sobel or Canny edge detector and may be noisy, i.e. it may contain multiple edge fragments corresponding to a single whole feature. Furthermore, as the output of an edge detector defines only where features are in an image, the work of the Hough transform is to determine both what the features are (i.e. to detect the feature(s) for which it has a parametric (or other) description) and how many of them exist in the image. In order to illustrate the Hough transform in detail, we begin with a simple image of two occluding rectangles.

The Canny edge detector can produce a set of boundary descriptions for this image.

Here we see the overall boundaries in the image, but this result tells us nothing about the identity (and quantity) of feature(s) within this boundary description. In this case, we can use the Hough (line detecting) transform to detect the eight separate straight line segments of this image and thereby identify the true geometric structure of the subject. If we use these edge/boundary points as input to the Hough transform, a curve is generated in polar space for each edge point in cartesian space, and the resulting accumulator array can be viewed as an intensity image.

Histogram equalizing this accumulator image allows us to see the patterns of information contained in the low-intensity pixel values.

Note that, although r and θ are notionally polar coordinates, the accumulator space is plotted rectangularly, with θ as the abscissa and r as the ordinate. Curves generated by collinear points in the gradient image intersect in peaks in the Hough transform space, and these intersection points characterize the straight line segments of the original image. Note also that the accumulator space wraps around at the vertical edge of the image, so that, in fact, there are only 8 real peaks. There are a number of methods which one might employ to extract these bright points, or local maxima, from the accumulator array. For example, a simple method involves thresholding and then applying some thinning to the isolated clusters of bright spots in the accumulator array image. Here we use a relative threshold to extract the unique points corresponding to each of the straight line edges in the original image; in other words, we take only those local maxima in the accumulator array whose values are equal to or greater than some fixed percentage of the global maximum value.
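A short Java sketch of this relative-threshold peak extraction, keeping only local maxima at or above a fixed fraction of the global maximum (names are ours):

    import java.util.ArrayList;
    import java.util.List;

    // Keep accumulator cells that are local maxima and at least relThreshold
    // (e.g. 0.4 for 40%) of the global maximum; returns (theta bin, r bin)
    // index pairs, one per detected line.
    static List<int[]> peaks(int[][] acc, double relThreshold) {
        int max = 0;
        for (int[] row : acc)
            for (int v : row) max = Math.max(max, v);
        int cut = (int) (relThreshold * max);
        List<int[]> out = new ArrayList<int[]>();
        for (int t = 1; t < acc.length - 1; t++) {
            for (int r = 1; r < acc[t].length - 1; r++) {
                int v = acc[t][r];
                if (v >= cut && v >= acc[t - 1][r] && v >= acc[t + 1][r]
                        && v >= acc[t][r - 1] && v >= acc[t][r + 1]) {
                    out.add(new int[] { t, r });
                }
            }
        }
        return out;
    }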

Mapping back from Hough transform space (i.e. de-Houghing) into cartesian space yields a set of line descriptions of the image subject. By overlaying this image on an inverted version of the original, we can confirm that the Hough transform found the 8 true sides of the two rectangles and thus revealed the underlying geometry of the occluded scene.

Note that the accuracy of alignment of detected and original image lines, which is obviously not perfect in this simple example, is determined by the quantization of the accumulator array. (Also note that many of the image edges have several detected lines. This arises from having several nearby Hough-space peaks with similar line parameter values. Techniques exist for controlling this effect, but they were not used here so as to illustrate the output of the standard Hough transform.) Note also that the lines generated by the Hough transform are infinite in length. If we wish to identify the actual line segments which generated the transform parameters, further image analysis is required in order to see which portions of these infinitely long lines actually have points on them.

To illustrate the Hough technique's robustness to noise, the Canny edge description was corrupted with 1% salt and pepper noise before Hough transforming it and plotting the result in Hough space.

De-Houghing this result (and overlaying it on the original) again recovers the sides of the rectangles. (As in the above case, the relative threshold is 40%.)

The sensitivity of the Hough transform to gaps in the feature boundary can be investigated by transforming a version of the image which has been edited with a paint program to break up the boundaries, and then de-Houghing the result (again using a relative threshold of 40%). In this case, because the accumulator space did not receive as many entries as in the previous examples, only 7 peaks were found, but these are all structurally relevant lines.

We will now show some examples with natural imagery. In the first case, we have a city scene where the buildings are obscured by fog.

If we want to find the true edges of the buildings, an edge detector (e.g. Canny) cannot recover this information very well.

However, the Hough transform can detect some of the straight lines representing building edges within the obstructed region, which become visible in the histogram-equalized accumulator space representation of the original image.

If we set the relative threshold to 70%, we obtain the corresponding de-Houghed image.

Only a few of the long edges are detected here, and there is a lot of duplication where many lines or edge fragments are nearly collinear. Applying a more generous relative threshold, i.e. 50%, yields more of the expected lines, but at the expense of many spurious lines arising from the many collinear edge fragments.

Our final example comes from a remote sensing application, in which we would like to detect the streets of a reasonably rectangular city sector. We can edge detect the image using the Canny edge detector, but street information is not available from the output of the edge detector alone. The de-Houghed result shows that the Hough line detector is able to recover some of this information. Because the contrast in the original image is poor, only a limited set of features (i.e. streets) is identified.

Common Variants

Generalized Hough Transform

The generalized Hough transform is used when the shape of the feature that we wish to isolate does not have a simple analytic equation describing its boundary. In this case, instead of using a parametric equation of the curve, we use a look-up table to define the relationship between the boundary positions and orientations and the Hough parameters. (The look-up table values must be computed during a preliminary phase using a prototype shape.) For example, suppose that we know the shape and orientation of the desired feature. (See Figure 3.) We can specify an arbitrary reference point within the

feature, with respect to which the shape of the feature (i.e. the distances and directions of normal lines drawn from the boundary to this reference point) is defined. Our look-up table (i.e. R-table) will consist of these distance and direction pairs, indexed by the orientation of the boundary.

Figure 3 Description of R-table components.

The Hough transform space is now defined in terms of the possible positions of the shape in the image, i.e. the possible ranges of the reference point (x_c, y_c). In other words, the transformation is defined by:

x_c = x + r·cos(α)
y_c = y + r·sin(α)

(The r and α values are derived from the R-table for particular known boundary orientations.) If the orientation of the desired feature is unknown, this procedure is complicated by the fact that we must extend the accumulator by incorporating an extra parameter to account for changes in orientation.

Exercises

1. Find the Hough line transform of the objects shown in Figure 4.

Figure 4 Features to input to the Hough transform line detector.

2. Starting from the basic image used above, create a series of images with which you can investigate the ability of the Hough line detector to extract occluded features. For example, begin by using translation and image addition to create an image containing the original image overlapped by a translated copy of itself. Next, use edge detection to obtain a boundary description of your subject. Finally, apply the Hough algorithm to recover the geometries of the occluded features.

3. Investigate the robustness of the Hough algorithm to image noise. Starting from an edge detected version of the basic image, try the following: a) Generate a series of boundary descriptions of the image using different levels of Gaussian noise. How noisy (i.e. broken) does the edge description have to be before Hough is unable to detect the original geometric structure of the scene? b) Corrode the boundary descriptions with different levels of salt and pepper noise. At what point does the combination of broken edges and added intensity spikes render the Hough line detector useless?

4. Try the Hough transform line detector on some example images, and experiment with the Hough circle detector on images containing circular features.

5. One way of reducing the computation required to perform the Hough transform is to make use of gradient information which is often available as output from an edge detector. In the case of the Hough circle detector, the edge gradient tells us in which direction a circle must lie from a given edge coordinate point. (See Figure 5.)

Figure 5 Hough circle detection with gradient information.

a) Describe how you would modify the 3-D circle detector accumulator array in order to take this information into account. b) To this algorithm we may want to add gradient magnitude information. Suggest how to introduce weighted incrementing of the accumulator.

6. The Hough transform can be seen as an efficient implementation of a generalized matched filter strategy. In other words, if we created a template composed of a circle of 1's (at a fixed radius r) and 0's everywhere else in the image, then we could convolve it with the gradient image to yield an accumulator-array-like description of all the circles of radius r in the image. Show formally that the basic Hough transform (i.e. the algorithm which makes no use of gradient direction information) is equivalent to template matching.

7. Explain how to use the generalized Hough transform to detect octagons.

Hough transform

The Hough transform is a feature extraction technique used in digital image processing. The classical transform identifies lines in the image, but it has been extended to identifying the positions of arbitrary shapes. The transform universally used today was invented by Richard Duda and Peter Hart in 1972, who called it a "generalized Hough transform" after the related patent of Paul Hough. The transform was popularized in the computer vision community by Dana H. Ballard through a 1981 journal article titled "Generalizing the Hough transform to detect arbitrary shapes".

Theory

To extract features from digital images, it is useful to be able to find simple shapes - straight lines, circles, ellipses and the like - in images. To achieve this goal, one must be able to detect a group of pixels that lie on a straight line or a smooth curve. That is what a Hough transform is supposed to do. The simplest case is the linear Hough transform. To illustrate the idea, let us start with a straight line. In the image space, a straight line can be described as y = mx + b and is plotted for each pair of values (x, y). However, what characterizes the straight line is not x or y but its slope m and intercept b; based on that fact, the straight line y = mx + b can be represented as a point (b, m) in the parameter space (the b vs. m graph). Using the slope-intercept parameters can make the application complicated, since both parameters are unbounded: as lines become more and more vertical, the magnitudes of m and b grow towards infinity. For computational purposes, therefore, it is better to parameterize the lines in the Hough transform with two other parameters, commonly called r and θ (theta). The parameter r represents the smallest distance between the line and the origin, while θ is the angle of the vector from the origin to this closest point. Using this parametrization, the equation of the line can be written as:

r = x·cos θ + y·sin θ

It is therefore possible to associate with each line of the image a pair (r, θ), which is unique if θ ∈ [0, π) and r ∈ R, or if θ ∈ [0, 2π) and r ≥ 0. The (r, θ) plane is sometimes referred to as Hough space. This representation makes the Hough transform conceptually very close to the so-called Radon transform.

It is well known that an infinite number of lines can go through a single point of the plane. If that point has coordinates (x0, y0) in the image plane, all the lines that go through it obey the following equation:

r(θ) = x0·cos θ + y0·sin θ

This corresponds to a sinusoidal curve in the (r, θ) plane, which is unique to that point. If the curves corresponding to two points are superimposed, the location (in the Hough space) where they cross corresponds to a line (in the original image space) that passes through both points. More generally, a set of points that form a straight line will produce sinusoids which cross at the parameters for that line. Thus, the problem of detecting collinear points can be converted to the problem of finding concurrent curves.

Implementation

The Hough transform algorithm uses an array, called the accumulator, to detect the existence of a line y = mx + b. The dimension of the accumulator is equal to the number of unknown parameters of the Hough transform problem. For example, the linear Hough transform problem has two unknown parameters, m and b, so the two dimensions of the accumulator array correspond to quantized values of m and b. For each pixel and its neighborhood, the Hough transform algorithm determines whether there is enough evidence of an edge at that pixel. If so, it calculates the parameters of that line, looks for the accumulator bin that the parameters fall into, and increments the value of that bin. By finding the bins with the highest values, the most likely lines can be extracted, and their (approximate) geometric definitions read off. The simplest way of finding these peaks is to apply some form of threshold, but different techniques may yield better results in different circumstances - determining which lines are found as well as how many. Since the lines returned do not contain any length information, it is often necessary to find which parts of the image match up with which lines.

Example

Consider three data points, shown here as black dots.

For each data point, a number of lines are plotted going through it, all at different angles. These are shown here as solid lines. For each solid line a line is plotted which is

perpendicular to it and which intersects the origin. These are shown as dashed lines.

The length and angle of each dashed line are measured; in the diagram above, the results are shown in tables. This is repeated for each data point. A graph of length against angle, known as a Hough space graph, is then created.

The point where the lines intersect gives a distance and angle. This distance and angle indicate the line which bisects the points being tested. In the graph shown the lines intersect at the purple point; this corresponds to the solid purple line in the diagrams above, which bisects the three points. The following is a different example showing the results of a Hough transform on a raster image containing two thick lines.

The results of this transform were stored in a matrix. The value of each cell represents the number of curves through the corresponding point, and higher cell values are rendered brighter. The two distinctly bright spots are the intersections of the curves of the two lines. From these spots' positions, the angle and distance from the image center of the two lines in the input image can be determined.

Variations and extensions

Using the gradient direction to reduce the number of votes

An improvement suggested by O'Gorman and Clowes can be used to detect lines if one takes into account that the local gradient of the image intensity will necessarily be orthogonal to the edge. Since edge detection generally involves computing the intensity gradient magnitude, the gradient direction is often found as a side effect. If a given point of coordinates (x, y) happens to indeed be on a line, then the local

direction of the gradient gives the θ parameter corresponding to that line, and the r parameter is then immediately obtained. In fact, the real gradient direction is only estimated with a given amount of accuracy (approximately ±20°), which means that the sinusoid must be traced around the estimated angle, within ±20°. This nevertheless reduces the computation time and has the interesting effect of reducing the number of useless votes, thus enhancing the visibility of the spikes corresponding to real lines in the image.

Hough transform of curves, and Generalised Hough transform

Although the version of the transform described above applies only to finding straight lines, a similar transform can be used for finding any shape which can be represented by a set of parameters. A circle, for instance, can be transformed into a set of three parameters representing its center and radius, so that the Hough space becomes three-dimensional. Arbitrary ellipses and curves can also be found this way, as can any shape easily expressed as a set of parameters. For more complicated shapes, the Generalised Hough transform is used, which allows a feature to vote for a particular position, orientation and/or scaling of the shape using a predefined look-up table.

Using weighted features

One common variation is to weight the votes cast by different features rather than counting them all equally; in addition, finding the bins with the highest count in one stage can be used to constrain the range of values searched in the next.

Limitations

The Hough transform is only efficient if a high number of votes falls in the right bin, so that the bin can be easily detected amid the background noise. This means that the bin must not be too small, or else some votes will fall in the neighboring bins, thus reducing the visibility of the main bin.

Also, when the number of parameters is large (that is, when we are using the Generalised Hough transform with typically more than three parameters), the average number of votes cast in a single bin is very low, and the bins that do correspond to a figure in the image do not necessarily receive many more votes than their neighbors. Thus, the Generalised Hough transform must be used with great care to detect anything other than lines or circles. Finally, much of the efficiency of the Hough transform depends on the quality of the input data: the edges must be detected well for the Hough transform to be efficient. Using the Hough transform on noisy images is a very delicate matter and, generally, a denoising stage must be applied first. In the case where the image is corrupted by speckle, as in radar images, the Radon transform is sometimes preferred for detecting lines, since it attenuates the noise through summation.

What is eye tracking?


Is there an easier way for the disabled to communicate? How does a 6-month-old baby perceive the world? Where is the most effective ad space on a website? Eye tracking can be used to find answers to questions like these, and many others, by measuring a person's point of gaze (i.e. where they are looking) and determining eye/head position. The origins of eye tracking are over a century old, but in the last 5 years large technological advances have opened up new possibilities. Modern eye tracking can be used not only in a laboratory but in homes, schools, and businesses, where it aids research and analysis and is used for interacting with computers as well as with friends and family.

Simple Idea, Complex Math

Eye tracking works by reflecting invisible infrared light onto an eye, recording the reflection pattern with a sensor system, and then calculating the exact point of gaze using a geometrical model. Once the point of gaze is determined, it can be visualized and shown on a computer monitor. The point of gaze can also be used to control and interface with different machines; this technique is referred to as eye control.

Improving the experience

The main challenges of eye tracking lie not only in developing the right algorithms and sensor solutions, which are a prerequisite for a high level of accuracy, but also in the way users interact with a specific eye tracking device. Eye trackers should be able to perform with all types of eyes and account for such things as glasses, contact lenses, head movement and light conditions. Users should also be able to save personal settings and even look away from the eye tracker without needing to recalibrate. Until recently, different types of eyes required different methods of eye tracking: dark-pupil tracking worked better for people with dark eyes, while bright-pupil tracking worked better for children and people with blue eyes. Recently, both techniques have been combined, eliminating the need for two separate eye trackers. Another important aspect of eye tracking is the track box, the imaginary box in which a user can move his or her head and still be tracked by the device. With a larger track box, the user has more freedom of movement and experiences greater comfort.

Multiple Applications

With the right idea there is no limit to the applications of eye tracking. Currently, some of the major uses for analysis are academic research (e.g. cognitive science, psychology and medical research) and market research and usability studies, such as evaluations of advertising or package design and software or web usability. Eye tracking techniques can also be used for interaction - people can control a computer and make things happen just by looking at it. Eye control can be used as the sole interaction technique or combined with a keyboard, mouse, physical buttons and voice. Eye control is used in communication devices for disabled persons and in various industrial and medical applications.

Future Value

The crude, complex, and highly intrusive eye tracking techniques of the past have been replaced by refined and user-friendly methods that are producing valuable results today and paving the way for the future. Eye tracking and eye control have a limitless future. Areas like personal computing, the automotive industry, medical research, and education will soon be utilizing eye tracking in ways never thought possible.

Eye Tracking technology

Tobii's eye tracking technology uses advanced image processing of a person's face, eyes, and the reflections of near-infrared reference lights in the eyes to accurately estimate:

- the 3D position in space of each eye
- the precise target towards which each eye's gaze is directed

Key advantages

Tobii has taken eye tracking technology a significant step forward through a number of key innovations that enable large market applications. Key advantages of Tobii's eye tracking technology are:

- Fully automatic eye tracking
- High tracking accuracy
- Ability to track nearly all people
- Completely non-intrusive
- Good tolerance of head motion

Patented techniques

Compared to other technologies, a number of innovations have been made to overcome traditional problems associated with eye tracking, such as cumbersome equipment, poor tracking precision and limited tolerance to head motion. Some of the key aspects of Tobii's technology include:

- Patented techniques to use fixed wide-field-of-view optics in combination with high-resolution sensors
- Patented techniques for accurate estimation of the 3D position in space of both eyes
- Sophisticated image processing and patented control logic to allow for 100% automatic tracking and high tracking ability; tracks almost everyone, even those with glasses
- Advanced algorithms to compensate for head motion without loss in accuracy
- Unique techniques to enable long-lasting calibrations

Application technology

Tobii conducts research and development into eye tracking applications. We have developed an extensive toolbox of software that allows us to rapidly create eye control applications and eye gaze analysis applications.

Tobii's eye-based interaction technology includes the Tobii eye control engine, a powerful ActiveX-based API for the rapid creation of eye control applications in the Windows environment. It allows our customers and partners to quickly develop and customize applications that use eye gaze as a modality in computer interfaces. It is not yet on the market, but is available to key partners on a project basis.
