
SEMINAR REPORT 2011

REMOVAL OF MOVING OBJECT FROM A STREET-VIEW IMAGE BY FUSING MULTIPLE IMAGE SEQUENCES

ABSTRACT
We propose a method to remove moving objects from an in-vehicle camera image sequence by fusing multiple image sequences. Driver assistance systems and services such as Google Street View require images containing no moving objects. The proposed scheme consists of three parts: (i) collection of many image sequences along the same route by using vehicles equipped with an omni-directional camera, (ii) temporal and spatial registration of the image sequences, and (iii) mosaicing of partial images containing no moving objects. Experimental results show that 97.3% of the moving object area could be removed by the proposed method.



TABLE OF CONTENTS

ACKNOWLEDGMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
1. INTRODUCTION
2. OMNI-DIRECTIONAL CAMERA
3. REMOVAL METHOD OF MOVING OBJECT
    3.1 Overview
4. TEMPORAL AND SPATIAL REGISTRATION BETWEEN IMAGE SEQUENCES
    4.1 Temporal registration
    4.1a Dynamic time warping
    4.2 Spatial registration
    4.2a Non-rigid registration
5. SELECTING AND MOSAICING OF PARTIAL IMAGES
6. EXPERIMENT
7. PERSONAL CONTRIBUTION & VIEW
8. CONCLUSION
9. REFERENCES



LIST OF FIGURES

1. Removal of moving objects by the proposed method
2. Omni-directional camera
3. Image formation in an omni-directional camera
4. Selection of background partial image
5. All source image sequences are registered to the target image sequence
6. DTW applied for aligning image sequences in the time direction
7. MATLAB pixel value
8. Vector median filter
9. Example of registration result
10. Result of the proposed method
11. Removal rate versus number of image sequences



1. INTRODUCTION

In recent years, street-view images have become widely used in many applications such as driver assistance systems, ego-localization, and forward obstacle detection. These systems require street-view images containing no moving objects. Meanwhile, Google Street View publishes street-view images on the Internet. However, privacy may be violated in these images, e.g. by visible faces, moving vehicles, or bicycles. Although automatic detection and blurring are applied, the current system does not achieve sufficient quality. Removing moving objects from these images is therefore one possible solution.

There are three major approaches to removing obstacles: (1) from a single image, (2) from an image sequence, and (3) from multiple images captured independently. Most of them require obstacle areas to be specified manually or detected precisely. However, specifying obstacle areas manually is time consuming, and detecting them automatically is very difficult due to the large variety of targets in an urban area. In contrast, the work presented in [7] can remove obstacles without specifying them by using multiple images, but rough camera positions must be input manually, so that method cannot be applied to a large number of images. This paper proposes a method to remove moving objects from an in-vehicle camera image sequence without any manual interaction by using multiple image sequences.


To obtain an omni-directional image containing no moving objects, the following two problems are solved in this paper:

1. Difference of camera positions.
2. Selection of background-like partial images from multiple image sequences.

The difference of camera positions occurs due to the different speeds and the different lateral positions of the vehicles, and it causes a change of appearance. To deal with this problem, the proposed method registers the image sequences in both the time and space directions. As for the second problem, when selecting the most background-like partial image, we assume that moving objects occur relatively rarely in the same sub-window of images captured at the same place.

Fig. 1 Removal of moving objects by the proposed method


2. OMNI-DIRECTIONAL CAMERA
Recent years have witnessed the increasing popularity of studies using 360° images captured with omni-directional cameras. The authors have already created a database of building images using an omni-directional camera. Omni-directional cameras can capture 360° images in a single shot, but their optical characteristics differ from those of ordinary cameras, and their use presents several problems. Two problems in particular pertain to camera calibration and the low resolution obtained when capturing a single 360° image. Numerous studies have addressed the former, but only a few the latter, due to the small number of instances in which omni-directional camera images have been used. This has come to be regarded as an especially significant problem, as the opportunities for omni-directional cameras to actually be used have been increasing of late.

Resolution-enhancement techniques can be broadly divided into hardware methods, in which a high-resolution CCD or the like is employed, and software methods, in which super-resolution or the like is performed using a series of multiple images. The software control converts the circular image into a panoramic display. Images examined with conventional super-resolution techniques have hitherto been almost exclusively taken with static cameras, whereas we have adopted a novel approach in using images captured with a moving camera.
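As an illustration of how a circular omni-directional image can be flattened into a panoramic strip, the following is a minimal polar-unwarping sketch in Python with OpenCV. The mirror center, the radii of the usable band, and the output size are hypothetical calibration values for illustration, not parameters of the system described here.

```python
import cv2
import numpy as np

def unwrap_omni(img, cx, cy, r_inner, r_outer, out_w=1024, out_h=256):
    """Unwrap a circular omni-directional image into a panoramic strip.

    (cx, cy) is the mirror center and (r_inner, r_outer) the usable radial
    band; all four depend on the actual camera/mirror geometry (assumed here).
    """
    theta = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    radius = np.linspace(r_inner, r_outer, out_h)
    rr, tt = np.meshgrid(radius, theta, indexing="ij")  # (out_h, out_w)
    map_x = (cx + rr * np.cos(tt)).astype(np.float32)
    map_y = (cy + rr * np.sin(tt)).astype(np.float32)
    # Sample the circular image along radial lines to build the panorama.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Example (hypothetical values): panorama = unwrap_omni(cv2.imread("omni.png"), 512, 512, 100, 480)
```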


1) Camera 2) Upper mirror 3) Lower mirror 4) Black spot 5) Field of view

Fig. 2 Omni-directional camera


Fig. 3 Image formation in an omni-directional camera

A camera normally has a field of view that ranges from a few degrees to, at most, 180°. This means that it captures, at most, light falling onto the camera focal point through a hemisphere. In contrast, an ideal omni-directional camera captures light from all directions falling onto the focal point, covering a full sphere. In practice, however, most omni-directional cameras cover only almost the full sphere, and many cameras which are referred to as omni-directional cover only approximately a hemisphere, or the full 360° along the equator of the sphere but excluding the top and bottom of the sphere. In the case that they cover the full sphere, the captured light rays do not intersect in exactly a single focal point.


3. REMOVAL METHOD OF MOVING OBJECT


3.1. Overview
This paper defines a moving object as an unfixed object on the road, such as a vehicle, a bicycle, a motorcycle, or a pedestrian. The proposed method consists of three steps: (i) collecting image sequences, (ii) temporal and spatial registration, and (iii) mosaicing partial images containing no moving object. N image sequences are obtained by driving vehicles equipped with an omni-directional camera along the same route many times. Then temporal and spatial registrations are applied to compensate for the difference of camera positions. After these registrations, we obtain images captured at almost the same position. Finally, by assuming that moving objects occur relatively rarely in the same sub-window, the partial images of the background are selected and mosaiced to obtain an omni-directional camera image having no moving object (Fig. 4). The details of steps (ii) and (iii) are explained below.

Fig. 4 Selection of a background partial image from images captured at different times


4. TEMPORAL AND SPATIAL REGISTRATION BETWEEN IMAGE SEQUENCES

A total of N image sequences are used in the proposed method: one target image sequence and N − 1 source image sequences. As shown in Fig. 5, all source image sequences are registered to the target image sequence in the registration step.

Fig. 5 All source image sequences are registered to the target image sequence

4.1 Temporal registration

Temporal registration aligns all image sequences along the time direction. The same frame index does not correspond to the same location, since it is difficult to capture images while keeping the same driving speed. Therefore, Dynamic Time Warping (DTW) is applied to solve this non-linear frame alignment problem (Fig. 6).


Fig. 6 DTW applied for aligning image sequences in the time direction

4.1a: Dynamic time warping


Dynamic time warping (DTW) is an algorithm for measuring the similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another he or she was walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW. A well-known application is automatic speech recognition, where it copes with different speaking speeds.

In general, DTW is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) under certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in the context of hidden Markov models. One example of the restrictions imposed on the matching of the sequences is the monotonicity of the mapping in the time dimension. Continuity is less important in DTW than in other pattern-matching algorithms; DTW is particularly suited to matching sequences with missing information, provided there are long enough segments for matching to occur. The extension of the problem to two-dimensional "series" like images (planar warping) is NP-complete, while the problem for one-dimensional signals like time series can be solved in polynomial time.
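To make the alignment step concrete, here is a minimal dynamic-programming sketch of DTW. It assumes the two image sequences have already been reduced to a pairwise frame-distance matrix (for example, the sum of absolute pixel differences between each frame pair); the function then recovers the monotonic frame correspondence with minimum total cost. This is generic DTW, not the authors' exact implementation.

```python
import numpy as np

def dtw_path(cost):
    """cost[i, j]: distance between frame i of sequence A and frame j of B.
    Returns the minimum-cost monotonic alignment as a list of (i, j) pairs."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance in both sequences
                acc[i - 1, j],      # sequence A advances, B stalls
                acc[i, j - 1],      # sequence B advances, A stalls
            )
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

The monotonicity restriction mentioned above is built into the three allowed transitions: the path can only move forward in either sequence, never backward.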


4.2 Spatial registration


Spatial registration is applied to the image sequences aligned by DTW. Since the lateral position of the vehicle may differ and the frame rate of the camera is limited, positional error still remains even after temporal registration. In order to reduce this small difference of camera position, registration along the space direction is performed. The different camera positions cause a non-linear distortion in the captured images due to complex structures in the scene. In addition, pixel-wise correspondence is required to mosaic partial images precisely. Therefore, the proposed method approximates the appearance variations using B-splines, and non-rigid registration (NRR) is applied to the image sequences.
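As one possible way to realize such a B-spline non-rigid registration, the sketch below uses SimpleITK's free-form B-spline transform. The file names, control-point mesh size, metric, and optimizer settings are illustrative assumptions, not the configuration used in the paper.

```python
import SimpleITK as sitk

# Target frame (reference) and the temporally matched source frame
# (hypothetical file names).
fixed = sitk.ReadImage("target_frame.png", sitk.sitkFloat32)
moving = sitk.ReadImage("source_frame.png", sitk.sitkFloat32)

# B-spline control-point grid over the image (free-form deformation).
mesh_size = [8, 8]
bspline = sitk.BSplineTransformInitializer(fixed, mesh_size)

reg = sitk.ImageRegistrationMethod()
reg.SetInitialTransform(bspline, inPlace=True)
reg.SetMetricAsMeanSquares()          # intensity difference as the cost
reg.SetInterpolator(sitk.sitkLinear)
reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5,
                         numberOfIterations=100)
transform = reg.Execute(fixed, moving)

# Warp the source frame into the target frame's coordinate system.
registered = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)
```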


4.2a: Non-rigid registration


Image registration is the process of determining correspondence between all points in two images of the same scene. Image analysis applications that involve two or more images of a scene often require registration of the images. Non-rigid image registration refers to a class of methods where the images to be registered have nonlinear geometric differences.

Image registration has a long history. One of the first examples appeared in the work of Roberts: by aligning projections of edges of polyhedral solids with image edges, he was able to locate and recognize predefined polyhedral objects in images. Registration of entire images first appeared in the remote sensing literature. Anuta, and Barnea and Silverman, developed automatic methods for registering satellite images using the sum of absolute differences as the similarity measure; Leese et al. and Pratt did the same using the cross-correlation coefficient. Use of image registration in the computation of depth was initially pursued by Julesz, and then by Bakis and Langley, Mori et al., Levine et al., and Nevatia. Image registration found its way into biomedical image analysis as data from various scanners that measure anatomy and function became digitally available. Fischler and Elschlager were among the first to use non-rigid registration to locate deformable objects such as human faces in images. Burr later recognized handwritten characters by non-rigid registration. Bajcsy and Broit developed a non-rigid registration method that could align deformed images in their entirety. In medical imaging, non-rigid registration was initially used to standardize MR and CT brain images with respect to an atlas.

Most non-rigid image registration methods are iterative and minimize a cost or energy function defined in terms of the geometric and/or intensity difference between the images. A smaller number of methods are based on matched feature points and use nonlinear transformation functions to align the images.

The paper by Cachier et al. classifies various non-rigid image registration methods. Further surveys and classifications of image registration methods can be found in the papers by Gerlot and Bizais.

Most work on non-rigid registration has used medical images, in particular brain images. The brain is of tremendous interest because of many applications in neuroscience and neurosurgery, and it presents many unique challenges. Applications include non-rigid registration of models and atlases, measurement of change within an individual, and determination of location with respect to a pre-acquired image during stereotactic surgery. Detailed non-rigid registration and comparison of brain images requires determining correspondences throughout the brain and transforming one image space with respect to another according to those correspondences. Three types of deformation need to be accounted for in non-rigid brain image registration: (1) change within an individual's brain due to growth, surgery, or disease; (2) differences between individuals; and (3) warping due to image distortion, such as in echo-planar magnetic resonance imaging. Deformations of type 1 represent an individual's brain changes during development, surgery, or a degenerative process such as Alzheimer's disease, multiple sclerosis, or malignant disease. In the cases of growth and degenerative disease, the deformation is incremental and likely to be representable by relatively small and smooth transformations. Registering brain images is a difficult task, but it has many important applications, including comparison of shape and function between individuals or groups and development of probabilistic atlases.


5. SELECTING AND MOSAICING OF PARTIAL IMAGES


First, many sub-windows are positioned on the registered images. The size of each sub-window is W × W pixels, and the sub-windows slightly overlap each other. A partial image corresponding to a sub-window is treated as a 3W²-dimensional vector containing its RGB pixel values. Next, the most background-like vector is selected by using the vector median filter [10]. The vector median filter is a median filter extended so that multi-dimensional vectors can be input; it tends to exclude outliers and select the most common input. Under the assumption that moving objects occur relatively rarely in the same sub-window of images captured at the same place, a background image tends to be selected instead of a moving object image (Fig. 8). Using M input vectors v_1, v_2, ..., v_M, the output of the vector median filter is calculated as

    v_out = argmin_{v_i} Σ_{j=1}^{M} ‖v_i − v_j‖,

where ‖·‖ is the L2 norm of a vector. Finally, the selected images at all sub-window positions are mosaiced, and alpha blending is applied at the overlapped areas of the partial images.
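A minimal sketch of this selection rule: for each sub-window, flatten the M candidate patches into 3W²-dimensional vectors and keep the one whose summed L2 distance to all the others is smallest. The array layout is an assumption for illustration.

```python
import numpy as np

def vector_median(patches):
    """patches: (M, W, W, 3) array, one candidate patch per registered sequence.
    Returns the patch minimizing the sum of L2 distances to all candidates."""
    m = patches.shape[0]
    vecs = patches.reshape(m, -1).astype(np.float64)  # 3*W*W-dim vectors
    # Pairwise L2 distances between all candidate vectors.
    dists = np.linalg.norm(vecs[:, None, :] - vecs[None, :, :], axis=2)
    return patches[int(np.argmin(dists.sum(axis=1)))]
```

Because the median is taken over whole patch vectors rather than per pixel, the selected patch is a coherent fragment from a single sequence, which keeps the later alpha-blended mosaic free of ghosting.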


Fig. 7 MATLAB pixel value


Fig. 8 Vector median filter tends to select a background image instead of a moving object image

6. EXPERIMENT
We performed an experiment to evaluate the effectiveness of the proposed method. A Point Grey Research Ladybug2 was used as the omni-directional camera. The frame rate was 15 fps and the original panoramic image size was 1,024 × 512 pixels. One image sequence was used as the target image sequence and 2 to 14 image sequences were used as source image sequences. Each sequence contained about 300 frames. The window size for the vector median filter was 30 × 30 pixels.

The result of the registration is shown in Fig. 9; the left image shows the result of DTW and the right one the result of DTW + NRR. Fig. 10 shows the result of the removal of the moving objects. Although a pedestrian, vehicles, and a bicycle are observed in the input image (target image), they were successfully removed in the result. The effectiveness of the proposed method was evaluated by the removal rate of the moving objects, shown in Fig. 11. The removal rate was calculated as 1 − B/A, where A is the number of pixels corresponding to moving objects in the target image and B is the number of pixels corresponding to moving objects in the output image. This was evaluated using 11 randomly selected target image frames. In order to avoid dependency on the selection of the source image sequences, all combinations were examined and their averages were used for evaluation. In the case of using 15 image sequences, 97.3% of the pixels composing the moving objects were successfully removed. Fig. 11 shows that using many image sequences improves the removal of moving objects; this is because partial image selection by the vector median filter is sensitive to illumination change when the number of image sequences is small.
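As a sketch of this evaluation metric, assuming binary masks that mark moving-object pixels in the target and output frames (how such ground-truth masks were produced is not specified in the report):

```python
import numpy as np

def removal_rate(target_mask, output_mask):
    """Removal rate = 1 - B/A, where A counts moving-object pixels in the
    target frame and B those remaining in the output frame.
    Assumes the target frame contains at least one moving-object pixel."""
    a = np.count_nonzero(target_mask)
    b = np.count_nonzero(output_mask)
    return 1.0 - b / a

# Example: A = 1000 moving-object pixels, B = 27 remaining -> 0.973 (97.3%)
```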


Fig. 9 Example of registration result: the target image and the source image are placed as a checkerboard

Input image

Output image

Fig. 10 Result of the proposed method: a pedestrian, vehicles, and a bicycle observed in the input image were removed in the output image


Fig. 11 Removal rate of moving objects when changing the number of image sequences

7. PERSONAL CONTRIBUTION AND VIEW


Removal of moving objects from street-view images by fusing multiple image sequences is useful for driver assistance systems and services such as Google Street View, which require images containing no moving objects. Using this method, we can eliminate the problems caused by the presence of moving objects and also protect privacy. When applying this method, 97.3% of the moving object area could be removed. Future work includes improving the removal when using a smaller number of sequences.


8. CONCLUSION
We proposed a method to remove moving objects from an in-vehicle camera image sequence by fusing multiple image sequences captured independently at the same location at different times. First, temporal and spatial registrations are applied to compensate for the difference of camera positions. Then an image sequence containing no moving objects is obtained by selecting and mosaicing partial background images taken from the different image sequences. The proposed method removed moving objects accurately at a high rate of 97.3%. Future work includes improving the removal when using a smaller number of sequences.


9. REFERENCES

[1] H. Uchiyama, D. Deguchi, T. Takahashi, I. Ide, and H. Murase, "Ego-localization Using Streetscape Image Sequences from In-vehicle Cameras," Proc. 2009 IEEE Intelligent Vehicles Symposium, pp. 185-190, Jun. 2009.

[2] M. Jabbour, V. Cherfaoui, and P. Bonnifait, "Management of Landmarks in a GIS for an Enhanced Localisation in Urban Areas," Proc. 2006 IEEE Intelligent Vehicles Symposium, pp. 50-57, Sep. 2006.

[3] Google Maps, http://maps.google.com/

[4] A. Levin, A. Zomet, and Y. Weiss, "Learning How to Inpaint from Global Image Statistics," Proc. IEEE 9th Int. Conf. on Computer Vision, pp. 305-312, Oct. 2003.

[5] Y. Wexler, E. Shechtman, and M. Irani, "Space-Time Video Completion," Proc. 2004 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Vol. 1, pp. 120-127, Jun. 2004.

[6] A. Yamashita, I. Fukuchi, T. Kaneko, and K. Miura, "Removal of Adherent Noises from Image Sequences by Spatio-Temporal Image Processing," Proc. 2008 IEEE Int. Conf. on Robotics and Automation, pp. 2386-2391, May 2008.

[7] J. Bohm, "Multi-image Fusion for Occlusion-free Facade Texturing," Int. Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 35, Part B5, pp. 867-872, Jul. 2004.

[8] J. Sato, T. Takahashi, I. Ide, and H. Murase, "Change Detection in Streetscapes from GPS Coordinated Omni-directional Image Sequences," Proc. 18th Int. Conf. on Pattern Recognition, Vol. 4, pp. 935-938, Aug. 2006.

[9] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes, "Nonrigid Registration Using Free-form Deformations: Application to Breast MR Images," IEEE Trans. Medical Imaging, Vol. 18, No. 8, pp. 712-721, Aug. 1999.

[10] J. Astola, P. Haavisto, and Y. Neuvo, "Vector Median Filters," Proc. IEEE, Vol. 78, No. 4, pp. 678-689, Apr. 1990.
