
ON THE NON-LINEAR STATISTICS OF RANGE IMAGE PATCHES

HENRY ADAMS AND GUNNAR CARLSSON

Abstract. In "The nonlinear statistics of high-contrast patches in natural images," A. Lee, K. Pedersen, and D. Mumford study the distributions of 3 × 3 patches from optical images and from range images. G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian, in "On the local behavior of spaces of natural images," apply computational topological tools to the dataset of optical patches studied by Lee et al. and find geometric structures for high density subsets. One high density subset is called the primary circle, and essentially consists of patches with a line separating a light and a dark region. In this paper, we apply the techniques of Carlsson et al. to range patches. By enlarging to 5 × 5 and 7 × 7 patches, we find core subsets that have the topology of the primary circle, suggesting a stronger connection between optical patches and range patches than was found by Lee et al.

1. Introduction

An optical image has a grayscale value at each pixel, whereas a range image pixel contains a distance: the distance between the laser scanner and the nearest object in the correct direction. In either case, we can think of an n × m-pixel patch as a vector in $\mathbb{R}^{nm}$ and of a set of patches as a set of points in $\mathbb{R}^{nm}$. Below are two range images from the Brown database by Lee and Huang, which contains a variety of indoor and outdoor shots. Darker regions are closer than lighter regions, except for out-of-range data such as the sky, which is colored black.

Figure 1. Sample range images

Key words and phrases. range images, topology, persistent homology. Research supported in part by DARPA HR 0011-05-1-0007 and NSF DMS 0354543.

In [7], A. Lee, K. Pedersen, and D. Mumford describe the distributions of optical and range patches. They begin their analysis by constructing data sets of high contrast patches, where high contrast is defined by thresholding a natural measure of contrast given in Section 2. They find that high contrast 3 × 3 range patches are densely clustered around the binary patches. A pixel in a binary range patch is one of two values: foreground or background. For optical patches, Lee et al. find a strikingly different distribution: the majority of the high contrast patches lie near a 2-dimensional annulus. Each patch on this manifold is a linear step edge, a few of which are shown in Figure 2. The annulus is parameterized by the angle of the edge and by the distance of the edge from the center of the patch.

Figure 2. Step-edge annulus

Figure 3. Primary circle

In [3], Carlsson, Ishkhanov, de Silva, and Zomorodian expand upon the optical image findings of [7]. They use persistent homology, which we introduce in Section 3, to identify the topologies of high density subsets of the optical patch space. According to one choice of density estimator, the 30% densest points have the topology of a circle, called the primary circle, which is homotopy equivalent to the step-edge annulus and is shown in Figure 3. When using a different (finer) density estimator, the collection of densest points has the topology of a Klein bottle that contains the primary circle.

In this paper we apply the topological methods of [3] to study range patches. One conclusion of the work in [7] was that range and optical 3 × 3 patches were essentially different. The range patches simply broke up into clusters without an obvious simple geometry, while the optical patches were organized in a clearly geometric way. Our hypothesis concerning this conclusion is that it is due to the clustering of the range patches around binary patches: after normalizing the contrast, essentially only two values are taken in range patches, whereas up to nine values are taken in optical patches. The 3 × 3 binary patches are simply too coarse grained to encode an analogue of the primary annulus or circle found in [3] and [7], and so one cannot expect to find any reasonable geometry. This suggests the possibility that on larger scales, which have a greater capacity to encode, the range patches might show this primary circle behavior. This is indeed what we find: when we study 5 × 5 and 7 × 7 range patches, an analogue to the primary circle present in optical patches occurs. The methods are analogous to those used in [3] to study optical patches.

2. Preparing the spaces of range patches

We analyze 3 × 3, 5 × 5, and 7 × 7 pixel patches from the Brown database by Lee and Huang, a set of about 200 range images.
The operational range for the Brown scanner is typically 2-200 meters, and the distance values for each pixel are stored in units of 0.008 meters. More details about the Brown database can be found at the following webpage: http://www.dam.brown.edu/ptg/brid/index.html.

We obtain the spaces of patches through the following steps, which are nearly identical to the procedures used in [3] and [7].

Step 1: We randomly select nearly $4 \times 10^5$ patches of size m × m from the images in the database, where m is 3, 5, or 7.

Step 2: Regarding each m × m patch as an $m^2$-dimensional vector, we take the logarithm of each coordinate. This step is motivated in [6] as providing shape invariance.

Step 3: We compute the D-norm, $\|x\|_D$, of each vector. This is a measure of the contrast of a patch. Two coordinates of x are neighbors, denoted $i \sim j$, if the corresponding pixels in the m × m patch are adjacent. We calculate the D-norm of a vector by summing the squared differences between all neighboring coordinates and then taking the square root: $\|x\|_D = \sqrt{\sum_{i \sim j} (x_i - x_j)^2}$.

Step 4: We select the patches that have a D-norm in the top T percent of the entire sample. The rationale is that high contrast patches are believed to contain the most important information of an image but to follow a different distribution than low contrast patches. We use T = 20%, as done in [3] and [7].

Step 5: For computational feasibility, we randomly select 50,000 of the above patches in the top T percent.

Step 6: So that images of distant and close objects are comparable, we subtract from each vector the average of its coordinates.

Step 7: We map the space into a unit sphere by dividing each vector by its Euclidean norm, which is nonzero because the patches are high contrast. Unlike [7], where it is done for convenience, we do not change to the DCT basis.

Step 8: From the first seven steps, we have a set of 50,000 high contrast, normalized range image patches. However, hoping to approximate the topology of such a space is still a daunting task: the outlier points may significantly alter the computed topology.
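Steps 2 through 7 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `d_norm` and `prepare_patches` are hypothetical, "adjacent" is taken to mean horizontally or vertically adjacent pixels with each neighboring pair counted once, and the input is assumed to be an array of strictly positive range values with one flattened m × m patch per row.

```python
import numpy as np

def d_norm(patches, m):
    """D-norm of each row of `patches` (shape (N, m*m), row-major m x m patches).

    Assumption: neighbors are horizontally or vertically adjacent pixels,
    and each neighboring pair contributes once to the sum.
    """
    grid = patches.reshape(-1, m, m)
    horiz = np.diff(grid, axis=2)  # differences between left/right neighbors
    vert = np.diff(grid, axis=1)   # differences between up/down neighbors
    return np.sqrt((horiz ** 2).sum(axis=(1, 2)) + (vert ** 2).sum(axis=(1, 2)))

def prepare_patches(raw, m, top_fraction=0.20, n_keep=50_000, seed=0):
    """Hypothetical helper following Steps 2-7 for positive range values."""
    rng = np.random.default_rng(seed)
    logs = np.log(raw)                                   # Step 2: log of each coordinate
    contrast = d_norm(logs, m)                           # Step 3: contrast of each patch
    cutoff = np.quantile(contrast, 1.0 - top_fraction)   # Step 4: top T percent
    high = logs[contrast >= cutoff]
    if len(high) > n_keep:                               # Step 5: random subsample
        high = high[rng.choice(len(high), size=n_keep, replace=False)]
    centered = high - high.mean(axis=1, keepdims=True)   # Step 6: subtract the mean
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms                              # Step 7: project to unit sphere
```

The division in the last line relies on the same observation made in Step 7: a patch with positive D-norm is not constant, so its mean-centered vector has nonzero Euclidean norm.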
A more modest goal is to describe the topology of core subsets of the space, in the hope that the core subsets will reflect important patterns of the entire space. We estimate the density at a point using the function $\rho_k$, where $\rho_k(x)$ is defined to be the distance from point x to the k-th nearest neighbor of x, so that a smaller value of $\rho_k(x)$ indicates a higher density. A small choice of the parameter k produces a very local density estimate, whereas a larger k-value gives a more global estimate. We select the points whose densities are in the top p percent. The core subset, which depends on the patch size m × m, the density parameter k, and the density cut p, shall be denoted $X^m(k, p)$. Here we consider core subsets with density parameter k = 300 and cut percentage p = 30%.

3. Persistent Homology

Our core subsets are finite samplings from an unknown underlying space. Persistent homology is a method that uses only a finite sampling to estimate the underlying space's topology. We present a simplified example. Suppose we are given 10 points which (unknown to us) are sampled from a circle. Can we identify the underlying space? Recall that the k-th Betti number of a space is a topological invariant, equal to the rank of the k-th homology group and roughly equal to the number of k-dimensional holes. For a circle, $\mathrm{Betti}_0 = \mathrm{Betti}_1 = 1$.


We build a family of Rips simplicial complexes, nested by the parameter R as follows. The vertices are our 10 given points. The included higher simplexes are spanned by any set of vertices within distance R of each other. Figure 4 shows three such nested complexes, with R increasing from left to right.
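To make the Rips construction concrete, the following sketch computes the zeroth and first Betti numbers of a Rips complex at a single value of R from the ranks of its boundary matrices. It is an illustration only and not how PLEX works internally: the function name `rips_betti_0_1` is hypothetical, ranks are computed over the reals with NumPy, only simplexes up to dimension 2 are built (which suffices for these two Betti numbers), and the ten sample points are assumed to be equally spaced on the unit circle.

```python
import itertools
import numpy as np

def rips_betti_0_1(points, R):
    """Return (Betti_0, Betti_1) of the Rips complex on `points` at scale R."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Edges join vertices at distance <= R; a triangle is included when all
    # three of its edges are present.
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2) if dist[i, j] <= R]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in itertools.combinations(range(n), 3)
                 if all(e in edge_index for e in itertools.combinations(t, 2))]

    # Boundary matrix from edges to vertices.
    d1 = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        d1[i, k], d1[j, k] = -1.0, 1.0

    # Boundary matrix from triangles to edges.
    d2 = np.zeros((len(edges), len(triangles)))
    for k, (a, b, c) in enumerate(triangles):
        d2[edge_index[(b, c)], k] = 1.0
        d2[edge_index[(a, c)], k] = -1.0
        d2[edge_index[(a, b)], k] = 1.0

    rank1 = int(np.linalg.matrix_rank(d1)) if edges else 0
    rank2 = int(np.linalg.matrix_rank(d2)) if triangles else 0
    betti0 = n - rank1                   # number of connected components
    betti1 = len(edges) - rank1 - rank2  # independent loops not filled by triangles
    return betti0, betti1

# Ten equally spaced points on the unit circle (an assumed sample).
angles = 2 * np.pi * np.arange(10) / 10
circle_points = np.column_stack([np.cos(angles), np.sin(angles)])
for R in (0.5, 0.7, 1.2, 2.1):
    print(R, rips_betti_0_1(circle_points, R))
```

With this sample, adjacent points are about 0.618 apart. At R = 0.7 only adjacent points are joined, so the complex is a single 10-cycle and the function returns (1, 1), the circular profile.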

Figure 4. Nested simplicial complexes built on 10 circle points

Clearly the choice of R is an important one, and we do not know how to make it without knowledge of the underlying space. However, using the ideas of [8] and the computer software package PLEX, created by G. Carlsson, V. de Silva, A. Zomorodian, and P. Perry, we compute the Betti numbers over a range of R-values and display the result in a Betti barcode.

Figure 5. Betti barcode for the 10 circle points

The R-values are on the horizontal axis, and $\mathrm{Betti}_k$ is the number of intervals in the $\mathrm{Betti}_k$ plot that intersect the vertical line through R. The $\mathrm{Betti}_0$ plot shows the 10 vertices joining into one connected component as R increases. The single interval in the $\mathrm{Betti}_1$ plot begins when the loop forms and ends when the loop fills to a disk. For a large range of R-values, the correct circular profile $\mathrm{Betti}_0 = \mathrm{Betti}_1 = 1$ is obtained. The idea of persistent homology is that long intervals in the $\mathrm{Betti}_k$ plot generally correspond to real topological features of the underlying space, whereas short intervals are considered to be noise.

In practice we do not use the Rips complex: if every point in a large finite sampling is used as a vertex, the computations become infeasible. Instead we use the weak witness complex defined in [2]. We choose 50 vertices (called landmark points) using sequential maxmin, and the remaining data points serve as witnesses for inclusions of higher simplexes. This is the same topological machinery used in [3]. The weak witness complex depends on a parameter $\nu \in \{0, 1, 2\}$, which we choose to be $\nu = 1$: de Silva and Carlsson in [2] find $\nu = 0$ to be generally less effective, and $\nu = 2$ has the disadvantage of connecting every landmark point to at least one other at R = 0, which is problematic for our data set when a cluster of patches contains only one landmark.

4. Results

Many of our persistent homology results for a core subset $X^m(300, 30)$ can be illuminated by the following projection. We change $X^m(300, 30)$ to the discrete cosine transform (DCT) basis for m × m patches, with DCT basis vectors normalized as in Steps 6 and 7 of Section 2. Two of the DCT basis vectors are horizontal and vertical linear gradients, and we project onto these two coordinates (the x and y axes respectively in Figures 8, 12, and 13).
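The projection onto the two gradient coordinates can be sketched as below. This is a hedged illustration: the names `gradient_basis` and `project_onto_gradients` are hypothetical, and the gradient vectors are assumed to be the first-order DCT-II cosine profiles in the horizontal and vertical directions, which are exactly linear for m = 3 and close to linear for larger m. The paper does not spell out the formula, so this choice is an assumption.

```python
import numpy as np

def gradient_basis(m):
    """Horizontal and vertical gradient vectors for m x m patches.

    Assumption: each is the first-order DCT-II cosine profile, then
    mean-centered and scaled to unit norm as in Steps 6 and 7.
    Returns an array of shape (2, m*m), rows in row-major patch order.
    """
    profile = np.cos(np.pi * (2 * np.arange(m) + 1) / (2 * m))
    horizontal = np.tile(profile, (m, 1))  # values change from left to right
    vertical = horizontal.T                # values change from top to bottom
    basis = []
    for g in (horizontal, vertical):
        v = g.ravel() - g.mean()
        basis.append(v / np.linalg.norm(v))
    return np.stack(basis)

def project_onto_gradients(patches, m):
    """Coordinates of each flattened patch along the two gradient vectors."""
    return patches @ gradient_basis(m).T
```

Because the two vectors are orthonormal, each normalized patch maps to a point inside the unit disk, and a patch that is itself a pure horizontal or vertical gradient maps to a point on the unit circle.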

Figure 6. Horizontal and vertical linear gradient DCT basis vectors for the 5 × 5 patch case

There are several long intervals in the $\mathrm{Betti}_0$ barcode plot for the core subset $X^3(300, 30)$ (Figure 7), which is evidence of disjoint clusters. With one exception, these clusters are centered on binary approximations of linear step edges, a few of which are shown in the bottom row of Figure 9. The exceptional cluster, on the far right in Figure 8, centers on the horizontal linear gradient. This is consistent with the observation that many range patches are shots of the ground.

Why do range images cluster around binary patches while optical images are more continuously distributed? Perhaps objects in an indoor or outdoor scene are more likely to be a constant distance from the camera than to be monochromatic, and so an edge between two objects is more likely to produce a binary range patch than a binary optical patch. A second possible explanation lies in the subresolution properties of range scanners and optical cameras, described by Lee et al. in [7]. Range scanners record sub-pixel detail in a single pixel value by selecting the minimum; digital cameras take an average. A patch with sub-pixel binary values will remain binary after pixel values are chosen via minimums, but may not if pixels are chosen via averages.

The homology cycle producing the longest interval in the $\mathrm{Betti}_1$ barcode of Figure 7 is the primary circle of Figure 3. We see pieces of this primary circle in Figure 8. However, the shorter $\mathrm{Betti}_1$ intervals cloud the picture, as they are not robust and vary with a different selection of 50 landmark points. The topology of the core subset $X^3(300, 30)$ is not clear.

Range images tend to cluster near binary patches, and there are relatively few 3 × 3 binary patches. Therefore we consider 5 × 5 and 7 × 7 patches in hope of finding a manifold model. Figures 10 and 11 show sample PLEX Betti barcode plots for the core subsets $X^5(300, 30)$ and $X^7(300, 30)$.


Figure 7. Barcodes for $X^3(300, 30)$

Figure 8. Projection of $X^3(300, 30)$ onto linear gradients

Figure 9. Linear step edges in the top row and their binary approximations beneath

Figure 10. Barcodes for $X^5(300, 30)$

Figure 11. Barcodes for $X^7(300, 30)$

Both the $X^5(300, 30)$ plot and the $X^7(300, 30)$ plot contain a single long $\mathrm{Betti}_0$ interval and a single long $\mathrm{Betti}_1$ interval, evidence of circular topology. The homology cycle producing the $\mathrm{Betti}_1$ interval is the primary circle, visible in Figures 12 and 13. We ran 25 trials each on $X^5(300, 30)$ and $X^7(300, 30)$, selecting different landmark points. We find the results to be robust: in each trial, the circular profile $\mathrm{Betti}_0 = \mathrm{Betti}_1 = 1$ is obtained for a range of R-values of length greater than 0.25, and no other Betti plot interval has length greater than 0.05.


Figure 12. Projection of $X^5(300, 30)$ onto linear gradients

Figure 13. Projection of $X^7(300, 30)$ onto linear gradients

5. Application to compression

The understanding of the statistical distribution of range image patches can be used to devise methods of compression of range patch data. We first demonstrate how one can use the intuition that a given patch is well approximated by one in which the range values take one of only two values, foreground or background. Rather than storing all the continuous values of the range function, one should recoordinatize pixel values as follows. Compute the mean value $\mu$ of the range function over the patch. Compute the standard deviation

$$\sigma = \sqrt{\frac{\sum_i (x_i - \mu)^2}{n}},$$

where there are n pixels in the patch and where the $x_i$'s are the range values occurring in the patch. The values $\mu \pm \sigma$ are now a useful approximation to the two values taken by the range function. Each patch will be stored as the quadruple $(\mu, \sigma, \{\delta_i\}, \{\epsilon_i\})$. The mean $\mu$ is a real number, the standard deviation $\sigma$ is a non-negative real number, the residue vector $\{\delta_i\}$ has one entry for each pixel, and the direction vector $\{\epsilon_i\}$ has coordinates in $\{0, \pm 1\}$.

Given a patch with values $x_i$, compute values $\delta_i$ and $\epsilon_i$ as follows. For each i, determine if $x_i$ is greater than or less than $\mu$. If $x_i < \mu$ (respectively $x_i > \mu$), we set $\epsilon_i = -1$ (respectively $+1$). If $x_i = \mu$, we set $\epsilon_i = 0$. Finally, we set $\delta_i = x_i - (\mu + \epsilon_i \sigma)$. It is clear how to decompress the patch given the quadruple $(\mu, \sigma, \{\delta_i\}, \{\epsilon_i\})$, and the information about the values of the range function means that the values $\delta_i$ will tend to be small in absolute value.

If we are dealing with 5 × 5 patches or larger, since the density analysis is consistent with the optical result that the most frequently occurring high contrast patches are linear patches, one should then be able to perform a further compression along the lines of the wedgelet method of D. Donoho [4].
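The recoordinatization described in this section can be sketched as a pair of functions. This is a minimal illustration, with hypothetical names `encode_patch` and `decode_patch`: it computes the quadruple of mean, standard deviation, residue vector, and direction vector for one patch, and inverts it. It does not address how the residues would then be quantized or entropy coded, which the text leaves open.

```python
import numpy as np

def encode_patch(x):
    """Return (mean, std, residues, directions) for a patch of range values."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = np.sqrt(np.mean((x - mu) ** 2))  # population standard deviation
    eps = np.sign(x - mu).astype(int)        # -1 below the mean, +1 above, 0 if equal
    delta = x - (mu + eps * sigma)           # residue from the two-value approximation
    return mu, sigma, delta, eps

def decode_patch(mu, sigma, delta, eps):
    """Invert encode_patch: recover the original range values."""
    return mu + eps * sigma + delta
```

For an exactly binary patch with equally many foreground and background pixels, such as values 1, 1, 3, 3, the mean is 2, the standard deviation is 1, and every residue is zero, so the patch is captured entirely by the mean, the standard deviation, and the direction vector.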


6. Conclusions

The primary circle is a good model for core subsets of 5 × 5 and 7 × 7 range image patches, providing evidence that the behavior of optical patches and larger scale range patches may really be quite similar. This suggests that were one to attempt to develop sophisticated compression schemes for range images, methods such as wedgelets [4] could indeed be extended to the case of range images. However, the binary bias of the range patches suggests that wavelet based schemes for the compression might use different mother wavelets for the encoding. Further investigation may determine if range patches also tend to lie along other portions of Carlsson et al.'s Klein bottle model for optical patches.

References
[1] M.A. ARMSTRONG, Basic Topology, Springer (1983).
[2] G. CARLSSON, V. DE SILVA, Topological estimation using witness complexes, Symposium on Point-Based Graphics (2004).
[3] G. CARLSSON, T. ISHKHANOV, V. DE SILVA, A. ZOMORODIAN, On the local behavior of spaces of natural images, Internat. J. Computer Vision 76, 1 (2008), pp. 1-12.
[4] D. DONOHO, Wedgelets: nearly minimax estimation of edges, Ann. Stat. 27, 3 (1999), pp. 859-897.
[5] A. HATCHER, Algebraic topology, Cambridge University Press (2001).
[6] J. HUANG, A.B. LEE, D. MUMFORD, Statistics of range images, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2000), pp. 324-332.
[7] A.B. LEE, K.S. PEDERSEN, D. MUMFORD, The nonlinear statistics of high-contrast patches in natural images, Internat. J. Computer Vision 54, 1-3 (2003), pp. 83-103.
[8] A. ZOMORODIAN, G. CARLSSON, Computing persistent homology, Discrete and Computational Geometry 33, 2 (2005), pp. 247-274.

Department of Mathematics, Stanford University
E-mail address: gunnar@math.stanford.edu
