
Towards an Algorithmic Realization of Nash's Embedding Theorem

Nakul Verma
CSE, UC San Diego
naverma@cs.ucsd.edu

Abstract
It is well known from differential geometry that an n-dimensional Riemannian manifold can be isometrically embedded in a Euclidean space of dimension 2n + 1 [Nas54]. Though the proof by Nash is intuitive, it is not clear whether such a construction is achievable by an algorithm that only has access to a finite-size sample from the manifold. In this paper, we study Nash's construction and develop two algorithms for embedding a fairly general class of n-dimensional Riemannian manifolds (initially residing in R^D) into R^k (where k depends only on some key manifold properties, such as its intrinsic dimension, its volume, and its curvature) that approximately preserve geodesic distances between all pairs of points. The first algorithm we propose is computationally fast and embeds the given manifold approximately isometrically into about O(2^{cn}) dimensions (where c is an absolute constant). The second algorithm, although computationally more involved, attempts to minimize the dimension of the target space and (approximately isometrically) embeds the manifold in about O(n) dimensions.

1 Introduction

Finding low-dimensional representations of manifolds has proven to be an important task in data analysis and
data visualization. Typically, one wants a low-dimensional embedding to reduce computational costs while
maintaining relevant information in the data. For many learning tasks, distances between data-points serve as
an important approximation to gauge similarity between the observations. Thus, it comes as no surprise that
distance-preserving or isometric embeddings are popular.
The problem of isometrically embedding a differentiable manifold into a low dimensional Euclidean
space has received considerable attention from the differential geometry community and, more recently, from
the manifold learning community. The classic results by Nash [Nas54, Nas56] and Kuiper [Kui55] show that any compact Riemannian manifold of dimension n can be isometrically C^1-embedded¹ in a Euclidean space of dimension 2n + 1, and C^∞-embedded in dimension O(n^2) (see [HH06] for an excellent reference). Though these results are theoretically appealing, they rely on delicate handling of metric tensors and on solving a system of PDEs, making their constructions difficult to compute by a discrete algorithm.
On the algorithmic front, researchers in the manifold learning community have devised a number of
spectral algorithms for finding low-dimensional representations of manifold data [TdSL00, RS00, BN03,
DG03, WS04]. These algorithms are often successful in unravelling non-linear manifold structure from
samples, but lack rigorous guarantees that isometry will be preserved for unseen data.
Recently, Baraniuk and Wakin [BW07] and Clarkson [Cla07] showed that one can achieve approximate isometry via the technique of random projections. It turns out that projecting an n-dimensional manifold (initially residing in R^D) into a sufficiently high dimensional random subspace is enough to approximately preserve all pairwise distances. Interestingly, this linear embedding is guaranteed to preserve both the ambient Euclidean distances and the geodesic distances between all pairs of points on the manifold, without even looking at the samples from the manifold. Such a strong result comes at the cost of the dimension of the embedded space. To get a (1 ± ε)-isometry², for instance, Baraniuk and Wakin [BW07] show that a target dimension of size about O((n/ε^2) log(VD/τ)) is sufficient, where V is the n-dimensional volume of the manifold and 1/τ is a global bound on the curvature. This result was sharpened by Clarkson [Cla07] by

¹A C^k-embedding of a smooth manifold M is an embedding of M that has k continuous derivatives.

²A (1 ± ε)-isometry means that all distances are within a multiplicative factor of (1 ± ε).

Figure 1: A simple example demonstrating Nash's embedding technique on a 1-manifold. Left: Original 1-manifold in some high dimensional space. Middle: A contractive mapping of the original manifold via a linear projection onto the vertical plane. Different parts of the manifold are contracted by different amounts: distances at the tail-ends are contracted more than the distances in the middle. Right: Final embedding after applying a series of spiralling corrections. Small spirals are applied to regions with small distortion (middle), large spirals are applied to regions with large distortions (tail-ends). The resulting embedding is isometric (i.e., geodesic distance preserving) to the original manifold.
completely removing the dependence on the ambient dimension D and partially substituting it with more average-case manifold properties. In either case, the 1/ε^2 dependence is troublesome: if we want an embedding with all distances within 99% of the original distances (i.e., ε = 0.01), the bounds require the dimension of the target space to be at least 10,000!

One may wonder whether the dependence on ε is really necessary to achieve isometry. Nash's theorem suggests that an ε-free bound on the target space should be possible.
1.1 Our Contributions
In this work, we elucidate Nash's C^1 construction, and take the first step in making Nash's theorem algorithmic by providing two simple algorithms for approximately isometrically embedding n-manifolds (manifolds with intrinsic dimension n), where the dimension of the target space is independent of the ambient dimension D and the isometry constant ε. The first algorithm we propose is simple and fast in computing the target embedding, but embeds the given n-manifold in about 2^{cn} dimensions (where c is an absolute constant). The second algorithm we propose focuses on minimizing the target dimension. It is computationally more involved, but embeds the given n-manifold in about O(n) dimensions.
We would like to highlight that both our proposed algorithms work for a fairly general class of manifolds. There is no requirement that the original n-manifold is connected, or is globally isometric (or even globally diffeomorphic) to some subset of R^n, as is frequently assumed by several manifold embedding algorithms. In addition, unlike the spectrum-based embedding algorithms available in the literature, our algorithms yield an explicit C^∞-embedding that cleanly embeds out-of-sample data points and provides isometry guarantees over the entire manifold (not just the input samples).

On the technical side, we emphasize that the techniques used in our proof are different from those Nash uses in his work; unlike traditional differential-geometric settings, we can only access the underlying manifold through a finite-size sample. This makes it difficult to compute quantities (such as the curvature tensor and the local functional form of the input manifold) that are important in Nash's approach for constructing an isometric embedding. Our techniques do, however, use various differential-geometric concepts, and our hope is to make such techniques mainstream in analyzing manifold learning algorithms.

2 Nash's Construction for C^1-Isometric Embedding

Given an n-dimensional manifold M (initially residing in R^D), Nash's embedding can be summarized in two steps (see also [Nas54]). (1) Find a contractive³ mapping of M in the desired dimensional Euclidean space. (2) Apply an infinite series of corrections to restore the distances to their original lengths.

In order to maintain smoothness, the contraction and the target dimension in step one should be chosen carefully. Nash notes that one can use Whitney's construction [Whi36] to embed M in R^{2n+1} without introducing any kinks, tears, or discontinuities in the embedding. This initial embedding, which does not necessarily preserve any distances, can be made into a contraction by adjusting the scale.

The corrections in step two should also be done with care. Each correction stretches out a small region of the contracted manifold to restore local distances as much as possible. Nash shows that applying a successive sequence of spirals⁴ in directions normal to the embedded M is a simple way to stretch the distances while maintaining differentiability. The aggregate effect of applying these spiralling perturbations is a globally-isometric mapping of M in R^{2n+1}. See Figure 1 for an illustration.
³A contractive mapping or a contraction is a mapping that doesn't increase the distance between points.

⁴A spiral map is a mapping of the form t ↦ (t, sin(t), cos(t)).
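To make the spiral-correction idea concrete, here is a minimal numpy sketch (ours, not part of the paper's construction): a line is contracted by a factor s and then stretched back by the spiral map of footnote 4, with the frequency C chosen so that the image-curve speed s·√(1 + C^2) equals 1.

```python
import numpy as np

s = 0.4                                  # contraction factor (< 1)
C = np.sqrt(1.0 / s**2 - 1.0)            # frequency restoring unit speed

def embed(p):
    t = s * p                            # step 1: contraction
    return np.array([t, np.sin(C * t), np.cos(C * t)])  # step 2: spiral

# check: the length of the image of [0, 1] matches the original length 1
ps = np.linspace(0.0, 1.0, 20001)
pts = np.stack([embed(p) for p in ps])
print(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))   # ~1.0
```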

Remark 1 Adjusting the lengths by applying spirals is one of many ways to do local corrections. Kuiper [Kui55], for instance, discusses an alternative way to stretch the contracted manifold by applying corrugations, and obtains a similar isometry result.
2.1 Algorithm for Embedding n-Manifolds: Intuition
Taking inspiration from Nash's construction, our proposed embedding will also be divided into two stages. The first stage will attempt to find a contraction Φ : R^D → R^d of our given n-manifold M ⊂ R^D in low dimensions. The second will apply a series of local corrections Ψ_1, Ψ_2, . . . (collectively referred to as the mapping Ψ : R^d → R^{d+k}) to restore the geodesic distances.

Contraction stage: A pleasantly surprising observation is that a random projection of M into d = O(n) dimensions is a bona fide injective, differential-structure preserving contraction with high probability (details in Section 5.1). Since we don't require isometry in the first stage (only a contraction), we can use a random projection as our contraction mapping without having to pay the 1/ε^2 penalty.
Correction stage: We will apply several corrections to stretch out our contracted manifold Φ(M). To understand a single correction Ψ_i better, we can consider its effect on a small section of Φ(M). Since, locally, the section effectively looks like a contracted n-dimensional affine space, our correction map needs to restore distances over this n-flat. Let U := [u^1, . . . , u^n] be a d × n matrix whose columns form an orthonormal basis for this n-flat in R^d, and let s_1, . . . , s_n be the corresponding shrinkages along the n directions. Then one can consider applying an n-dimensional analog of the spiral mapping: Ψ_i(t) := (t, Ψ_sin(t), Ψ_cos(t)), where Ψ_sin(t) := (sin((Ct)_1), . . . , sin((Ct)_n)) and Ψ_cos(t) := (cos((Ct)_1), . . . , cos((Ct)_n)). Here C serves as an n × d correction matrix that controls how much of the surface needs to stretch. It turns out that if one sets C to be the matrix SU^T (where S is a diagonal matrix with entries S_ii := √((1/s_i)^2 - 1); recall that s_i was the shrinkage along direction u^i), then the correction Ψ_i precisely restores the shrinkages along the n orthonormal directions on the resultant surface (see our discussion in Section 5.2 for details).
Since different parts of the contracted manifold need to be stretched by different amounts, we localize the effect of Ψ_i to a small enough neighborhood by applying a specific kind of kernel function known as a bump function in the analysis literature (details in Section 5.2, cf. Figure 5 middle). Applying different Ψ_i's at different parts of the manifold should have the aggregate effect of creating an (approximately) isometric embedding.
We now have a basic outline of our algorithm. Let M be an n-dimensional manifold in R^D. We first find a contraction of M in d = O(n) dimensions via a random projection. This preserves the differential structure but distorts the interpoint geodesic distances. We estimate the distortion at different regions of the projected manifold by comparing a sample from M with its projection. We then perform a series of spiral corrections, each applied locally, to adjust the lengths in the local neighborhoods. We will conclude that restoring the lengths in all neighborhoods yields a globally consistent (approximately) isometric embedding of M. Figure 4 shows a quick schematic of our two stage embedding with the various quantities of interest.
Exactly how these different local Ψ_i's are applied gives rise to our two algorithms. For the first algorithm, we shall apply the Ψ_i maps simultaneously, making use of extra coordinates so that different corrections don't interfere with each other. This yields a simple and computationally fast embedding. We shall require about 2^{cn} additional coordinates to apply the corrections, making the final embedding dimension about 2^{cn} (here c is an absolute constant). For the second algorithm, we will follow Nash's technique more closely and apply the Ψ_i maps iteratively in the same embedding space, without the use of extra coordinates. Since all the Ψ_i's will share the same coordinate space, extra care needs to be taken in applying the corrections. This will require additional computational effort in terms of computing normals to the embedded manifold (details later), but will result in an embedding of dimension O(n).

3 Preliminaries

Let M be a smooth, n-dimensional compact Riemannian submanifold of R^D. Since we will be working with samples from M, we need to ensure a certain amount of regularity. Here we borrow the notation of Niyogi et al. [NSW06] for the condition number of M.

Definition 1 (condition number [NSW06]) Let M ⊂ R^D be a compact Riemannian manifold. The condition number of M is 1/τ, if τ is the largest number such that the normals of length r < τ at any two distinct points p, q ∈ M don't intersect.
The condition number 1/τ is an intuitive notion that captures the complexity of M in terms of its curvature. We can, for instance, bound the directional curvature at any p ∈ M by 1/τ.

Figure 2: Tubular neighborhood of a manifold. Note that the normals (dotted lines) of a particular length incident at each point of the manifold (solid line) will intersect if the manifold is too curvy.

Figure 2 depicts the normals of a manifold. Notice that long non-intersecting normals are possible only if the manifold is relatively flat. Hence, the condition number of M gives us a handle on how curvy M can be. As a quick example, let's calculate the condition number of an n-dimensional sphere of radius r (embedded in R^D). Note that in this case one can have non-intersecting normals of length less than r (since otherwise they will start intersecting at the center of the sphere). Thus the condition number of such a sphere is 1/r. Throughout the text we will assume that M has condition number 1/τ.
We will use D_G(p, q) to denote the geodesic distance between points p and q when the underlying manifold is understood from context, and ||p - q|| to denote the Euclidean distance when the ambient space is understood from context.
To correctly estimate the distortion induced by the initial contraction mapping, we will additionally require a high-resolution covering of our manifold.
Definition 2 (bounded manifold cover) Let M ⊂ R^D be a Riemannian n-manifold. We call X ⊂ M an α-bounded (ρ, δ)-cover of M if for all p ∈ M and the ρ-neighborhood X_p := {x ∈ X : ||x - p|| < ρ} around p, we have

- there exist points x_0, . . . , x_n ∈ X_p such that |((x_i - x_0)/||x_i - x_0||) · ((x_j - x_0)/||x_j - x_0||)| ≤ 1/2n, for i ≠ j. (covering criterion)

- |X_p| ≤ α. (local boundedness criterion)

- there exists a point x ∈ X_p such that ||x - p|| ≤ ρ/2. (point representation criterion)

- for any n + 1 points in X_p satisfying the covering criterion, let T̂_p denote the n-dimensional affine space passing through them (note that T̂_p does not necessarily pass through p). Then, for any unit vector v̂ in T̂_p, we have |v̂ · (v/||v||)| ≥ 1 - δ, where v is the projection of v̂ onto the tangent space of M at p. (tangent space approximation criterion)
The above is an intuitive notion of manifold sampling that can estimate the local tangent spaces. Curiously, we haven't found such tangent-space-approximating notions of manifold sampling in the literature. We do note in passing that our sampling criterion is similar in spirit to the (ε, δ)-sampling (also known as "tight ε-sampling") criterion popular in the Computational Geometry literature (see e.g. [DGGZ02, GW03]).
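As a concrete illustration of Definition 2, the following sketch (ours; the function name is illustrative, and the brute-force search over (n+1)-subsets is exponential in n) tests the first three criteria of a candidate cover at a query point:

```python
import numpy as np
from itertools import combinations

def check_cover_at(p, X, rho, alpha, n):
    """Test the covering, local-boundedness and point-representation
    criteria of Definition 2 at p, for a candidate cover X (|X| x D)."""
    Xp = X[np.linalg.norm(X - p, axis=1) < rho]           # rho-neighborhood X_p
    if len(Xp) == 0 or len(Xp) > alpha:                   # local boundedness
        return False
    if np.min(np.linalg.norm(Xp - p, axis=1)) > rho / 2:  # point representation
        return False
    # covering criterion: some x_0, ..., x_n in X_p whose unit chords
    # (x_i - x_0)/||x_i - x_0|| are pairwise near-orthogonal
    for idx in combinations(range(len(Xp)), n + 1):
        x0, rest = Xp[idx[0]], Xp[list(idx[1:])]
        V = rest - x0
        norms = np.linalg.norm(V, axis=1)
        if np.any(norms == 0):
            continue
        V = V / norms[:, None]
        if np.abs(V @ V.T - np.eye(n)).max() <= 1.0 / (2 * n):
            return True
    return False
```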

Remark 2 Given an n-manifold M with condition number 1/τ, and some 0 < δ ≤ 1, if ρ ≤ τδ/(3√(2n)), then one can construct a 2^{10n+1}-bounded (ρ, δ)-cover of M; see Appendix A.2 for details.
We can now state our two algorithms.

4 The Algorithms

Inputs. We assume the following quantities are given:

(i) n: the intrinsic dimension of M.
(ii) 1/τ: the condition number of M.
(iii) X: an α-bounded (ρ, δ)-cover of M.
(iv) ρ: the parameter of the cover.

Notation. Let R be a random orthogonal projection map that maps points from R^D into a random subspace of dimension d (n ≤ d ≤ D). We will have d be about O(n). Set Φ := (2/3)(√(D/d))R as a scaled version of R. Since Φ is linear, Φ can also be represented as a d × D matrix. In our discussion below we will use the function notation and the matrix notation interchangeably; that is, for any p ∈ R^D, we will use the notation Φ(p) (applying the function Φ to p) and the notation Φp (matrix-vector multiplication) interchangeably.

For any x ∈ X, let x_0, . . . , x_n be n + 1 points from the set {x' ∈ X : ||x - x'|| < ρ} such that |((x_i - x_0)/||x_i - x_0||) · ((x_j - x_0)/||x_j - x_0||)| ≤ 1/2n, for i ≠ j (cf. Definition 2). Let F_x be the D × n matrix whose column vectors form some orthonormal basis of the n-dimensional subspace spanned by the vectors {x_i - x_0}_{i∈[n]}.
Estimating local contractions. We estimate the contraction caused by Φ in a small enough neighborhood of M containing the point x ∈ X by computing the thin Singular Value Decomposition (SVD) U_x Σ_x V_x^T of the d × n matrix ΦF_x, representing the singular values in the conventional descending order. That is, ΦF_x = U_x Σ_x V_x^T, and since ΦF_x is a tall matrix (n ≤ d), we know that the bottom d - n singular values are zero. Thus, we only consider the top n (of d) left singular vectors in the SVD (so, U_x is d × n, Σ_x is n × n, and V_x is n × n) with σ_x^1 ≥ σ_x^2 ≥ . . . ≥ σ_x^n, where σ_x^i is the i-th largest singular value.

Observe that the singular values σ_x^1, . . . , σ_x^n are precisely the distortion amounts in the directions u_x^1, . . . , u_x^n at Φ(x) ∈ R^d ([u_x^1, . . . , u_x^n] = U_x) when we apply Φ. To see this, consider the direction w^i := F_x v_x^i in the column-span of F_x ([v_x^1, . . . , v_x^n] = V_x). Then Φw^i = (ΦF_x)v_x^i = σ_x^i u_x^i, which can be interpreted as: Φ maps the vector w^i in the subspace F_x (in R^D) to the vector u_x^i (in R^d) with a scaling of σ_x^i.

Note that if 0 < σ_x^i ≤ 1 (for all x ∈ X and 1 ≤ i ≤ n), we can define an n × d correction matrix (corresponding to each x ∈ X) C^x := S_x U_x^T, where S_x is a diagonal matrix with (S_x)_ii := √((1/σ_x^i)^2 - 1). We can also write S_x as (Σ_x^{-2} - I)^{1/2}. The correction matrix C^x will have the effect of stretching the direction u_x^i by the amount (S_x)_ii and killing any direction v that is orthogonal to (the column-span of) U_x.
Algorithm 1 Compute Corrections C^x's

1: for x ∈ X (in any order) do
2:   Let x_0, . . . , x_n ∈ {x' ∈ X : ||x' - x|| < ρ} be such that |((x_i - x_0)/||x_i - x_0||) · ((x_j - x_0)/||x_j - x_0||)| ≤ 1/2n (for i ≠ j).
3:   Let F_x be a D × n matrix whose columns form an orthonormal basis of the n-dimensional span of the vectors {x_i - x_0}_{i∈[n]}.
4:   Let U_x Σ_x V_x^T be the thin SVD of ΦF_x.
5:   Set C^x := (Σ_x^{-2} - I)^{1/2} U_x^T.
6: end for
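A direct numpy rendering of Algorithm 1 might look as follows (a sketch under simplifying assumptions: the n + 1 neighbors are taken to be the nearest cover points in the ρ-ball rather than points certified to satisfy the covering criterion, and all singular values are assumed to be at most 1, as guaranteed with high probability by Corollary 5):

```python
import numpy as np

def compute_corrections(X, Phi, n, rho):
    """X: |X| x D cover points; Phi: the d x D scaled projection matrix."""
    corrections = {}
    for ix, x in enumerate(X):
        ball = X[np.linalg.norm(X - x, axis=1) < rho]    # the rho-neighborhood
        nbrs = ball[np.argsort(np.linalg.norm(ball - x, axis=1))[:n + 1]]
        A = (nbrs[1:] - nbrs[0]).T                       # D x n matrix of chords
        Fx = np.linalg.qr(A)[0]                          # orthonormal basis of span
        U, sig, _ = np.linalg.svd(Phi @ Fx, full_matrices=False)   # thin SVD
        # assumes sig <= 1 (cf. Corollary 5); S = (Sigma^-2 - I)^(1/2)
        S = np.diag(np.sqrt(1.0 / sig**2 - 1.0))
        corrections[ix] = S @ U.T                        # the n x d matrix C^x
    return corrections
```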

Algorithm 2 Embedding Technique I

Preprocessing Stage: We will first partition the given covering X into disjoint subsets such that no subset contains points that are too close to each other. Let x_1, . . . , x_{|X|} be the points in X in some arbitrary but fixed order. We can do the partition as follows:

1: Initialize X^{(1)}, . . . , X^{(K)} as empty sets.
2: for x_i ∈ X (in any fixed order) do
3:   Let j be the smallest positive integer such that x_i is not within distance 2ρ of any element in X^{(j)}. That is, the smallest j such that for all x ∈ X^{(j)}, ||x - x_i|| ≥ 2ρ.
4:   X^{(j)} ← X^{(j)} ∪ {x_i}.
5: end for

The Embedding: For any p ∈ M ⊂ R^D, we embed it in R^{d+2nK} as follows:

1: Let t = Φ(p).
2: Define Ψ(t) := (t, Ψ_{1,sin}(t), Ψ_{1,cos}(t), . . . , Ψ_{K,sin}(t), Ψ_{K,cos}(t)), where Ψ_{j,sin}(t) := (Ψ_{j,sin}^1(t), . . . , Ψ_{j,sin}^n(t)) and Ψ_{j,cos}(t) := (Ψ_{j,cos}^1(t), . . . , Ψ_{j,cos}^n(t)). The individual terms are given by

Ψ_{j,sin}^i(t) := Σ_{x∈X^{(j)}} (√(Λ_{Φ(x)}(t))/ω) sin(ω(C^x t)_i)
Ψ_{j,cos}^i(t) := Σ_{x∈X^{(j)}} (√(Λ_{Φ(x)}(t))/ω) cos(ω(C^x t)_i)      i = 1, . . . , n; j = 1, . . . , K

where Λ_a(b) = [1_{||a-b||<ρ} e^{-1/(1-(||a-b||/ρ)^2)}] / [Σ_{q∈X} 1_{||Φq-b||<ρ} e^{-1/(1-(||Φq-b||/ρ)^2)}].

3: return Ψ(t) as the embedding of p in R^{d+2nK}.
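For concreteness, the following sketch (ours) assembles Embedding Technique I from the pieces above; `corrections` is the output of the Algorithm 1 sketch, `parts` is the partition X^{(1)}, . . . , X^{(K)} of the cover indices from the preprocessing stage, and `PhiX` holds the projected cover points Φx:

```python
import numpy as np

def bump(t, center, rho):
    """The unnormalized bump lambda from Algorithm 2 (smooth, compact support)."""
    r2 = np.sum((t - center)**2) / rho**2
    return np.exp(-1.0 / (1.0 - r2)) if r2 < 1.0 else 0.0

def embed_I(p, Phi, PhiX, parts, corrections, rho, omega):
    t = Phi @ p                                        # step 1: contract
    lam = np.array([bump(t, px, rho) for px in PhiX])
    lam = lam / max(lam.sum(), 1e-12)                  # smooth partition of unity
    n = next(iter(corrections.values())).shape[0]
    out = [t]
    for part in parts:                                 # subsets X^(1), ..., X^(K)
        sin_blk, cos_blk = np.zeros(n), np.zeros(n)
        for ix in part:                                # at most one bump active
            Ct = corrections[ix] @ t                   # (C^x t) in R^n
            amp = np.sqrt(lam[ix]) / omega
            sin_blk += amp * np.sin(omega * Ct)
            cos_blk += amp * np.cos(omega * Ct)
        out += [sin_blk, cos_blk]
    return np.concatenate(out)                         # point in R^(d + 2nK)
```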

Algorithm 3 Embedding Technique II

The Embedding: Let x_1, . . . , x_{|X|} be the points in X in some arbitrary but fixed order. Now, for any point p ∈ M ⊂ R^D, we embed it in R^{2d+3} as follows:

1: Let t = Φ(p).
2: Define Ψ_{0,n}(t) := (t, 0, . . . , 0), with d + 3 trailing zeros.
3: for i = 1, . . . , |X| do
4:   Define Ψ_{i,0} := Ψ_{i-1,n}.
5:   for j = 1, . . . , n do
6:     Let η_{i,j}(t) and ν_{i,j}(t) be two mutually orthogonal unit vectors normal to Ψ_{i,j-1}(ΦM) at Ψ_{i,j-1}(t).
7:     Define

Ψ_{i,j}(t) := Ψ_{i,j-1}(t) + η_{i,j}(t) (√(Λ_{Φ(x_i)}(t))/ω_{i,j}) sin(ω_{i,j}(C^{x_i} t)_j) + ν_{i,j}(t) (√(Λ_{Φ(x_i)}(t))/ω_{i,j}) cos(ω_{i,j}(C^{x_i} t)_j)

where Λ_a(b) = [1_{||a-b||<ρ} e^{-1/(1-(||a-b||/ρ)^2)}] / [Σ_{q∈X} 1_{||Φq-b||<ρ} e^{-1/(1-(||Φq-b||/ρ)^2)}].

8:   end for
9: end for
10: return Ψ_{|X|,n}(t) as the embedding of p into R^{2d+3}.
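The sketch below (ours, heavily simplified) mirrors the recursion of Embedding Technique II. The paper obtains smoothly varying approximate normals η, ν as described in Appendix A.9; here we substitute a crude stand-in, the two extra left-singular directions of a finite-difference Jacobian, and use a single frequency ω in place of the ω_{i,j}. The nested differentiation makes this exponential in the number of corrections, so it is illustrative only:

```python
import numpy as np

def two_normals(psi, t, d, eps=1e-5):
    """Two orthonormal vectors normal to the image of psi at psi(t),
    estimated from a finite-difference Jacobian (stand-in for App. A.9)."""
    J = np.stack([(psi(t + eps * e) - psi(t - eps * e)) / (2 * eps)
                  for e in np.eye(d)], axis=1)          # (2d+3) x d Jacobian
    U, _, _ = np.linalg.svd(J)                          # full SVD
    return U[:, d], U[:, d + 1]                         # orthogonal to col(J)

def embed_II(p, Phi, PhiX, corrections, lam, omega):
    """lam(t, ix): the normalized bump Lambda at Phi(x_ix); omega: a single
    frequency standing in for the omega_{i,j} of the paper."""
    d = Phi.shape[0]
    psi = lambda t: np.concatenate([t, np.zeros(d + 3)])    # Psi_{0,n}
    for ix in range(len(PhiX)):                             # i = 1, ..., |X|
        for j in range(corrections[ix].shape[0]):           # j = 1, ..., n
            def psi(t, prev=psi, ix=ix, j=j):
                eta, nu = two_normals(prev, t, d)           # normals at prev(t)
                amp = np.sqrt(lam(t, ix)) / omega
                phase = omega * (corrections[ix] @ t)[j]
                return prev(t) + amp * (np.sin(phase) * eta + np.cos(phase) * nu)
    return psi(Phi @ p)                                     # point in R^(2d+3)
```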

A few remarks are in order.


Remark 3 The function Λ in both embeddings acts as a localizing kernel that helps in localizing the effects of the spiralling corrections (discussed in detail in Section 5.2), and ω > 0 (for Embedding I) or ω_{i,j} > 0 (for Embedding II) are free parameters controlling the frequency of the sinusoidal terms.

Remark 4 If ρ ≤ τ/4, the number of subsets (i.e., K) produced by Embedding I is at most α2^{cn} for an α-bounded (ρ, δ)-cover X of M (where c ≤ 4). See Appendix A.3 for details.

Remark 5 The success of Embedding II crucially depends upon finding a pair of normal unit vectors η and ν in each iteration; we discuss how to approximate these in Appendix A.9.

We shall see that for an appropriate choice of d, ρ, δ, and ω (or ω_{i,j}), our algorithms yield an approximate isometric embedding of M.
4.1 Main Result
Theorem 3 Let M ⊂ R^D be a compact n-manifold with volume V and condition number 1/τ (as above). Let d = Ω(n + ln(V/τ^n)) be the target dimension of the initial random projection mapping, such that d ≤ D. For any 0 < ε ≤ 1, let ρ ≤ (√(d/D))(ε/350)^2 τ, δ ≤ (d/D)(ε/250)^2, and let X ⊂ M be an α-bounded (ρ, δ)-cover of M. Now, let

i. N_I ⊂ R^{d+2nK} be the embedding of M returned by Algorithm I (where K ≤ α2^{cn} with c ≤ 4; cf. Remark 4),

ii. N_II ⊂ R^{2d+3} be the embedding of M returned by Algorithm II.

Then, with probability at least 1 - 1/poly(n) over the choice of the initial random projection, for all p, q ∈ M and their corresponding mappings p_I, q_I ∈ N_I and p_II, q_II ∈ N_II, we have

i. (1 - ε)D_G(p, q) ≤ D_G(p_I, q_I) ≤ (1 + ε)D_G(p, q),

ii. (1 - ε)D_G(p, q) ≤ D_G(p_II, q_II) ≤ (1 + ε)D_G(p, q).

5 Proof

Our goal is to show that the two proposed embeddings approximately preserve the lengths of all geodesic curves. Now, since the length of any given curve γ : [a, b] → M is given by ∫_a^b ||γ'(s)|| ds, it is vital to study how our embeddings modify the lengths of the tangent vectors at any point p ∈ M.

In order to discuss tangent vectors, we need to introduce the notion of a tangent space T_pM at a particular point p ∈ M. Consider any smooth curve c : (-ε, ε) → M such that c(0) = p; then we know that c'(0) is the vector tangent to c at p. The collection of all such vectors formed by all such curves is a well defined vector

Figure 3: Effects of applying a smooth map F on various quantities of interest. Left: A manifold M containing a point p; v is a vector tangent to M at p. Right: Mapping of M under F. The point p maps to F(p), and the tangent vector v maps to (DF)_p(v).
space (with origin at p), called the tangent space T_pM. In what follows, we will fix an arbitrary point p ∈ M and a tangent vector v ∈ T_pM and analyze how the various steps of the algorithm modify the length of v.

Let Φ be the initial (scaled) random projection map (from R^D to R^d) that may contract distances on M by various amounts, and let Ψ be the subsequent correction map that attempts to restore these distances (as defined in Step 2 for Embedding I, or as a sequence of maps in Step 7 for Embedding II). To get a firm footing for our analysis, we need to study how Φ and Ψ modify the tangent vector v. It is well known from differential geometry that for any smooth map F : M → N that maps a manifold M ⊂ R^k to a manifold N ⊂ R^{k'}, there exists a linear map (DF)_p : T_pM → T_{F(p)}N, known as the derivative map or the pushforward (at p), that maps tangent vectors incident at p in M to tangent vectors incident at F(p) in N. To see this, consider a vector u tangent to M at some point p. Then, there is some smooth curve c : (-ε, ε) → M such that c(0) = p and c'(0) = u. By mapping the curve c into N, i.e. F(c(t)), we see that F(c(t)) includes the point F(p) at t = 0. Now, by calculus, we know that the derivative at this point, dF(c(t))/dt|_{t=0}, is the directional derivative (∇F)_p(u), where (∇F)_p is a k' × k matrix called the gradient (at p). The quantity (∇F)_p is precisely the matrix representation of this linear pushforward map that sends tangent vectors of M (at p) to the corresponding tangent vectors of N (at F(p)). Figure 3 depicts how these quantities are affected by applying F. Also note that if F is linear, then DF = F.

Observe that since pushforward maps are linear, without loss of generality we can assume that v has unit length.
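As a quick numerical illustration (ours) of the pushforward, one can approximate (DF)_p(v) by a central difference along v and check it on a simple paraboloid patch:

```python
import numpy as np

def pushforward(F, p, v, eps=1e-6):
    """Directional derivative of F at p along v, i.e. (DF)_p(v)."""
    return (F(p + eps * v) - F(p - eps * v)) / (2 * eps)

F = lambda q: np.array([q[0], q[1], q[0]**2 + q[1]**2])   # a 2-manifold in R^3
p, v = np.array([1.0, 0.0]), np.array([1.0, 0.0])
print(pushforward(F, p, v))    # ~ (1, 0, 2): the gradient at p applied to v
```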

A quick roadmap for the proof. In the next three sections, we take a brief detour to study the effects of applying Φ, of applying Ψ for Algorithm I, and of applying Ψ for Algorithm II, separately. This will give us the necessary tools to analyze the combined effect of applying Ψ ∘ Φ on v (Section 5.4). We will conclude by relating tangent vectors to lengths of curves, showing approximate isometry (Section 5.5). Figure 4 provides a quick sketch of our two stage mapping with the quantities of interest. We defer the proofs of all the supporting lemmas to the Appendix.
5.1 Effects of Applying Φ
It is well known as an application of Sard's theorem from differential topology (see e.g. [Mil72]) that almost every smooth mapping of an n-dimensional manifold into R^{2n+1} is a differential-structure preserving embedding of M. In particular, a projection onto a random subspace (of dimension 2n + 1) constitutes such an embedding with probability 1.

This translates to stating that a random projection into R^{2n+1} is enough to guarantee that Φ doesn't collapse the lengths of non-zero tangent vectors. However, due to computational issues, we additionally require that the lengths are bounded away from zero (that is, a statement of the form ||(DΦ)_p(v)|| ≥ Ω(1)||v|| for all v tangent to M at all points p).

We can thus appeal to the random projections result of Clarkson [Cla07] (with the isometry parameter set to a constant, say 1/4) to ensure this condition. In particular, it follows that
Lemma 4 Let M ⊂ R^D be a smooth n-manifold (as defined above) with volume V and condition number 1/τ. Let R be a random projection matrix that maps points from R^D into a random subspace of dimension d (d ≤ D). Define Φ := (2/3)(√(D/d))R as a scaled projection mapping. If d = Ω(n + ln(V/τ^n)), then with probability at least 1 - 1/poly(n) over the choice of the random projection matrix, we have

(a) For all p ∈ M and all tangent vectors v ∈ T_pM, (1/2)||v|| ≤ ||(DΦ)_p(v)|| ≤ (5/6)||v||.

(b) For all p, q ∈ M, (1/2)||p - q|| ≤ ||Φp - Φq|| ≤ (5/6)||p - q||.


7

RD

Rd
v

Rd+k
u = v

t = p

(t)

M
kvk = 1

(D)t (u)

kuk 1

k(D)t (u)k kvk

Figure 4: Two stage mapping of our embedding technique. Left: Underlying manifold M RD with the quantities of
interest a fixed point p and a fixed unit-vector v tangent to M at p. Center: A (scaled) linear projection of M into a
random subspace of d dimensions. The point p maps to p and the tangent vector v maps to u := (D)p (v) = v. The
length of v contracts to kuk. Right: Correction of M via a non-linear mapping into Rd+k . We have k = O(2cn )
for correction technique I, and k = d + 3 for correction technique II (see also Section 4). Our goal is to show that
stretches length of contracted v (i.e. u) back to approximately its original length.

(c) For all x RD , kxk (2/3)(

D/d)kxk.

In what follows, we assume that Φ is such a scaled random projection map. A bound on the lengths of tangent vectors then also gives us a bound on the spectrum of ΦF_x (recall the definition of F_x from Section 4).
Corollary 5 Let Φ, F_x, δ and n be as described above (recall that x ∈ X, where X forms a bounded (ρ, δ)-cover of M). Let σ_x^i represent the i-th largest singular value of the matrix ΦF_x. Then, for δ ≤ d/32D, we have 1/4 ≤ σ_x^n ≤ σ_x^1 ≤ 1 (for all x ∈ X).
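The contraction stage is easy to instantiate; the sketch below (ours) draws a random orthoprojector and applies the (2/3)√(D/d) scaling from Section 4, so that a typical vector is contracted to about two-thirds of its length, consistent with Lemma 4:

```python
import numpy as np

def random_contraction(D, d, rng):
    G = rng.standard_normal((D, d))
    Q, _ = np.linalg.qr(G)                     # orthonormal basis of a random subspace
    return (2.0 / 3.0) * np.sqrt(D / d) * Q.T  # the d x D matrix Phi of Section 4

rng = np.random.default_rng(0)
Phi = random_contraction(D=1000, d=20, rng=rng)
x = rng.standard_normal(1000)
print(np.linalg.norm(Phi @ x) / np.linalg.norm(x))   # concentrates near 2/3
```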
We will be using these facts in our discussion below in Section 5.4.
5.2 Effects of Applying Ψ (Algorithm I)
As discussed in Section 2.1, the goal of Ψ is to restore the contraction induced by Φ on M. To better understand the action of Ψ on a tangent vector, we will first consider the simple case of flat manifolds (Section 5.2.1), and then develop the general case (Section 5.2.2).
5.2.1 Warm-up: flat M
Let's first consider applying a simple one-dimensional spiral map Ψ̄ : R → R^3 given by t ↦ (t, sin(Ct), cos(Ct)), where t ∈ I = (-ε, ε). Let v be a unit vector tangent to I (at, say, 0). Then note that

(DΨ̄)_{t=0}(v) = (dΨ̄/dt)|_{t=0} = (1, C cos(Ct), -C sin(Ct))|_{t=0}.

Thus, applying Ψ̄ stretches the length of v from 1 to ||(1, C cos(Ct), -C sin(Ct))|_{t=0}|| = √(1 + C^2). Notice the advantage of applying the spiral map in computing the lengths: the sine and cosine terms combine to yield a simple expression for the size of the stretch. In particular, if we want to stretch the length of v from 1 to, say, L ≥ 1, then we simply need C = √(L^2 - 1) (notice the similarity between this expression and our expression for the diagonal component S_x of the correction matrix C^x in Section 4).
We can generalize this to the case of an n-dimensional flat manifold (a section of an n-flat) by considering a map similar to Ψ̄. For concreteness, let F be a D × n matrix whose column vectors form some orthonormal basis of the n-flat manifold (in the original space R^D). Let UΣV^T be the thin SVD of ΦF. Then FV forms an orthonormal basis of the n-flat manifold (in R^D) that maps to an orthogonal basis UΣ of the projected n-flat manifold (in R^d) via the contraction mapping Φ. Define the spiral map Ψ̄ : R^d → R^{d+2n} in this case as follows: Ψ̄(t) := (t, Ψ̄_sin(t), Ψ̄_cos(t)), with Ψ̄_sin(t) := (Ψ̄_sin^1(t), . . . , Ψ̄_sin^n(t)) and Ψ̄_cos(t) := (Ψ̄_cos^1(t), . . . , Ψ̄_cos^n(t)). The individual terms are given as

Ψ̄_sin^i(t) := sin((Ct)_i),  Ψ̄_cos^i(t) := cos((Ct)_i),   i = 1, . . . , n,

where C is now an n × d correction matrix. It turns out that setting C = (Σ^{-2} - I)^{1/2} U^T precisely restores the contraction caused by Φ to the tangent vectors (notice the similarity between this expression and the correction matrix in the general case C^x of Section 4, and our motivating intuition in Section 2.1). To see this,
let v be a vector tangent to the n-flat at some point p (in R^D). We will represent v in the FV basis (that is, v = Σ_i α_i(Fv^i), where [Fv^1, . . . , Fv^n] = FV). Note that ||v||^2 = ||Σ_i α_i Fv^i||^2 = Σ_i (α_i)^2, and Φv = Σ_i α_i σ_i u^i (where the σ_i are the individual singular values of Σ and the u^i are the left singular vectors forming the columns of U). Now, let w be the pushforward of v (that is, w = (DΦ)_p(v) = Φv = Σ_{i=1}^d w_i e^i, where {e^i}_i forms the standard basis of R^d). Since DΨ̄ is linear, we have ||(DΨ̄)_{Φ(p)}(w)||^2 = ||Σ_i w_i (DΨ̄)_{Φ(p)}(e^i)||^2, where (DΨ̄)_{Φ(p)}(e^i) = (dt/dt^i, dΨ̄_sin(t)/dt^i, dΨ̄_cos(t)/dt^i)|_{t=Φ(p)}. The individual components are given by

dΨ̄_sin^k(t)/dt^i = +cos((Ct)_k) C_{k,i}
dΨ̄_cos^k(t)/dt^i = -sin((Ct)_k) C_{k,i}      k = 1, . . . , n; i = 1, . . . , d.

By algebra, we see that

||(D(Ψ̄ ∘ Φ))_p(v)||^2 = ||(DΨ̄)_{Φ(p)}((DΦ)_p(v))||^2 = ||(DΨ̄)_{Φ(p)}(w)||^2
= Σ_{k=1}^d w_k^2 + Σ_{k=1}^n cos^2((CΦp)_k)((CΦv)_k)^2 + Σ_{k=1}^n sin^2((CΦp)_k)((CΦv)_k)^2
= Σ_{k=1}^d w_k^2 + Σ_{k=1}^n ((CΦv)_k)^2 = ||Φv||^2 + ||CΦv||^2 = ||Φv||^2 + (Φv)^T C^T C (Φv)
= ||Φv||^2 + (Σ_i α_i σ_i u^i)^T U(Σ^{-2} - I)U^T (Σ_i α_i σ_i u^i)
= ||Φv||^2 + [α_1σ_1, . . . , α_nσ_n](Σ^{-2} - I)[α_1σ_1, . . . , α_nσ_n]^T
= ||Φv||^2 + Σ_i (α_i^2 - (α_iσ_i)^2) = ||Φv||^2 + ||v||^2 - ||Φv||^2 = ||v||^2.

In other words, our non-linear correction map Ψ̄ can exactly restore the contraction caused by Φ for any vector tangent to an n-flat manifold.
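The computation above can be verified numerically; the following sketch (ours) builds the correction C = (Σ^{-2} - I)^{1/2} U^T for a randomly drawn n-flat and checks that ||Φv||^2 + ||CΦv||^2 recovers ||v||^2 (the dimensions are chosen so that all singular values of ΦF stay below 1, which the construction requires):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 100, 40, 3
F = np.linalg.qr(rng.standard_normal((D, n)))[0]        # basis of an n-flat
R = np.linalg.qr(rng.standard_normal((D, d)))[0].T      # random orthoprojector
Phi = (2.0 / 3.0) * np.sqrt(D / d) * R                  # scaled contraction
U, sig, Vt = np.linalg.svd(Phi @ F, full_matrices=False)
assert np.all(sig < 1.0)                                # needed for the sqrt below
C = np.diag(np.sqrt(1.0 / sig**2 - 1.0)) @ U.T          # n x d correction matrix

v = F @ rng.standard_normal(n)                          # a tangent vector
w = Phi @ v                                             # its (contracted) pushforward
restored = np.linalg.norm(w)**2 + np.linalg.norm(C @ w)**2
print(np.linalg.norm(v)**2, restored)                   # the two values agree
```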
In the fully general case, the situation gets slightly more complicated, since we need to apply different spiral maps, each corresponding to a different size correction, at different locations on the contracted manifold. Recall that we localize the effect of a correction by applying the so-called bump function (details below). These bump functions, although important for localization, have an undesirable effect on the stretched length of the tangent vector. Thus, to ameliorate their effect on the length of the resulting tangent vector, we control their contribution via a free parameter ω.
5.2.2 The General Case
More specifically, Embedding Technique I restores the contraction induced by Φ by applying a non-linear map Ψ(t) := (t, Ψ_{1,sin}(t), Ψ_{1,cos}(t), . . . , Ψ_{K,sin}(t), Ψ_{K,cos}(t)) (recall that K is the number of subsets we decompose X into; cf. the description of Embedding I in Section 4), with Ψ_{j,sin}(t) := (Ψ_{j,sin}^1(t), . . . , Ψ_{j,sin}^n(t)) and Ψ_{j,cos}(t) := (Ψ_{j,cos}^1(t), . . . , Ψ_{j,cos}^n(t)). The individual terms are given as

Ψ_{j,sin}^i(t) := Σ_{x∈X^{(j)}} (√(Λ_{Φx}(t))/ω) sin(ω(C^x t)_i)
Ψ_{j,cos}^i(t) := Σ_{x∈X^{(j)}} (√(Λ_{Φx}(t))/ω) cos(ω(C^x t)_i)      i = 1, . . . , n; j = 1, . . . , K,

where the C^x's are the correction amounts for the different locations x on the manifold, ω > 0 controls the frequency (cf. Section 4), and Λ_{Φx}(t) is defined to be λ_{Φx}(t)/Σ_{q∈X} λ_{Φq}(t), with

λ_{Φx}(t) := exp(-1/(1 - ||t - Φx||^2/ρ^2)) if ||t - Φx|| < ρ, and 0 otherwise.
λ is a classic example of a bump function (see Figure 5 middle). It is a smooth function with compact support. Its applicability arises from the fact that it can be made to specifications; that is, it can be made to vanish outside any interval of our choice. Here we exploit this property to localize the effect of our corrections. The normalization of λ (the function Λ) creates the so-called smooth partition of unity that helps Ψ vary smoothly between the spirals applied at different regions of Φ(M).
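A minimal sketch (ours) of the bump function λ and its normalization Λ, illustrating the smooth partition of unity on a one-dimensional example with three stand-in centers:

```python
import numpy as np

def lam(t, center, rho):
    """The bump lambda: smooth, vanishing outside ||t - center|| < rho."""
    r2 = np.sum((t - center)**2) / rho**2
    return np.exp(-1.0 / (1.0 - r2)) if r2 < 1.0 else 0.0

centers = [np.array([0.0]), np.array([0.5]), np.array([1.0])]  # stand-ins for Phi(x)
rho = 0.8
for t in np.linspace(0.0, 1.0, 5):
    w = np.array([lam(np.array([t]), c, rho) for c in centers])
    print(t, w / w.sum())        # normalized weights: a smooth partition of unity
```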
Since any tangent vector in R^d can be expressed in terms of the basis vectors, it suffices to study how DΨ acts on the standard basis {e^i}. Note that

(DΨ)_t(e^i) = (dt/dt^i, dΨ_{1,sin}(t)/dt^i, dΨ_{1,cos}(t)/dt^i, . . . , dΨ_{K,sin}(t)/dt^i, dΨ_{K,cos}(t)/dt^i)|_t,

Figure 5: Effects of applying a bump function on a spiral mapping. Left: Spiral mapping t ↦ (t, sin(t), cos(t)). Middle: Bump function λ_x: a smooth function with compact support; the parameter x controls the location while ρ controls the width. Right: The combined effect t ↦ (t, λ_x(t) sin(t), λ_x(t) cos(t)). Note that the effect of the spiral is localized while the mapping stays smooth.
where

dΨ_{j,sin}^k(t)/dt^i = Σ_{x∈X^{(j)}} [ (1/ω) sin(ω(C^x t)_k)(dΛ_{Φx}^{1/2}(t)/dt^i) + √(Λ_{Φx}(t)) cos(ω(C^x t)_k) C_{k,i}^x ]
dΨ_{j,cos}^k(t)/dt^i = Σ_{x∈X^{(j)}} [ (1/ω) cos(ω(C^x t)_k)(dΛ_{Φx}^{1/2}(t)/dt^i) - √(Λ_{Φx}(t)) sin(ω(C^x t)_k) C_{k,i}^x ]

for k = 1, . . . , n; i = 1, . . . , d; j = 1, . . . , K.

One can now observe the advantage of having the term ω: by picking ω sufficiently large, we can make the first part of each expression sufficiently small. Now, for any tangent vector u = Σ_i u_i e^i such that ||u|| ≤ 1, we have (by algebra)

||(DΨ)_t(u)||^2 = ||Σ_i u_i (DΨ)_t(e^i)||^2
= Σ_{k=1}^d u_k^2 + Σ_{j=1}^K Σ_{k=1}^n [ Σ_{x∈X^{(j)}} ( A_sin^{k,x}(t)/ω + √(Λ_{Φx}(t)) cos(ω(C^x t)_k)(C^x u)_k ) ]^2
+ [ Σ_{x∈X^{(j)}} ( A_cos^{k,x}(t)/ω - √(Λ_{Φx}(t)) sin(ω(C^x t)_k)(C^x u)_k ) ]^2      (1)

where A_sin^{k,x}(t) := Σ_i u_i sin(ω(C^x t)_k)(dΛ_{Φx}^{1/2}(t)/dt^i) and A_cos^{k,x}(t) := Σ_i u_i cos(ω(C^x t)_k)(dΛ_{Φx}^{1/2}(t)/dt^i). We can further simplify Eq. (1) and get
Lemma 6 Let t be any point in Φ(M) and let u be any vector tangent to Φ(M) at t such that ||u|| ≤ 1. Let ε be the isometry parameter chosen in Theorem 3. Pick ω ≥ Ω(n^2 9^n √d/ρε); then

||(DΨ)_t(u)||^2 = ||u||^2 + Σ_{x∈X} Λ_{Φx}(t) Σ_{k=1}^n (C^x u)_k^2 + ζ,      (2)

where |ζ| ≤ ε/2.

We will use this derivation of ||(DΨ)_t(u)||^2 to study the combined effect of Ψ ∘ Φ on M in Section 5.4.
5.3 Effects of Applying Ψ (Algorithm II)
The goal of the second algorithm is to apply the spiralling corrections while using the coordinates more economically. We achieve this by applying them sequentially in the same embedding space (rather than simultaneously, by making use of 2nK extra coordinates, as done in the first algorithm); see also [Nas54]. Since all the corrections share the same coordinate space, one needs to keep track of a pair of normal vectors in order to prevent interference among the different local corrections.
More specifically, Ψ : R^d → R^{2d+3} (in Algorithm II) is defined recursively as Ψ := Ψ_{|X|,n}, such that (see also Embedding II in Section 4)

Ψ_{i,j}(t) := Ψ_{i,j-1}(t) + η_{i,j}(t)(√(Λ_{Φx_i}(t))/ω_{i,j}) sin(ω_{i,j}(C^{x_i}t)_j) + ν_{i,j}(t)(√(Λ_{Φx_i}(t))/ω_{i,j}) cos(ω_{i,j}(C^{x_i}t)_j),

where Ψ_{i,0}(t) := Ψ_{i-1,n}(t), and the base function Ψ_{0,n}(t) is given as t ↦ (t, 0, . . . , 0) (with d + 3 trailing zeros). Here η_{i,j}(t) and ν_{i,j}(t) are mutually orthogonal unit vectors that are approximately normal to Ψ_{i,j-1}(ΦM) at Ψ_{i,j-1}(t). In this section we assume that the normals η and ν have the following properties:

- |η_{i,j}(t) · v| ≤ ε_0 and |ν_{i,j}(t) · v| ≤ ε_0 for all unit-length v tangent to Ψ_{i,j-1}(ΦM) at Ψ_{i,j-1}(t). (quality of normal approximation)

- For all 1 ≤ l ≤ d, we have ||dη_{i,j}(t)/dt^l|| ≤ K_{i,j} and ||dν_{i,j}(t)/dt^l|| ≤ K_{i,j}. (bounded directional derivatives)
We refer the reader to Section A.9 for details on how to estimate such normals.
Now, as before, representing a tangent vector u = Σ_l u_l e^l (such that ||u||^2 ≤ 1) in terms of its basis vectors, it suffices to study how DΨ acts on the basis vectors. Observe that (DΨ_{i,j})_t(e^l) = dΨ_{i,j}(t)/dt^l ∈ R^{2d+3}, with the k-th component given as

(dΨ_{i,j}(t)/dt^l)_k = (dΨ_{i,j-1}(t)/dt^l)_k + (η_{i,j}(t))_k √(Λ_{Φx_i}(t)) C_{j,l}^{x_i} B_cos^{i,j}(t) - (ν_{i,j}(t))_k √(Λ_{Φx_i}(t)) C_{j,l}^{x_i} B_sin^{i,j}(t)
+ (1/ω_{i,j}) [ (dη_{i,j}(t)/dt^l)_k √(Λ_{Φx_i}(t)) B_sin^{i,j}(t) + (dν_{i,j}(t)/dt^l)_k √(Λ_{Φx_i}(t)) B_cos^{i,j}(t)
+ (η_{i,j}(t))_k (dΛ_{Φx_i}^{1/2}(t)/dt^l) B_sin^{i,j}(t) + (ν_{i,j}(t))_k (dΛ_{Φx_i}^{1/2}(t)/dt^l) B_cos^{i,j}(t) ],

where B_cos^{i,j}(t) := cos(ω_{i,j}(C^{x_i}t)_j) and B_sin^{i,j}(t) := sin(ω_{i,j}(C^{x_i}t)_j). For ease of notation, let R_{i,j}^{k,l} be the terms in the bracket (being multiplied by 1/ω_{i,j}) in the above expression. Then we have (for any i, j)

||(DΨ_{i,j})_t(u)||^2 = Σ_k [ Σ_l u_l ((DΨ_{i,j})_t(e^l))_k ]^2
= Σ_{k=1}^{2d+3} [ γ_{i,j}^{k,1} + γ_{i,j}^{k,2} + γ_{i,j}^{k,3} + γ_{i,j}^{k,4}/ω_{i,j} ]^2
= ||(DΨ_{i,j-1})_t(u)||^2 + Λ_{Φx_i}(t)(C^{x_i}u)_j^2 + Z_{i,j},      (3)

where

γ_{i,j}^{k,1} := Σ_l u_l (dΨ_{i,j-1}(t)/dt^l)_k,
γ_{i,j}^{k,2} := (η_{i,j}(t))_k √(Λ_{Φx_i}(t)) cos(ω_{i,j}(C^{x_i}t)_j) Σ_l C_{j,l}^{x_i} u_l,
γ_{i,j}^{k,3} := -(ν_{i,j}(t))_k √(Λ_{Φx_i}(t)) sin(ω_{i,j}(C^{x_i}t)_j) Σ_l C_{j,l}^{x_i} u_l,
γ_{i,j}^{k,4} := Σ_l u_l R_{i,j}^{k,l},

and Z_{i,j} := Σ_k [ (γ_{i,j}^{k,4}/ω_{i,j})^2 + 2(γ_{i,j}^{k,4}/ω_{i,j})(γ_{i,j}^{k,1} + γ_{i,j}^{k,2} + γ_{i,j}^{k,3}) + 2(γ_{i,j}^{k,1}γ_{i,j}^{k,2} + γ_{i,j}^{k,1}γ_{i,j}^{k,3}) ]. The last equality is by expanding the square and by noting that Σ_k γ_{i,j}^{k,2}γ_{i,j}^{k,3} = 0, since η and ν are orthogonal to each other. The base case ||(DΨ_{0,n})_t(u)||^2 equals ||u||^2.

Again, by picking ω_{i,j} sufficiently large, and by noting that the cross terms Σ_k γ_{i,j}^{k,1}γ_{i,j}^{k,2} and Σ_k γ_{i,j}^{k,1}γ_{i,j}^{k,3} are very close to zero since η and ν are approximately normal to the tangent vectors, we have

Lemma 7 Let t be any point in Φ(M) and let u be any vector tangent to Φ(M) at t such that ||u|| ≤ 1. Let ε be the isometry parameter chosen in Theorem 3. Pick ω_{i,j} ≥ Ω((K_{i,j} + (9^n√d/ρ))(nd|X|)^2/ε) (recall that K_{i,j} is the bound on the directional derivatives of η and ν). If ε_0 ≤ O(ε/d(n|X|)^2) (recall that ε_0 is the quality of approximation of the normals η and ν), then we have

||(DΨ)_t(u)||^2 = ||(DΨ_{|X|,n})_t(u)||^2 = ||u||^2 + Σ_{i=1}^{|X|} Λ_{Φx_i}(t) Σ_{j=1}^n (C^{x_i}u)_j^2 + ζ,      (4)

where |ζ| ≤ ε/2.

Combined Effect of ((M ))

5.4

We can now analyze the aggregate effect of both our embeddings on the length of an arbitrary unit vector v tangent to M at p. Let u := (DΦ)_p(v) = Φv be the pushforward of v. Then ||u|| ≤ 1 (cf. Lemma 4). See also Figure 4.

Now, recalling that D(Ψ ∘ Φ) = DΨ ∘ DΦ, and noting that pushforward maps are linear, we have ||(D(Ψ ∘ Φ))_p(v)||^2 = ||(DΨ)_{Φ(p)}(u)||^2. Thus, representing u as Σ_i u_i e^i in the ambient coordinates of R^d, and using Eq. (2) (for Algorithm I) or Eq. (4) (for Algorithm II), we get

||(D(Ψ ∘ Φ))_p(v)||^2 = ||(DΨ)_{Φ(p)}(u)||^2 = ||u||^2 + Σ_{x∈X} Λ_{Φx}(Φ(p)) ||C^x u||^2 + ζ,

where |ζ| ≤ ε/2. We can give simple lower and upper bounds for the above expression by noting that Λ_{Φx} is a localization function. Define N_p := {x ∈ X : ||Φ(x) - Φ(p)|| < ρ} as the neighborhood around p (ρ as per the theorem statement). Then only the points in N_p contribute to the above equation, since Λ_{Φx}(Φ(p)) = dΛ_{Φx}(Φ(p))/dt^i = 0 for ||Φ(x) - Φ(p)|| ≥ ρ. Also note that for all x ∈ N_p, ||x - p|| < 2ρ (cf. Lemma 4).

Let x_M := arg max_{x∈N_p} ||C^x u||^2 and x_m := arg min_{x∈N_p} ||C^x u||^2 be the quantities that attain the maximum and the minimum, respectively; then:

||u||^2 + ||C^{x_m}u||^2 - ε/2 ≤ ||(D(Ψ ∘ Φ))_p(v)||^2 ≤ ||u||^2 + ||C^{x_M}u||^2 + ε/2.      (5)

Notice that ideally we would like to have the correction factor C^p u in Eq. (5), since that would give the perfect stretch around the point p. But what about the correction C^x u for nearby x's? The following lemma helps us continue in this situation.

Lemma 8 Let p, v, u be as above. For any x ∈ N_p ⊂ X, let C^x and F_x also be as discussed above (recall that ||p - x|| < 2ρ, and X ⊂ M forms a bounded (ρ, δ)-cover of the fixed underlying manifold M with condition number 1/τ). Define ξ := (4ρ/τ) + δ + 4√(ρδ/τ). If ρ ≤ τ/4 and δ ≤ d/32D, then

(1 - ||u||^2) - 40ξ max{√(D/d), ξD/d} ≤ ||C^x u||^2 ≤ (1 - ||u||^2) + 51ξ max{√(D/d), ξD/d}.

Note that we chose ρ ≤ (√(d/D))(ε/350)^2 τ and δ ≤ (d/D)(ε/250)^2 (cf. theorem statement). Thus, combining Eq. (5) and Lemma 8, we get (recall ||v|| = 1)

(1 - ε)||v||^2 ≤ ||(D(Ψ ∘ Φ))_p(v)||^2 ≤ (1 + ε)||v||^2.

So far we have shown that our embedding approximately preserves the length of a fixed tangent vector at a fixed point. Since the choice of the vector and the point was arbitrary, it follows that our embedding approximately preserves the tangent vector lengths throughout the embedded manifold, uniformly. We will now show that preserving the tangent vector lengths implies preserving the geodesic curve lengths.
5.5 Preservation of the Geodesics

Pick any two (path-connected) points p and q in M, and let γ be the geodesic⁵ path between p and q. Further, let p̄, q̄ and γ̄ be the images of p, q and γ under our embedding. Note that γ̄ is not necessarily the geodesic path between p̄ and q̄; thus we need an extra piece of notation: let γ̂ be the geodesic path between p̄ and q̄ (on the embedded manifold) and let γ̃ be its inverse image in M. We need to show (1 - ε)L(γ) ≤ L(γ̂) ≤ (1 + ε)L(γ), where L(·) denotes the length of a path (end points are understood).

First recall that for any differentiable map F and curve γ, if γ̄ = F(γ) then γ̄' = (DF)(γ'). By the (1 ± ε)-isometry of tangent vectors, this immediately gives us (1 - ε)L(γ) ≤ L(γ̄) ≤ (1 + ε)L(γ) for any path γ in M and its image γ̄ in the embedding of M. So,

(1 - ε)D_G(p, q) = (1 - ε)L(γ) ≤ (1 - ε)L(γ̃) ≤ L(γ̂) = D_G(p̄, q̄).

Similarly,

D_G(p̄, q̄) = L(γ̂) ≤ L(γ̄) ≤ (1 + ε)L(γ) = (1 + ε)D_G(p, q).
⁵Globally, geodesic paths between points are not necessarily unique; we are interested in a path that yields the shortest distance between the points.

6 Conclusion

This work provides two simple algorithms for approximate isometric embedding of manifolds. Our algorithms are similar in spirit to Nash's C^1 construction [Nas54], and manage to remove the dependence on the isometry constant ε from the target dimension. One should observe that this dependency does, however, show up in the sampling density required to make the necessary corrections.

The correction procedure discussed here can also be readily adapted to create isometric embeddings from any manifold embedding procedure (under some mild conditions). Take any off-the-shelf manifold embedding algorithm A (such as LLE, Laplacian Eigenmaps, etc.) that maps an n-manifold in, say, d dimensions, but does not necessarily guarantee an approximately isometric embedding. Then, as long as one can ensure that the embedding produced by A is a one-to-one contraction⁶ (basically ensuring conditions similar to Lemma 4), we can apply corrections similar to those discussed in Algorithms I or II to produce an approximately isometric embedding of the given manifold in slightly higher dimensions. In this sense, the correction procedure presented here serves as a universal procedure for approximately isometric manifold embeddings.

Acknowledgements
The author would like to thank Sanjoy Dasgupta for introducing the subject, and for his guidance throughout
the project.

References
[BN03] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373-1396, 2003.

[BW07] R. Baraniuk and M. Wakin. Random projections of smooth manifolds. Foundations of Computational Mathematics, 2007.

[Cla07] K. Clarkson. Tighter bounds for random projections of manifolds. Computational Geometry, 2007.

[DF08] S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. ACM Symposium on Theory of Computing, 2008.

[DG03] D. Donoho and C. Grimes. Hessian eigenmaps: locally linear embedding techniques for high dimensional data. Proc. of the National Academy of Sciences, 100(10):5591-5596, 2003.

[DGGZ02] T. Dey, J. Giesen, S. Goswami, and W. Zhao. Shape dimension and approximation from samples. Symposium on Discrete Algorithms, 2002.

[GW03] J. Giesen and U. Wagner. Shape dimension and intrinsic metric from samples of manifolds with high co-dimension. Symposium on Computational Geometry, 2003.

[HH06] Q. Han and J. Hong. Isometric embedding of Riemannian manifolds in Euclidean spaces. American Mathematical Society, 2006.

[JL84] W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Conf. in Modern Analysis and Probability, pages 189-206, 1984.

[Kui55] N. Kuiper. On C^1-isometric embeddings, I, II. Indag. Math., 17:545-556, 683-689, 1955.

[Mil72] J. Milnor. Topology from the differentiable viewpoint. Univ. of Virginia Press, 1972.

[Nas54] J. Nash. C^1 isometric imbeddings. Annals of Mathematics, 60(3):383-396, 1954.

[Nas56] J. Nash. The imbedding problem for Riemannian manifolds. Annals of Mathematics, 63(1):20-63, 1956.

[NSW06] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 2006.

[RS00] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2000.

[TdSL00] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2000.

[Whi36] H. Whitney. Differentiable manifolds. Annals of Mathematics, 37:645-680, 1936.

[WS04] K. Weinberger and L. Saul. Unsupervised learning of image manifolds by semidefinite programming. Computer Vision and Pattern Recognition, 2004.

⁶One can modify A to produce a contraction by simple scaling.


A Appendix
A.1 Properties of a Well-conditioned Manifold
Throughout this section we will assume that M is a compact submanifold of R^D of dimension n with condition number 1/τ. The following are some properties of such a manifold that will be useful throughout the text.

Lemma 9 (relating close-by tangent vectors; implicit in the proof of Proposition 6.2 of [NSW06]) Pick any two (path-connected) points p, q ∈ M. Let u ∈ T_pM be a unit length tangent vector and let v ∈ T_qM be its parallel transport along the (shortest) geodesic path to q. Then⁷, i) u · v ≥ 1 - D_G(p, q)/τ, ii) ||u - v|| ≤ √(2D_G(p, q)/τ).

Lemma 10 (relating geodesic distances to ambient distances; Proposition 6.3 of [NSW06]) If p, q ∈ M are such that ||p - q|| ≤ τ/2, then D_G(p, q) ≤ τ(1 - √(1 - 2||p - q||/τ)) ≤ 2||p - q||.

Lemma 11 (projection of a section of a manifold onto the tangent space) Pick any p ∈ M and define M_{p,r} := {q ∈ M : ||q - p|| ≤ r}. Let f denote the orthogonal linear projection of M_{p,r} onto the tangent space T_pM. Then, for any r ≤ τ/2,

(i) the map f : M_{p,r} → T_pM is 1-1. (see Lemma 5.4 of [NSW06])

(ii) for any x, y ∈ M_{p,r}, ||f(x) - f(y)||^2 ≥ (1 - (r/τ)^2) ||x - y||^2. (implicit in the proof of Lemma 5.3 of [NSW06])
Lemma 12 (coverings of a section of a manifold) Pick any p ∈ M and define M_{p,r} := {q ∈ M : ||q - p|| ≤ r}. If r ≤ τ/2, then there exists C ⊂ M_{p,r} of size at most 9^n with the property: for any p' ∈ M_{p,r}, there exists c ∈ C such that ||p' - c|| ≤ r/2.

Proof: The proof closely follows the arguments presented in the proof of Theorem 22 of [DF08]. For r ≤ τ/2, note that M_{p,r} ⊂ R^D is (path-)connected. Let f denote the projection of M_{p,r} onto T_pM ≅ R^n. Quickly note that f is 1-1 (see Lemma 11(i)). Then, f(M_{p,r}) ⊂ R^n is contained in an n-dimensional ball of radius r. By standard volume arguments, f(M_{p,r}) can be covered by at most 9^n balls of radius r/4. WLOG we can assume that the centers of these covering balls are in f(M_{p,r}). Now, noting that the inverse image of each of these covering balls (in R^n) is contained in a D-dimensional ball of radius r/2 (see Lemma 11(ii)) finishes the proof.
Lemma 13 (relating close-by manifold points to tangent vectors) Pick any point p ∈ M and let q ∈ M (distinct from p) be such that D_G(p, q) ≤ τ. Let v ∈ T_pM be the projection of the vector q - p onto T_pM. Then, i) (v/||v||) · ((q - p)/||q - p||) ≥ √(1 - (D_G(p, q)/2τ)^2), ii) ||v/||v|| - (q - p)/||q - p|| || ≤ D_G(p, q)/√2τ.

Proof: If the vectors v and q - p are in the same direction, we are done. Otherwise, consider the plane spanned by the vectors v and q - p. Then, since M has condition number 1/τ, we know that the point q cannot lie within any τ-ball tangent to M at p (see Figure 6). Consider such a τ-ball (with center c) whose center is closest to q, and let q' be the point on the surface of the ball which subtends the same angle (∠pcq') as the angle formed by q (∠pcq). Let this angle be called θ. Then, using the cosine rule, we have cos θ = 1 - ||q' - p||^2/2τ^2.

Define φ as the angle subtended by the vectors v and q - p, and φ' as the angle subtended by the vectors v and q' - p. WLOG we can assume that the angles φ and φ' are less than π. Then, cos φ ≥ cos φ' = cos(θ/2). Using the trig identity cos θ = 2cos^2(θ/2) - 1, and noting ||q' - p||^2 ≤ ||q - p||^2, we have

(v/||v||) · ((q - p)/||q - p||) = cos φ ≥ cos(θ/2) ≥ √(1 - ||q - p||^2/4τ^2) ≥ √(1 - (D_G(p, q)/2τ)^2).

Now, by applying the cosine rule, we have ||v/||v|| - (q - p)/||q - p||||^2 = 2(1 - cos φ). The lemma follows.
⁷Technically, it is not possible to directly compare two vectors that reside in different tangent spaces. However, since we only deal with manifolds that are immersed in some ambient space, we can treat the tangent spaces as n-dimensional affine subspaces. We can thus parallel translate the vectors to the origin of the ambient space and do the necessary comparison (such as taking the dot product, etc.). We will make a similar abuse of notation for any calculation that uses vectors from different affine subspaces, meaning that we first translate the vectors and then perform the necessary calculation.

Figure 6: The plane spanned by the vectors q - p and v ∈ T_pM (where v is the projection of q - p onto T_pM), with τ-balls tangent to M at p. Note that q' is the point on the ball such that ∠pcq' = ∠pcq = θ.

Lemma 14 (approximating the tangent space by close-by samples) Let 0 < δ ≤ 1. Pick any point p_0 ∈ M and let p_1, . . . , p_n ∈ M be n points distinct from p_0 such that (for all 1 ≤ i ≤ n)

(i) D_G(p_0, p_i) ≤ τδ/√n,

(ii) |((p_i - p_0)/||p_i - p_0||) · ((p_j - p_0)/||p_j - p_0||)| ≤ 1/2n (for i ≠ j).

Let T̂ be the n-dimensional subspace spanned by the vectors {p_i - p_0}_{i∈[n]}. For any unit vector û ∈ T̂, let u be the projection of û onto T_{p_0}M. Then |û · (u/||u||)| ≥ 1 - δ.

Proof: Define the vectors v̄_i := (p_i - p_0)/||p_i - p_0|| (for 1 ≤ i ≤ n). Observe that {v̄_i}_{i∈[n]} forms a basis of T̂. For 1 ≤ i ≤ n, define v_i as the projection of the vector v̄_i onto T_{p_0}M. Also note that, by applying Lemma 13, we have for all 1 ≤ i ≤ n that ||v̄_i - v_i||^2 ≤ δ^2/2n.

Let V̄ = [v̄_1, . . . , v̄_n] be the D × n matrix. We represent the unit vector û as V̄α = Σ_i α_i v̄_i. Also, since u is the projection of û, we have u = Σ_i α_i v_i. Then ||α||^2 ≤ 2. To see this, we first identify T̂ with R^n via an isometry S (a linear map that preserves the lengths and angles of all vectors in T̂). Note that S can be represented as an n × D matrix, and since V̄ forms a basis for T̂, SV̄ is an n × n invertible matrix. Then, since Sû = SV̄α, we have α = (SV̄)^{-1}Sû. Thus (recall ||Sû|| = 1),

||α||^2 ≤ max_{x∈S^{n-1}} ||(SV̄)^{-1}x||^2 = λ_max((SV̄)^{-T}(SV̄)^{-1}) = λ_max((SV̄)^{-1}(SV̄)^{-T}) = λ_max((V̄^T V̄)^{-1}) = 1/λ_min(V̄^T V̄) ≤ 1/(1 - (n - 1)/2n) ≤ 2,

where i) λ_max(A) and λ_min(A) denote the largest and smallest eigenvalues of a square symmetric matrix A, respectively, and ii) the second inequality is by noting that V̄^T V̄ is an n × n matrix with 1's on the diagonal and at most 1/2n on the off-diagonal elements, and applying the Gershgorin circle theorem.

We can now bound the quantity of interest. Note that

|û · (u/||u||)| ≥ |û^T(û - (û - u))| ≥ 1 - ||û - u|| = 1 - ||Σ_i α_i(v̄_i - v_i)|| ≥ 1 - Σ_i |α_i| ||v̄_i - v_i|| ≥ 1 - (δ/√(2n)) Σ_i |α_i| ≥ 1 - δ,

where the last inequality is by noting that ||α||_1 ≤ √(2n).
A.2 On Constructing a Bounded Manifold Cover

Given a compact n-manifold M ⊂ R^D with condition number 1/τ, and some 0 < δ ≤ 1, we can construct an α-bounded (ρ, δ)-cover X of M (with α ≤ 2^{10n+1} and ρ ≤ τδ/(3√(2n))) as follows.

Set ρ ≤ τδ/(3√(2n)) and pick a (ρ/2)-net C of M (that is, C ⊂ M such that, i. for c, c' ∈ C with c ≠ c', ||c - c'|| ≥ ρ/2, and ii. for all p ∈ M, there exists c ∈ C such that ||c - p|| < ρ/2). WLOG we shall assume that all points of C are in the interior of M. Then, for each c ∈ C, define M_{c,ρ/2} := {p ∈ M : ||p - c|| ≤ ρ/2}, and the orthogonal projection map f_c : M_{c,ρ/2} → T_cM that projects M_{c,ρ/2} onto T_cM (note that, cf. Lemma 11(i), f_c is 1-1). Note that T_cM can be identified with R^n with c as the origin. We will denote the origin as x_0^{(c)}; that is, x_0^{(c)} = f_c(c).

Now, let B_c be any n-dimensional closed ball centered at the origin x_0^{(c)} ∈ T_cM of radius r > 0 that is completely contained in f_c(M_{c,ρ/2}) (that is, B_c ⊂ f_c(M_{c,ρ/2})). Pick a set of n points x_1^{(c)}, . . . , x_n^{(c)} on the surface of the ball B_c such that (x_i^{(c)} - x_0^{(c)}) · (x_j^{(c)} - x_0^{(c)}) = 0 for i ≠ j.

Define the bounded manifold cover as

X := ∪_{c∈C, i=0,...,n} f_c^{-1}(x_i^{(c)}).      (6)
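The first step of this construction, extracting a (ρ/2)-net from a dense sample, has a simple greedy realization; the sketch below (ours) operates on a finite sample P standing in for M:

```python
import numpy as np

def half_rho_net(P, rho):
    """Greedily extract a (rho/2)-net from sample points P (|P| x D):
    net points are pairwise >= rho/2 apart, and every sample point ends
    up within rho/2 of some net point."""
    net = []
    for p in P:                  # any fixed order
        if all(np.linalg.norm(p - c) >= rho / 2 for c in net):
            net.append(p)
    return np.array(net)
```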

Lemma 15 Let 0 < δ ≤ 1 and ρ ≤ τδ/(3√(2n)). Let C be a (ρ/2)-net of M as described above, and let X be as in Eq. (6). Then X forms a 2^{10n+1}-bounded (ρ, δ)-cover of M.

Proof: Pick any point p ∈ M and define X_p := {x ∈ X : ||x - p|| < ρ}. Let c ∈ C be such that ||p - c|| < ρ/2. Then X_p has the following properties.

Covering criterion: For 0 ≤ i ≤ n, since ||f_c^{-1}(x_i^{(c)}) - c|| ≤ ρ/2 (by construction), we have ||f_c^{-1}(x_i^{(c)}) - p|| < ρ. Thus, f_c^{-1}(x_i^{(c)}) ∈ X_p (for 0 ≤ i ≤ n). Now, for 1 ≤ i ≤ n, noting that D_G(f_c^{-1}(x_i^{(c)}), f_c^{-1}(x_0^{(c)})) ≤ 2||f_c^{-1}(x_i^{(c)}) - f_c^{-1}(x_0^{(c)})|| (cf. Lemma 10), we have for the vector v̄_i^{(c)} := (f_c^{-1}(x_i^{(c)}) - f_c^{-1}(x_0^{(c)}))/||f_c^{-1}(x_i^{(c)}) - f_c^{-1}(x_0^{(c)})|| and its (normalized) projection v_i^{(c)} := (x_i^{(c)} - x_0^{(c)})/||x_i^{(c)} - x_0^{(c)}|| onto T_cM that ||v̄_i^{(c)} - v_i^{(c)}|| ≤ √2ρ/τ (cf. Lemma 13). Thus, for i ≠ j, we have (recall that, by construction, v_i^{(c)} · v_j^{(c)} = 0)

|v̄_i^{(c)} · v̄_j^{(c)}| = |(v̄_i^{(c)} - v_i^{(c)} + v_i^{(c)}) · (v̄_j^{(c)} - v_j^{(c)} + v_j^{(c)})|
= |(v̄_i^{(c)} - v_i^{(c)}) · (v̄_j^{(c)} - v_j^{(c)}) + v_i^{(c)} · (v̄_j^{(c)} - v_j^{(c)}) + (v̄_i^{(c)} - v_i^{(c)}) · v_j^{(c)}|
≤ ||v̄_i^{(c)} - v_i^{(c)}|| ||v̄_j^{(c)} - v_j^{(c)}|| + ||v̄_i^{(c)} - v_i^{(c)}|| + ||v̄_j^{(c)} - v_j^{(c)}||
≤ 3√2ρ/τ ≤ 1/2n.

Point representation criterion: There exists x ∈ X_p, namely f_c^{-1}(x_0^{(c)}) (= c), such that ||p - x|| ≤ ρ/2.

Local boundedness criterion: Define M_{p,3ρ/2} := {q ∈ M : ||q - p|| < 3ρ/2}. Note that X_p ⊂ {f_{c'}^{-1}(x_i^{(c')}) : c' ∈ C ∩ M_{p,3ρ/2}, 0 ≤ i ≤ n}. Now, using Lemma 12, there exists a cover N ⊂ M_{p,3ρ/2} of size at most 9^{3n} such that for any point q ∈ M_{p,3ρ/2}, there exists n' ∈ N such that ||q - n'|| < ρ/4. Note that, by construction of C, there cannot be an n' ∈ N that is within distance ρ/4 of two (or more) distinct c, c' ∈ C (since otherwise the distance ||c - c'|| would be less than ρ/2, contradicting the packing of C). Thus, |C ∩ M_{p,3ρ/2}| ≤ 9^{3n}. It follows that |X_p| ≤ (n + 1)9^{3n} ≤ 2^{10n+1}.

Tangent space approximation criterion: Let T̂_p be the n-dimensional span of {v̄_i^{(c)}}_{i∈[n]} (note that T̂_p may not necessarily pass through p). Then, for any unit vector û ∈ T̂_p, we need to show that its projection u_p onto T_pM has the property |û · (u_p/||u_p||)| ≥ 1 - δ. Let θ be the angle between the vectors û and u_p. Let u_c be the projection of û onto T_cM, let θ_1 be the angle between the vectors û and u_c, and let θ_2 be the angle between the vector u_c (at c) and its parallel transport along the geodesic path to p. WLOG we can assume that θ_1 and θ_2 are at most π/2. Then, θ ≤ θ_1 + θ_2. We get the bound on the individual angles as follows. By applying Lemma 14, cos(θ_1) ≥ 1 - δ/4, and by applying Lemma 9, cos(θ_2) ≥ 1 - δ/4. Finally, by using Lemma 16, we have û · (u_p/||u_p||) = cos(θ) ≥ cos(θ_1 + θ_2) ≥ 1 - δ.

Lemma 16 Let 0 ≤ ε_1, ε_2 ≤ 1. If cos α ≥ 1 - ε_1 and cos β ≥ 1 - ε_2, then cos(α + β) ≥ 1 - ε_1 - ε_2 - 2√(ε_1ε_2).

Proof: Applying the identity sin θ = √(1 - cos^2 θ) immediately yields sin α ≤ √(2ε_1) and sin β ≤ √(2ε_2). Now, cos(α + β) = cos α cos β - sin α sin β ≥ (1 - ε_1)(1 - ε_2) - 2√(ε_1ε_2) ≥ 1 - ε_1 - ε_2 - 2√(ε_1ε_2).

Remark 6 A dense enough sample from M constitutes a bounded cover. One can selectively prune the dense sampling to control the total number of points in each neighborhood while still maintaining the cover properties.

A.3 Bounding the number of subsets K in Embedding I

By construction (see the preprocessing stage of Embedding I), K = max_{x∈X} |X ∩ B(x, 2ρ)| (where B(x, r) denotes a Euclidean ball centered at x of radius r). That is, K is the largest number of x''s (∈ X) that are within a 2ρ-ball of some x ∈ X.

Now, pick any x ∈ X and consider the set M_x := M ∩ B(x, 2ρ). Then, if ρ ≤ τ/4, M_x can be covered by 2^{cn} balls of radius ρ (see Lemma 12). By recalling that X forms an α-bounded (ρ, δ)-cover, we have |X ∩ B(x, 2ρ)| = |X ∩ M_x| ≤ α2^{cn} (where c ≤ 4).
A.4 Proof of Lemma 4
Since $R$ is a random orthoprojector from $\mathbb{R}^D$ to $\mathbb{R}^d$, it follows that

Lemma 17 (random projection of $n$-manifolds, adapted from Theorem 1.5 of [Cla07]) Let $M$ be a smooth compact $n$-manifold with volume $V$ and condition number $1/\tau$. Let $\bar R := \sqrt{D/d}\,R$ be a scaling of $R$. Pick any $0 < \epsilon \le 1$ and $0 < \delta \le 1$. If $d = \Omega\big(\epsilon^{-2}\big(\log(V/\tau^n) + n\log(1/\epsilon) + \ln(1/\delta)\big)\big)$, then with probability at least $1 - \delta$, for all $p, q \in M$,
$$(1 - \epsilon)\|p - q\| \le \|\bar Rp - \bar Rq\| \le (1 + \epsilon)\|p - q\|.$$

We apply this result with $\epsilon = 1/4$. Then, for $d = \Omega(\log(V/\tau^n) + n)$, with probability at least $1 - 1/\mathrm{poly}(n)$, $(3/4)\|p - q\| \le \|\bar Rp - \bar Rq\| \le (5/4)\|p - q\|$. Now let $\Phi : \mathbb{R}^D \to \mathbb{R}^d$ be defined as $\Phi x := (2/3)\bar Rx = (2/3)\sqrt{D/d}\,Rx$ (as per the lemma statement). Then we immediately get $(1/2)\|p - q\| \le \|\Phi p - \Phi q\| \le (5/6)\|p - q\|$.

Also note that for any $x \in \mathbb{R}^D$, we have $\|\Phi x\| = (2/3)\sqrt{D/d}\,\|Rx\| \le (2/3)\sqrt{D/d}\,\|x\|$ (since $R$ is an orthoprojector).

Finally, for any point $p \in M$, a unit vector $u$ tangent to $M$ at $p$ can be approximated arbitrarily well by considering a sequence $\{p_i\}_i$ of points (in $M$) converging to $p$ such that $(p_i - p)/\|p_i - p\|$ converges to $u$. Since for all points $p_i$ we have $(1/2) \le \|\Phi p_i - \Phi p\|/\|p_i - p\| \le (5/6)$ (with high probability), it follows that $(1/2) \le \|(D\Phi)_p(u)\| \le (5/6)$.
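For intuition, the map $\Phi$ used in this proof is easy to instantiate numerically: orthonormalize a random Gaussian matrix to get an orthoprojector $R$, scale by $(2/3)\sqrt{D/d}$, and check the contraction empirically. A minimal sketch (illustrative dimensions; this is not the exact sampling procedure analyzed in [Cla07]):

```python
import numpy as np

def random_orthoprojector(D, d, rng):
    """Return a d x D matrix R whose rows are an orthonormal basis of a
    uniformly random d-dimensional subspace of R^D."""
    G = rng.standard_normal((D, d))
    Q, _ = np.linalg.qr(G)          # D x d, orthonormal columns
    return Q.T

rng = np.random.default_rng(0)
D, d = 1000, 50
R = random_orthoprojector(D, d, rng)
Phi = (2.0 / 3.0) * np.sqrt(D / d) * R   # the map Phi = (2/3) sqrt(D/d) R

# Empirical distortion ||Phi(p - q)|| / ||p - q|| on random pairs.
P = rng.standard_normal((100, D))
Q2 = rng.standard_normal((100, D))
ratios = np.linalg.norm((P - Q2) @ Phi.T, axis=1) / np.linalg.norm(P - Q2, axis=1)
print(ratios.min(), ratios.mean(), ratios.max())
```

The ratios should concentrate near $2/3$, inside the $[1/2, 5/6]$ window used above, once $d$ is moderately large.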
A.5 Proof of Corollary 5
Let $v^1_x$ and $v^n_x$ ($\in \mathbb{R}^n$) be the right singular vectors corresponding to the singular values $\sigma^1_x$ and $\sigma^n_x$ respectively of the matrix $\Phi F_x$. Then, quickly note that $\sigma^1_x = \|\Phi F_xv^1_x\|$ and $\sigma^n_x = \|\Phi F_xv^n_x\|$. Note that since $F_x$ is orthonormal, we have $\|F_xv^1_x\| = \|F_xv^n_x\| = 1$. Now, since $F_xv^n_x$ is in the span of the column vectors of $F_x$, by the sampling condition (cf. Definition 2) there exists a unit-length vector $\bar v^n_x$ tangent to $M$ (at $x$) such that $|F_xv^n_x \cdot \bar v^n_x| \ge 1 - \delta$. Thus, decomposing $F_xv^n_x$ into two vectors $a^n_x$ and $b^n_x$ such that $a^n_x \perp b^n_x$ and $a^n_x := (F_xv^n_x \cdot \bar v^n_x)\bar v^n_x$, we have
$$\sigma^n_x = \|\Phi(F_xv^n_x)\| = \big\|\Phi\big((F_xv^n_x \cdot \bar v^n_x)\bar v^n_x\big) + \Phi b^n_x\big\| \ge (1 - \delta)\,\|\Phi\bar v^n_x\| - \|\Phi b^n_x\| \ge (1 - \delta)(1/2) - (2/3)\sqrt{2\delta D/d},$$
since $\|b^n_x\|^2 = \|F_xv^n_x\|^2 - \|a^n_x\|^2 \le 1 - (1-\delta)^2 \le 2\delta$ and $\|\Phi b^n_x\| \le (2/3)\sqrt{D/d}\,\|b^n_x\| \le (2/3)\sqrt{2\delta D/d}$.

Similarly decomposing $F_xv^1_x$ into two vectors $a^1_x$ and $b^1_x$ such that $a^1_x \perp b^1_x$ and $a^1_x := (F_xv^1_x \cdot \bar v^1_x)\bar v^1_x$, we have
$$\sigma^1_x = \|\Phi(F_xv^1_x)\| = \big\|\Phi\big((F_xv^1_x \cdot \bar v^1_x)\bar v^1_x\big) + \Phi b^1_x\big\| \le \|\Phi\bar v^1_x\| + \|\Phi b^1_x\| \le (5/6) + (2/3)\sqrt{2\delta D/d},$$
where the last inequality is by noting $\|\Phi b^1_x\| \le (2/3)\sqrt{2\delta D/d}$. Now, by our choice of $\delta$ ($\le d/32D$), and by noting that $d \le D$, the corollary follows.
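The conclusion (singular values of $\Phi F_x$ landing roughly within $[1/4, 1]$) is easy to sanity-check numerically. The sketch below uses a random orthonormal $n$-frame in place of a true tangent frame, an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
D, d, n = 1000, 50, 5
Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
Phi = (2.0 / 3.0) * np.sqrt(D / d) * Q.T            # d x D, as in Lemma 4

F_x, _ = np.linalg.qr(rng.standard_normal((D, n)))  # orthonormal n-frame at x
sigma = np.linalg.svd(Phi @ F_x, compute_uv=False)
print(sigma.max(), sigma.min())   # expect values well inside [1/4, 1]
```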

A.6 Proof of Lemma 6


We can simplify Eq. (1) by recalling how the subsets $X^{(j)}$ were constructed (see the preprocessing stage of Embedding I). Note that for any fixed $t$, at most one term in the set $\{\alpha_x(t)\}_{x \in X^{(j)}}$ is non-zero. Thus,
$$\|(D\Psi)_t(u)\|^2 = \sum_{k=1}^{d}u_k^2 + \sum_{x \in X}\sum_{k=1}^{n}\Big[\alpha_x(t)\cos^2\big(\omega(C^xt)_k\big)(C^xu)_k^2 + \alpha_x(t)\sin^2\big(\omega(C^xt)_k\big)(C^xu)_k^2\Big] + \zeta = \|u\|^2 + \sum_{x \in X}\alpha_x(t)\sum_{k=1}^{n}(C^xu)_k^2 + \zeta,$$
where $\zeta := (\zeta_1 + \zeta_2 + \zeta_3)/\omega$ collects the cross terms: $\zeta_1$ gathers the quadratic terms $\big(A^{k,x}_{\sin}(t)\big)^2 + \big(A^{k,x}_{\cos}(t)\big)^2/\omega$, while $\zeta_2$ and $\zeta_3$ gather the terms $2A^{k,x}_{\cos}(t)\sqrt{\alpha_x(t)}\cos(\omega(C^xt)_k)(C^xu)_k$ and $2A^{k,x}_{\sin}(t)\sqrt{\alpha_x(t)}\sin(\omega(C^xt)_k)(C^xu)_k$ (summed over $k$ and $x$), respectively. Noting that i) the terms $|A^{k,x}_{\sin}(t)|$ and $|A^{k,x}_{\cos}(t)|$ are at most $O(\alpha 9^n\sqrt{d}/\rho)$ (see Lemma 18), ii) $|(C^xu)_k| \le 4$, and iii) $\sqrt{\alpha_x(t)} \le 1$, we can pick $\omega$ sufficiently large (say, $\omega \ge \Omega\big(n^2\alpha 9^n\sqrt{d}/(\rho\epsilon)\big)$) such that $|\zeta| \le \epsilon/2$ (where $\epsilon$ is the isometry constant from our main theorem).

Lemma 18 For all $k$, $x$ and $t$, the terms $|A^{k,x}_{\sin}(t)|$ and $|A^{k,x}_{\cos}(t)|$ are at most $O(\alpha 9^n\sqrt{d}/\rho)$.

Proof: We shall focus on bounding $|A^{k,x}_{\sin}(t)|$ (the steps for bounding $|A^{k,x}_{\cos}(t)|$ are similar). Note that
$$|A^{k,x}_{\sin}(t)| = \Big|\sum_{i=1}^{d}\frac{d\alpha^{1/2}_x(t)}{dt^i}\,u_i\sin\big(\omega(C^xt)_k\big)\Big| \le \sum_{i=1}^{d}\Big|\frac{d\alpha^{1/2}_x(t)}{dt^i}\Big|\,|u_i| \le \Big(\sum_{i=1}^{d}\Big|\frac{d\alpha^{1/2}_x(t)}{dt^i}\Big|^2\Big)^{1/2},$$
since $\|u\| \le 1$. Thus, we can bound $|A^{k,x}_{\sin}(t)|$ by $O(\alpha 9^n\sqrt{d}/\rho)$ by noting the following lemma.

Lemma 19 For all $i$, $x$ and $t$, $|d\alpha^{1/2}_x(t)/dt^i| \le O(\alpha 9^n/\rho)$.

Proof: Pick any $t \in \Phi(M)$, and let $p_0 \in M$ be (the unique element) such that $\Phi(p_0) = t$. Define $N_{p_0} := \{x \in X : \|\Phi(x) - \Phi(p_0)\| < \rho\}$ as the neighborhood around $p_0$. Fix an arbitrary $x_0 \in N_{p_0} \cap X$ (since if $x_0 \notin N_{p_0}$ then $d\alpha^{1/2}_{x_0}(t)/dt^i = 0$), and consider the function
$$\alpha^{1/2}_{x_0}(t) = \Bigg(\frac{\lambda_{x_0}(t)}{\sum_{x \in N_{p_0}}\lambda_x(t)}\Bigg)^{1/2} = \Bigg(\frac{e^{-1/(1 - (\|t - \Phi(x_0)\|^2/\rho^2))}}{\sum_{x \in N_{p_0}}e^{-1/(1 - (\|t - \Phi(x)\|^2/\rho^2))}}\Bigg)^{1/2}.$$

Pick an arbitrary coordinate $i_0 \in \{1, \ldots, d\}$ and consider the (directional) derivative of this function. Write $A_t(x) := 1/(1 - (\|t - \Phi(x)\|^2/\rho^2))$, so that $\alpha^{1/2}_{x_0}(t) = e^{-A_t(x_0)/2}\big(\sum_{x \in N_{p_0}}e^{-A_t(x)}\big)^{-1/2}$ and $\frac{d}{dt^{i_0}}e^{-A_t(x)} = -e^{-A_t(x)}A_t(x)^2\,\frac{2(t_{i_0} - \Phi(x)_{i_0})}{\rho^2}$. Then
$$\frac{d\alpha^{1/2}_{x_0}(t)}{dt^{i_0}} = \frac{1}{2}\,\alpha^{-1/2}_{x_0}(t)\,\frac{d\alpha_{x_0}(t)}{dt^{i_0}} = -\frac{(t_{i_0} - \Phi(x_0)_{i_0})}{\rho^2}\cdot\frac{A_t(x_0)^2\,e^{-A_t(x_0)/2}}{\big(\sum_{x \in N_{p_0}}e^{-A_t(x)}\big)^{1/2}} + \frac{e^{-A_t(x_0)/2}\sum_{x \in N_{p_0}}e^{-A_t(x)}A_t(x)^2\,\frac{(t_{i_0} - \Phi(x)_{i_0})}{\rho^2}}{\big(\sum_{x \in N_{p_0}}e^{-A_t(x)}\big)^{3/2}},$$
where $A_t$ has domain $\{x \in X : \|t - \Phi(x)\| < \rho\}$ and range $[1, \infty)$. Recalling that for any $\beta \ge 1$, $|\beta^2e^{-\beta}| \le 1$ and $|\beta^2e^{-\beta/2}| \le 3$, we have $|A_t(\cdot)^2e^{-A_t(\cdot)}| \le 1$ and $|A_t(\cdot)^2e^{-A_t(\cdot)/2}| \le 3$. Thus,
$$\Big|\frac{d\alpha^{1/2}_{x_0}(t)}{dt^{i_0}}\Big| \le \frac{3\,|t_{i_0} - \Phi(x_0)_{i_0}|/\rho^2}{\big(\sum_{x \in N_{p_0}}e^{-A_t(x)}\big)^{1/2}} + \frac{e^{-A_t(x_0)/2}\sum_{x \in N_{p_0}}|t_{i_0} - \Phi(x)_{i_0}|/\rho^2}{\big(\sum_{x \in N_{p_0}}e^{-A_t(x)}\big)^{3/2}} \le O(\alpha 9^n/\rho),$$
where the last inequality is by noting: i) $|N_{p_0}| \le \alpha 9^n$ (since for all $x \in N_{p_0}$, $\|x - p_0\| \le 2\rho$ (cf. Lemma 4), $X$ is an $\alpha$-bounded cover, and, for $\rho \le \tau/4$, a ball of radius $2\rho$ can be covered by $9^n$ balls of radius $\rho$ on the given $n$-manifold (cf. Lemma 12)), ii) $|e^{-A_t(x)}| \le |e^{-A_t(x)/2}| \le 1$ (for all $x$), iii) $|t_{i_0} - \Phi(x)_{i_0}| \le \rho$ for all $x \in N_{p_0}$, and iv) $\sum_{x \in N_{p_0}}e^{-A_t(x)} \ge \Omega(1)$ (since our cover $X$ ensures that for any $p_0$ there exists $x \in N_{p_0}$ with $\|p_0 - x\| \le \rho/2$ (see also Remark 2), and hence $e^{-A_t(x)}$ is non-negligible for some $x \in N_{p_0}$).
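Numerically, both $\alpha^{1/2}_{x_0}$ and its derivative are easy to inspect. The toy sketch below (one-dimensional, with made-up values of $\rho$ and of the projected cover points $\Phi(x)$) evaluates the normalized bump weights and a finite-difference estimate of $d\alpha^{1/2}_{x_0}(t)/dt$, which stays bounded as the lemma predicts:

```python
import numpy as np

rho = 0.5
centers = np.array([0.0, 0.3, 0.55, 0.9])   # images Phi(x) of nearby cover points

def lam(t, c):
    """Smooth bump lambda_x(t) = exp(-1/(1 - ||t-c||^2/rho^2)), supported on ||t-c|| < rho."""
    r2 = (t - c) ** 2 / rho ** 2
    return np.exp(-1.0 / (1.0 - r2)) if r2 < 1.0 else 0.0

def sqrt_alpha(t, c0):
    weights = np.array([lam(t, c) for c in centers])
    return np.sqrt(lam(t, c0) / weights.sum())

# Finite-difference derivative of sqrt(alpha_{x_0}) at a few points.
h = 1e-6
for t in np.linspace(0.05, 0.45, 5):
    d = (sqrt_alpha(t + h, centers[0]) - sqrt_alpha(t - h, centers[0])) / (2 * h)
    print(f"t = {t:.2f}, d sqrt(alpha)/dt ~ {d:.3f}")
```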


A.7 Proof of Lemma 7


Note that by definition, $\|(D\Psi)_t(u)\|^2 = \|(D\Psi_{|X|,n})_t(u)\|^2$. Thus, using Eq. (3) and expanding the recursion, we have
$$\|(D\Psi)_t(u)\|^2 = \|(D\Psi_{|X|,n})_t(u)\|^2 = \|(D\Psi_{|X|,n-1})_t(u)\|^2 + \alpha_{x_{|X|}}(t)\big(C^{x_{|X|}}u\big)_n^2 + Z_{|X|,n}$$
$$\vdots$$
$$= \|(D\Psi_{0,n})_t(u)\|^2 + \Big[\sum_{i=1}^{|X|}\alpha_{x_i}(t)\sum_{j=1}^{n}\big(C^{x_i}u\big)_j^2\Big] + \sum_{i,j}Z_{i,j}.$$
Note that $(D\Psi_{i,0})_t(u) := (D\Psi_{i-1,n})_t(u)$. Now, recalling that $\|(D\Psi_{0,n})_t(u)\|^2 = \|u\|^2$ (the base case of the recursion), all we need to show is that $|\sum_{i,j}Z_{i,j}| \le \epsilon/2$. This follows directly from the lemma below.


Lemma 20 Let $\epsilon_0 \le O\big(\epsilon/d(n|X|)^2\big)$, and for any $i, j$, let $\omega_{i,j} \ge \Omega\big((K_{i,j} + (\alpha 9^n/\rho))(nd|X|)^2/\epsilon\big)$ (as per the statement of Lemma 7). Then, for any $i, j$, $|Z_{i,j}| \le \epsilon/2n|X|$.

Proof: Recall that (cf. Eq. (3))
$$Z_{i,j} = \underbrace{\frac{1}{\omega_{i,j}^2}\sum_k\big(\Gamma^{k,4}_{i,j}\big)^2}_{(a)} + \underbrace{\frac{2}{\omega_{i,j}}\sum_k\Gamma^{k,4}_{i,j}\big(\Gamma^{k,1}_{i,j} + \Gamma^{k,2}_{i,j} + \Gamma^{k,3}_{i,j}\big)}_{(b)} + \underbrace{2\sum_k\Gamma^{k,1}_{i,j}\Gamma^{k,2}_{i,j}}_{(c)} + \underbrace{2\sum_k\Gamma^{k,1}_{i,j}\Gamma^{k,3}_{i,j}}_{(d)}.$$


Term (a): Note that $\big|\sum_k(\Gamma^{k,4}_{i,j})^2\big| \le O\big(d^3(K_{i,j} + (\alpha 9^n/\rho))^2\big)$ (cf. Lemma 21 (iv)). By our choice of $\omega_{i,j}$, term (a) is at most $O(\epsilon/n|X|)$.

Term (b): Note that $\Gamma^{k,1}_{i,j} + \Gamma^{k,2}_{i,j} + \Gamma^{k,3}_{i,j} \le O\big(n|X| + (\epsilon/dn|X|)\big)$ (by noting Lemma 21 (i)-(iii), recalling the choice of $\omega_{i',j'}$, and summing over all $i', j'$). Thus, $\sum_k\Gamma^{k,4}_{i,j}\big(\Gamma^{k,1}_{i,j} + \Gamma^{k,2}_{i,j} + \Gamma^{k,3}_{i,j}\big) \le O\big(d^2(K_{i,j} + (\alpha 9^n/\rho))\big(n|X| + (\epsilon/dn|X|)\big)\big)$. Again, by our choice of $\omega_{i,j}$, term (b) is at most $O(\epsilon/n|X|)$.

Terms (c) and (d): We focus on bounding term (c) (the steps for bounding term (d) are the same). Note that $\big|\sum_k\Gamma^{k,1}_{i,j}\Gamma^{k,2}_{i,j}\big| \le 4\big|\sum_k\Gamma^{k,1}_{i,j}(\eta_{i,j}(t))_k\big|$. Now, observe that $\big(\Gamma^{k,1}_{i,j}\big)_{k=1,\ldots,2d+3}$ is a tangent vector with length at most $O\big(dn|X| + (\epsilon/dn|X|)\big)$ (cf. Lemma 21 (i)). Thus, by noting that $\eta_{i,j}$ is almost normal (with quality of approximation $\epsilon_0$), we have that term (c) is at most $O(\epsilon/n|X|)$.

By choosing the constants in the order terms appropriately, we can get the lemma.

Lemma 21 Let $\Gamma^{k,1}_{i,j}$, $\Gamma^{k,2}_{i,j}$, $\Gamma^{k,3}_{i,j}$, and $\Gamma^{k,4}_{i,j}$ be as defined in Eq. (3). Then for all $1 \le i \le |X|$ and $1 \le j \le n$, we have

(i) $|\Gamma^{k,1}_{i,j}| \le 1 + 8n|X| + \sum_{i'=1}^{i}\sum_{j'=1}^{j-1}O\big(d(K_{i',j'} + (\alpha 9^n/\rho))/\omega_{i',j'}\big)$,

(ii) $|\Gamma^{k,2}_{i,j}| \le 4$,

(iii) $|\Gamma^{k,3}_{i,j}| \le 4$,

(iv) $|\Gamma^{k,4}_{i,j}| \le O\big(d(K_{i,j} + (\alpha 9^n/\rho))\big)$.

Proof: First note that for any $\|u\| \le 1$ and for any $x_i \in X$, $1 \le j \le n$ and $1 \le l \le d$, we have $\big|\sum_l C^{x_i}_{j,l}u_l\big| = |(C^{x_i}u)_j| \le 4$ (cf. Lemma 23 (b) and Corollary 5).

Noting that for all $i$ and $j$, $\|\eta_{i,j}\| = \|\nu_{i,j}\| = 1$, we have $|\Gamma^{k,2}_{i,j}| \le 4$ and $|\Gamma^{k,3}_{i,j}| \le 4$.

Observe that $\Gamma^{k,4}_{i,j} = \sum_l u_lR^{k,l}_{i,j}$. For all $i, j, k$ and $l$, note that i) $\|d\eta_{i,j}(t)/dt^l\| \le K_{i,j}$ and $\|d\nu_{i,j}(t)/dt^l\| \le K_{i,j}$, and ii) $|d\alpha^{1/2}_{x_i}(t)/dt^l| \le O(\alpha 9^n/\rho)$ (cf. Lemma 19). Thus we have $|\Gamma^{k,4}_{i,j}| \le O\big(d(K_{i,j} + (\alpha 9^n/\rho))\big)$.

Now for any $i, j$, note that $\Gamma^{k,1}_{i,j} = \sum_l u_l\,d(\Psi_{i,j-1}(t))_k/dt^l$. Thus, by recursively expanding, $|\Gamma^{k,1}_{i,j}| \le 1 + 8n|X| + \sum_{i'=1}^{i}\sum_{j'=1}^{j-1}O\big(d(K_{i',j'} + (\alpha 9^n/\rho))/\omega_{i',j'}\big)$.


A.8 Proof of Lemma 8


We start by stating the following useful observations:

Lemma 22 Let $A$ be a linear operator such that $\max_{\|x\|=1}\|Ax\| \le \delta_{\max}$. Let $u$ be a unit-length vector. If $\|Au\| \ge \delta_{\min} > 0$, then for any unit-length vector $v$ such that $|u \cdot v| \ge 1 - \epsilon$, we have
$$1 - \frac{\delta_{\max}}{\delta_{\min}}\sqrt{2\epsilon} \;\le\; \frac{\|Av\|}{\|Au\|} \;\le\; 1 + \frac{\delta_{\max}}{\delta_{\min}}\sqrt{2\epsilon}.$$

Proof: Let $v' = v$ if $u \cdot v > 0$, otherwise let $v' = -v$. Quickly note that $\|u - v'\|^2 = \|u\|^2 + \|v'\|^2 - 2u \cdot v' = 2(1 - u \cdot v') \le 2\epsilon$. Thus, we have

i. $\|Av\| = \|Av'\| \le \|Au\| + \|A(u - v')\| \le \|Au\| + \delta_{\max}\sqrt{2\epsilon}$,

ii. $\|Av\| = \|Av'\| \ge \|Au\| - \|A(u - v')\| \ge \|Au\| - \delta_{\max}\sqrt{2\epsilon}$.

Noting that $\|Au\| \ge \delta_{\min}$ yields the result.
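Since the statement of Lemma 22 is purely finite-dimensional, it can be verified directly on random instances. The following sketch (arbitrary small dimensions, illustrative only) constructs a unit $v$ with $u \cdot v = 1 - \epsilon$ and checks both sides of the bound:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
u = rng.standard_normal(5)
u /= np.linalg.norm(u)

eps = 0.01
w = rng.standard_normal(5)
w -= (w @ u) * u                    # make w orthogonal to u
w /= np.linalg.norm(w)
v = (1 - eps) * u + np.sqrt(1 - (1 - eps) ** 2) * w   # unit v with u.v = 1 - eps

d_max = np.linalg.svd(A, compute_uv=False).max()
d_min = np.linalg.norm(A @ u)       # take delta_min = ||Au||
ratio = np.linalg.norm(A @ v) / d_min
slack = (d_max / d_min) * np.sqrt(2 * eps)
assert 1 - slack <= ratio <= 1 + slack
print(ratio, "lies in", (1 - slack, 1 + slack))
```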
Lemma 23 Let $x_1, \ldots, x_n \in \mathbb{R}^D$ be a set of orthonormal vectors, $F := [x_1, \ldots, x_n]$ be a $D \times n$ matrix, and let $\Phi$ be a linear map from $\mathbb{R}^D$ to $\mathbb{R}^d$ ($n \le d \le D$) such that for all non-zero $a \in \mathrm{span}(F)$ we have $0 < \|\Phi a\| \le \|a\|$. Let $U\Sigma V^T$ be the thin SVD of $\Phi F$. Define $C := (\Sigma^{-2} - I)^{1/2}U^T\Phi$. Then,

(a) $\|Ca\|^2 = \|a\|^2 - \|\Phi a\|^2$, for any $a \in \mathrm{span}(F)$,

(b) $\|C\|^2 \le (1/\sigma^n)^2$, where $\|\cdot\|$ denotes the spectral norm of a matrix and $\sigma^n$ is the $n$-th largest singular value of $\Phi F$.

Proof: Note that $FV$ forms an orthonormal basis for the subspace spanned by the columns of $F$ that maps to $U\Sigma$ via the mapping $\Phi$. Thus, since $a \in \mathrm{span}(F)$, let $y$ be such that $a = FVy$. Note that i) $\|a\|^2 = \|y\|^2$, ii) $\|\Phi a\|^2 = \|U\Sigma y\|^2 = y^T\Sigma^2y$. Now,
$$\|Ca\|^2 = \big\|\big((\Sigma^{-2} - I)^{1/2}U^T\Phi\big)FVy\big\|^2 = \big\|(\Sigma^{-2} - I)^{1/2}U^TU\Sigma V^TVy\big\|^2 = \big\|(\Sigma^{-2} - I)^{1/2}\Sigma y\big\|^2 = y^Ty - y^T\Sigma^2y = \|a\|^2 - \|\Phi a\|^2.$$

Now, consider $\|C\|^2$:
$$\|C\|^2 \le \big\|(\Sigma^{-2} - I)^{1/2}\big\|^2\,\|U^T\Phi\|^2 \le \max_{\|x\|=1}\big\|(\Sigma^{-2} - I)^{1/2}x\big\|^2 \le \max_{\|x\|=1}x^T\Sigma^{-2}x \le \max_{\|x\|=1}\sum_i x_i^2/(\sigma^i)^2 \le (1/\sigma^n)^2,$$
where the $\sigma^i$ are the (top $n$) singular values forming the diagonal matrix $\Sigma$.
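The corrective map $C$ of Lemma 23 is directly computable from the thin SVD, which is how the embedding algorithm uses it. The sketch below (random instance; the scaled orthoprojector stands in for $\Phi$, and the assertion on the top singular value is an assumption that holds with high probability at these dimensions) verifies property (a) numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
D, d, n = 60, 20, 4

Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
Phi = (2.0 / 3.0) * np.sqrt(D / d) * Q.T              # stand-in contracting map

F, _ = np.linalg.qr(rng.standard_normal((D, n)))      # orthonormal columns
U, s, Vt = np.linalg.svd(Phi @ F, full_matrices=False)
assert s.max() < 1.0      # contraction hypothesis of Lemma 23 (holds w.h.p.)

C = np.diag(np.sqrt(1.0 / s ** 2 - 1.0)) @ U.T @ Phi  # C = (S^-2 - I)^(1/2) U^T Phi

a = F @ rng.standard_normal(n)                        # arbitrary a in span(F)
lhs = np.linalg.norm(C @ a) ** 2
rhs = np.linalg.norm(a) ** 2 - np.linalg.norm(Phi @ a) ** 2
print(abs(lhs - rhs))     # zero up to floating-point error
```

The printed difference is zero up to rounding, matching $\|Ca\|^2 = \|a\|^2 - \|\Phi a\|^2$.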

Lemma 24 Let $M \subset \mathbb{R}^D$ be a compact Riemannian $n$-manifold with condition number $1/\tau$. Pick any $x \in M$ and let $F_x$ be any $n$-dimensional affine space with the property: for any unit vector $v_x$ tangent to $M$ at $x$, and its projection $v_{xF}$ onto $F_x$, $v_x \cdot \frac{v_{xF}}{\|v_{xF}\|} \ge 1 - \alpha$. Then for any $p \in M$ such that $\|x - p\| \le \rho \le \tau/2$, and any unit vector $v$ tangent to $M$ at $p$ (writing $\phi := (2\rho/\tau) + \alpha + 2\sqrt{2\alpha\rho/\tau}$),

i. $v \cdot \frac{v_F}{\|v_F\|} \ge 1 - \phi$,

ii. $\|v_F\|^2 \ge 1 - 2\phi$,

iii. $\|v_r\|^2 \le 2\phi$,

where $v_F$ is the projection of $v$ onto $F_x$ and $v_r$ is the residual (i.e. $v = v_F + v_r$ and $v_F \perp v_r$).

Proof: Let $\theta$ be the angle between $v_F$ and $v$; we will bound this angle. Let $\bar v_x$ (at $x$) be the parallel transport of $v$ (at $p$) via the (shortest) geodesic path with respect to the manifold connection. Let the angle between the vectors $v$ and $\bar v_x$ be $\theta_1$. Let $\bar v_{xF}$ be the projection of $\bar v_x$ onto the subspace $F_x$, and let the angle between $\bar v_x$ and $\bar v_{xF}$ be $\theta_2$. WLOG we can assume that the angles $\theta_1$ and $\theta_2$ are acute. Then, since $\theta \le \theta_1 + \theta_2$, we have $v \cdot \frac{v_F}{\|v_F\|} = \cos\theta \ge \cos(\theta_1 + \theta_2)$. We bound the individual terms $\cos\theta_1$ and $\cos\theta_2$ as follows.

Now, since $\|p - x\| \le \rho$, using Lemmas 9 and 10 we have $\cos(\theta_1) = |v \cdot \bar v_x| \ge 1 - 2\rho/\tau$. We also have $\cos(\theta_2) = \bar v_x \cdot \frac{\bar v_{xF}}{\|\bar v_{xF}\|} \ge 1 - \alpha$. Then, using Lemma 16, we finally get $v \cdot \frac{v_F}{\|v_F\|} = |\cos(\theta)| \ge 1 - 2\rho/\tau - \alpha - 2\sqrt{2\alpha\rho/\tau} = 1 - \phi$.

Also note that since $1 = \|v\|^2 = \|v_F\|^2 + \|v_r\|^2$ and $\|v_F\| = v \cdot \frac{v_F}{\|v_F\|}$, we have $\|v_r\|^2 = 1 - \big(v \cdot \frac{v_F}{\|v_F\|}\big)^2 \le 2\phi$, and $\|v_F\|^2 = 1 - \|v_r\|^2 \ge 1 - 2\phi$.

Now we are in a position to prove Lemma 8. Let $v_F$ be the projection of the unit vector $v$ (at $p$) onto the subspace spanned by (the columns of) $F_x$ and let $v_r$ be the residual (i.e. $v = v_F + v_r$ and $v_F \perp v_r$). Then, noting that $p$, $x$, $v$ and $F_x$ satisfy the conditions of Lemma 24 (with $\rho$ in Lemma 24 replaced with $2\rho$ from the statement of Lemma 8), we have (writing $\phi := (4\rho/\tau) + \alpha + 4\sqrt{\alpha\rho/\tau}$)

a) $v \cdot \frac{v_F}{\|v_F\|} \ge 1 - \phi$,

b) $\|v_F\|^2 \ge 1 - 2\phi$,

c) $\|v_r\|^2 \le 2\phi$.

We can now bound the required quantity $\|C^xu\|^2$. Note that
$$\|C^xu\|^2 = \|C^xv\|^2 = \|C^x(v_F + v_r)\|^2 = \|C^xv_F\|^2 + \|C^xv_r\|^2 + 2C^xv_F \cdot C^xv_r = \underbrace{\|v_F\|^2 - \|\Phi v_F\|^2}_{(a)} + \underbrace{\|C^xv_r\|^2}_{(b)} + \underbrace{2C^xv_F \cdot C^xv_r}_{(c)},$$
where the last equality is by observing that $v_F$ is in the span of $F_x$ and applying Lemma 23 (a). We now bound the terms (a), (b), and (c) individually.
Term (a): Note that $1 - 2\phi \le \|v_F\|^2 \le 1$, and observe that $\Phi$ satisfies the conditions of Lemma 22 with $\delta_{\max} = (2/3)\sqrt{D/d}$, $\delta_{\min} = (1/2) \le \|\Phi v\|$ (cf. Lemma 4), and $v \cdot \frac{v_F}{\|v_F\|} \ge 1 - \phi$. We have (recall $\|v\| = \|u\| \le 1$)
$$\|v_F\|^2 - \|\Phi v_F\|^2 \le 1 - \Big\|\Phi\tfrac{v_F}{\|v_F\|}\Big\|^2\|v_F\|^2 \le 1 - \Big\|\Phi\tfrac{v_F}{\|v_F\|}\Big\|^2(1 - 2\phi) \le 1 + 2\phi - \Big\|\Phi\tfrac{v_F}{\|v_F\|}\Big\|^2 \le 1 + 2\phi - \Big(1 - (4/3)\sqrt{2\phi D/d}\Big)^2\|\Phi v\|^2 \le 1 - \|\Phi u\|^2 + \big(2\phi + (8/3)\sqrt{2\phi D/d}\big), \qquad (7)$$
where the fourth inequality is by using Lemma 22. Similarly, in the other direction,
$$\|v_F\|^2 - \|\Phi v_F\|^2 \ge 1 - 2\phi - \Big\|\Phi\tfrac{v_F}{\|v_F\|}\Big\|^2\|v_F\|^2 \ge 1 - 2\phi - \Big\|\Phi\tfrac{v_F}{\|v_F\|}\Big\|^2 \ge 1 - 2\phi - \Big(1 + (4/3)\sqrt{2\phi D/d}\Big)^2\|\Phi v\|^2 \ge 1 - \|\Phi u\|^2 - \big(2\phi + (32/9)\phi(D/d) + (8/3)\sqrt{2\phi D/d}\big). \qquad (8)$$

22

[Figure 7 here: the flat manifold $N = \mathbb{R}^d$, containing the samples $x_1, x_2, x_3, x_4$, sits inside $\mathbb{R}^{2d+3}$.]

Figure 7: Basic setup for computing the normals to the underlying $n$-manifold $\Phi M$ at the point of interest $p$. Observe that even though it is difficult to find vectors normal to $\Phi M$ at $p$ within the containing space $\mathbb{R}^d$ (because we only have a finite-size sample from $M$, viz. $x_1$, $x_2$, etc.), we can treat the point $p$ as part of the bigger ambient manifold $N$ ($= \mathbb{R}^d$, which contains $\Phi M$) and compute the desired normals in a space that contains $N$ itself. Now, for each $i,j$ iteration of Algorithm II, $\Psi_{i,j}$ acts on the entire $N$, and since we have complete knowledge about $N$, we can compute the desired normals.

Term (b): Note that for any $\bar x$, $\|\Phi\bar x\| \le (2/3)\sqrt{D/d}\,\|\bar x\|$. We can apply Lemma 23 (b) with $\sigma^n_x \ge 1/4$ (cf. Corollary 5) and, noting that $\|v_r\|^2 \le 2\phi$, we immediately get
$$0 \le \|C^xv_r\|^2 \le 4^2\,(4/9)(D/d)\,\|v_r\|^2 \le (128/9)\phi(D/d). \qquad (9)$$

Term (c): Recall that for any $\bar x$, $\|\Phi\bar x\| \le (2/3)\sqrt{D/d}\,\|\bar x\|$, and using Lemma 23 (b) we have that $\|C^x\|^2 \le 16$ (since $\sigma^n_x \ge 1/4$, cf. Corollary 5). Now let $a := C^xv_F$ and $b := C^xv_r$. Then $\|a\| = \|C^xv_F\| \le \|C^x\|\|v_F\| \le 4$, and $\|b\| = \|C^xv_r\| \le (8/3)\sqrt{2\phi D/d}$ (see Eq. (9)).

Thus, $|2a \cdot b| \le 2\|a\|\|b\| \le 2 \cdot 4 \cdot (8/3)\sqrt{2\phi D/d} = (64/3)\sqrt{2\phi D/d}$. Equivalently,
$$-(64/3)\sqrt{2\phi D/d} \le 2C^xv_F \cdot C^xv_r \le (64/3)\sqrt{2\phi D/d}. \qquad (10)$$

Combining (7)-(10), and noting that $d \le D$, yields the lemma.

A.9 Computing the Normal Vectors


The success of the second embedding technique crucially depends upon finding (at each iteration step) a pair of mutually orthogonal unit vectors that are normal to the embedding of manifold $M$ (from the previous iteration step) at a given point $p$. At first glance, finding such normal vectors seems infeasible since we only have access to a finite-size sample $X$ from $M$. The saving grace comes from noting that the corrections are applied to the $n$-dimensional manifold $\Phi(M)$, which is actually a submanifold of the $d$-dimensional space $\mathbb{R}^d$. Let us denote this space $\mathbb{R}^d$ as a flat $d$-manifold $N$ (containing our manifold of interest $\Phi(M)$). Note that even though we only have partial information about $\Phi(M)$ (since we only have samples from it), we have full information about $N$ (since it is the entire space $\mathbb{R}^d$). This means that, given some point of interest $p \in \Phi(M) \subset N$, any vector normal to $N$ (at $p$) is automatically a vector normal to $\Phi(M)$ (at $p$).
Of course, to find two mutually orthogonal normals to a $d$-manifold $N$, $N$ itself needs to be embedded in a larger-dimensional Euclidean space (although embedding into dimension $d + 2$ should suffice, for computational reasons we will embed $N$ into Euclidean space of dimension $2d + 3$). This is precisely the first thing we do before applying any corrections (cf. Step 2 of Embedding II in Section 4). See Figure 7 for an illustration of the setup before finding any normals.
Now, for every iteration of the algorithm, note that we have complete knowledge of $N$ and of exactly what function (namely $\Psi_{i,j}$ for iteration $i,j$) is being applied to $N$. Thus, with additional computational effort, one can compute the necessary normal vectors.
More specifically, we can estimate a pair of mutually orthogonal unit vectors that are normal to $\Psi_{i,j}(N)$ at $p$ (for any step $i,j$) as follows.

Algorithm 4 Compute Normal Vectors

Preprocessing Stage:
1: Let $\eta^{\mathrm{rand}}_{i,j}$ and $\nu^{\mathrm{rand}}_{i,j}$ be vectors in $\mathbb{R}^{2d+3}$ drawn independently at random from the surface of the unit sphere (for $1 \le i \le |X|$, $1 \le j \le n$).

Compute Normals: For any point of interest $p \in M$, let $t := \Phi p$ denote its projection into $\mathbb{R}^d$. Now, for any iteration $i,j$ (where $1 \le i \le |X|$ and $1 \le j \le n$), we shall assume that the vectors $\eta$ and $\nu$ up to iterations $i, j-1$ are already given. Then we can compute the (approximated) normals $\eta_{i,j}(t)$ and $\nu_{i,j}(t)$ for the iteration $i,j$ as follows.
1: Let $\Delta > 0$ be the quality of approximation.
2: for $k = 1, \ldots, d$ do
3:    Approximate the $k$-th tangent vector as
$$T^k := \frac{\Psi_{i,j-1}(t + \Delta e_k) - \Psi_{i,j-1}(t)}{\Delta},$$
      where $\Psi_{i,j-1}$ is as defined in Section 5.3, and $e_k$ is the $k$-th standard vector.
4: end for
5: Let $\eta = \eta^{\mathrm{rand}}_{i,j}$, and $\nu = \nu^{\mathrm{rand}}_{i,j}$.
6: Use the Gram-Schmidt orthogonalization process to extract $\hat\eta$ (from $\eta$) that is orthogonal to the vectors $\{T^1, \ldots, T^d\}$.
7: Use the Gram-Schmidt orthogonalization process to extract $\hat\nu$ (from $\nu$) that is orthogonal to the vectors $\{T^1, \ldots, T^d, \hat\eta\}$.
8: return $\hat\eta/\|\hat\eta\|$ and $\hat\nu/\|\hat\nu\|$ as mutually orthogonal unit vectors that are approximately normal to $\Psi_{i,j-1}(\Phi M)$ at $\Psi_{i,j-1}(t)$.
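A direct NumPy rendering of Algorithm 4 might look as follows (a sketch: `Psi` stands for the map $\Psi_{i,j-1}$ of Section 5.3 and must be supplied by the caller; the finite-difference step `delta` plays the role of $\Delta$). The Gram-Schmidt steps are realized via a QR factorization of the tangent matrix, which is numerically more stable but otherwise equivalent.

```python
import numpy as np

def compute_normals(Psi, t, dim_d, eta_rand, nu_rand, delta=1e-6):
    """Approximate two mutually orthogonal unit normals to Psi(Phi M) at Psi(t).

    Psi      : callable mapping R^d -> R^(2d+3), the embedding built so far
    t        : point in R^d at which the normals are required
    eta_rand, nu_rand : the fixed random unit vectors in R^(2d+3)
    delta    : finite-difference step (the Delta of Algorithm 4)
    """
    base = Psi(t)
    # Steps 2-4: finite-difference approximation of the d tangent vectors.
    T = np.stack([(Psi(t + delta * np.eye(dim_d)[k]) - base) / delta
                  for k in range(dim_d)])
    # Steps 5-7: orthogonalize eta_rand, then nu_rand, against the tangents
    # (QR gives an orthonormal basis of the tangent span to project out).
    Qt, _ = np.linalg.qr(T.T)
    eta = eta_rand - Qt @ (Qt.T @ eta_rand)
    eta /= np.linalg.norm(eta)
    nu = nu_rand - Qt @ (Qt.T @ nu_rand) - (nu_rand @ eta) * eta
    nu /= np.linalg.norm(nu)
    return eta, nu   # Step 8: approximately normal to Psi(Phi M) at Psi(t)
```

With probability 1 the random vectors have components outside the tangent span (cf. Remark 7), so the two normalizations are well defined.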

A few remarks are in order.


Remark 7 The choice of target dimension $2d + 3$ (instead of $d + 2$) ensures that a pair of random unit vectors $\eta^{\mathrm{rand}}$ and $\nu^{\mathrm{rand}}$ are not parallel to any vector in the tangent bundle of $\Psi_{i,j-1}(N)$ with probability 1. This again follows from Sard's theorem, and is the key observation in reducing the embedding size in Whitney's embedding [Whi36]. This also ensures that our orthogonalization process (Steps 6 and 7) will not result in a null vector.

Remark 8 By picking $\Delta$ sufficiently small, we can approximate the normals $\hat\eta$ and $\hat\nu$ arbitrarily well by approximating the tangents $T^1, \ldots, T^d$ well.

Remark 9 For each iteration $i,j$, the vectors $\hat\eta/\|\hat\eta\|$ and $\hat\nu/\|\hat\nu\|$ that are returned (in Step 8) are a smooth modification of the starting vectors $\eta^{\mathrm{rand}}_{i,j}$ and $\nu^{\mathrm{rand}}_{i,j}$ respectively. Now, since we use the same starting vectors $\eta^{\mathrm{rand}}_{i,j}$ and $\nu^{\mathrm{rand}}_{i,j}$ regardless of the point of application ($t = \Phi p$), it follows that the respective directional derivatives of the returned vectors are bounded as well.

By noting Remarks 8 and 9, the approximate normals we return satisfy the conditions needed for Embedding II (see our discussion in Section 5.3).
