by the PlanetMath authors Aatu, ack, akrowne, alek thiery, alinabi, almann, alozano, antizeus, antonio, aparna, ariels, armbrusterb, AxelBoldt, basseykay, bbukh, benjaminfjones, bhaire, brianbirgen, bs, bshanks, bwebste, cryo, danielm, Daume, debosberg, deiudi, digitalis, djao, Dr Absentius, draisma, drini, drummond, dublisk, Evandar, fibonaci, flynnheiss, gabor sz, GaloisRadical, gantsich, gaurminirick, gholmes74, giri, greg, grouprly, gumau, Gunnar, Henry, iddo, igor, imran, jamika chris, jarino, jay, jgade, jihemme, Johan, karteef, karthik, kemyers3, Kevin OBryant, kidburla2003, KimJ, Koro, lha, lieven, livetoad, liyang, Logan, Luci, m759, mathcam, mathwizard, matte, mclase, mhale, mike, mikestaflogan, mps, msihl, muqabala, n3o, nerdy2, nobody, npolys, Oblomov, ottocolori, paolini, patrickwonders, pbruin, petervr, PhysBrain, quadrate, quincynoodles, ratboy, RevBobo, Riemann, rmilson, ruiyang, Sabean, saforres, saki, say 10, scanez, scineram, seonyoung, slash, sleske, slider142, sprocketboy, sucrose, superhiggs, tensorking, thedagit, Thomas Heye, thouis, Timmy, tobix, tromp, tz26, unlord, uriw, urz, vampyr, vernondalhart, vitriol, vladm, volator, vypertd, wberry, Wkbj79, wombat, x bas, xiaoyanggu, XJamRastafire, xriso, yark et al.
Welcome to the PlanetMath “One Big Book” compilation, the Free Encyclopedia of Mathematics. This book gathers in a single document the best work of the hundreds of authors and thousands of other contributors to the PlanetMath.org web site, as of January 4, 2004. The purpose of this compilation is to help the efforts of these people reach a wider audience and to allow the benefits of their work to be accessed in a greater breadth of situations.
We want to emphasize that the Free Encyclopedia of Mathematics will always be a work in progress. Producing a book-format encyclopedia from the amorphous web of interlinked and multidimensionally organized entries on PlanetMath is not easy. The print medium demands a linear presentation, and boiling the web site down into this format is a difficult, and in some ways lossy, transformation. A major part of our editorial effort goes into making this transformation. We hope the organization we have chosen for now is useful to readers, and you can expect continuing improvements in future editions.
The “linearization” of PlanetMath.org is not the only editorial task we must perform.
Throughout the millennia, readers have come to expect a strict standard of consistency and
correctness from print books, and we must strive to meet this standard in the PlanetMath
Book as closely as possible. This means applying more editorial control to the book form
of PlanetMath than is applied to the web site. We hope you will agree that there is signifi-
cant value to be gained from unifying style, correcting errors, and filtering out not-yet-ready
content, so we will continue to do these things.
For more details on planned improvements to this book, see the TODO file that came with
this archive. Remember that you can help us to improve this work by joining PlanetMath.org
and filing corrections, adding entries, or just participating in the community. We are also
looking for volunteers to help edit this book, or help with programming related to its production, or to help work on Noosphere, the PlanetMath software. To send us comments
about the book, use the e-mail address pmbook@planetmath.org. For general comments
and queries, use feedback@planetmath.org.
Happy mathing,
Joe Corneli
Aaron Krowne
Tuesday, January 27, 2004
Top-level Math Subject Classifications
00 General
01 History and biography
03 Mathematical logic and foundations
05 Combinatorics
06 Order, lattices, ordered algebraic structures
08 General algebraic systems
11 Number theory
12 Field theory and polynomials
13 Commutative rings and algebras
14 Algebraic geometry
15 Linear and multilinear algebra; matrix theory
16 Associative rings and algebras
17 Nonassociative rings and algebras
18 Category theory; homological algebra
19 $K$-theory
20 Group theory and generalizations
22 Topological groups, Lie groups
26 Real functions
28 Measure and integration
30 Functions of a complex variable
31 Potential theory
32 Several complex variables and analytic spaces
33 Special functions
34 Ordinary differential equations
35 Partial differential equations
37 Dynamical systems and ergodic theory
39 Difference and functional equations
40 Sequences, series, summability
41 Approximations and expansions
42 Fourier analysis
43 Abstract harmonic analysis
44 Integral transforms, operational calculus
45 Integral equations
46 Functional analysis
47 Operator theory
49 Calculus of variations and optimal control; optimization
51 Geometry
52 Convex and discrete geometry
53 Differential geometry
54 General topology
55 Algebraic topology
57 Manifolds and cell complexes
58 Global analysis, analysis on manifolds
60 Probability theory and stochastic processes
62 Statistics
65 Numerical analysis
68 Computer science
70 Mechanics of particles and systems
74 Mechanics of deformable solids
76 Fluid mechanics
78 Optics, electromagnetic theory
80 Classical thermodynamics, heat transfer
81 Quantum theory
82 Statistical mechanics, structure of matter
83 Relativity and gravitational theory
85 Astronomy and astrophysics
86 Geophysics
90 Operations research, mathematical programming
91 Game theory, economics, social and behavioral sciences
92 Biology and other natural sciences
93 Systems theory; control
94 Information and communication, circuits
97 Mathematics education
Table of Contents
Introduction i
Top-level Math Subject Classifications ii
Table of Contents iv
GNU Free Documentation License lii
UNCLA – Unclassified 1
Golomb ruler 1
Hesse configuration 1
Jordan's Inequality 2
Lagrange's theorem 2
Laurent series 3
Lebesgue measure 3
Leray spectral sequence 4
Möbius transformation 4
Mordell-Weil theorem 4
Plateau's Problem 5
Poisson random variable 5
Shannon's theorem 6
Shapiro inequality 9
Sylow p-subgroups 9
Tchirnhaus transformations 9
Wallis formulae 10
ascending chain condition 10
bounded 10
bounded operator 11
complex projective line 12
converges uniformly 12
descending chain condition 13
diamond theorem 13
equivalently oriented bases 13
finitely generated R-module 14
fraction 14
group of covering transformations 15
idempotent 15
isolated 17
isolated singularity 17
isomorphic groups 17
joint continuous density function 18
joint cumulative distribution function 18
joint discrete density function 19
left function notation 20
lift of a submanifold 20
limit of a real function exists at a point 20
lipschitz function 21
lognormal random variable 21
lowest upper bound 22
marginal distribution 22
measurable space 23
measure zero 23
minimum spanning tree 23
minimum weighted path length 24
mod 2 intersection number 25
moment generating function 27
monoid 27
monotonic operator 27
multidimensional Gaussian integral 28
multiindex 29
near operators 30
negative binomial random variable 36
normal random variable 37
normalizer of a subset of a group 38
nth root 38
null tree 40
open ball 40
opposite ring 40
orbit-stabilizer theorem 41
orthogonal 41
permutation group on a set 41
prime element 42
product measure 43
projective line 43
projective plane 43
proof of calculus theorem used in the Lagrange method 44
proof of orbit-stabilizer theorem 45
proof of power rule 45
proof of primitive element theorem 47
proof of product rule 47
proof of sum rule 48
proof that countable unions are countable 48
quadrature 48
quotient module 49
regular expression 49
regular language 50
right function notation 51
ring homomorphism 51
scalar 51
schrodinger operator 51
selection sort 52
semiring 53
simple function 54
simple path 54
solutions of an equation 54
spanning tree 54
square root 55
stable sorting algorithm 56
standard deviation 56
stochastic independence 56
substring 57
successor 57
sum rule 58
superset 58
symmetric polynomial 59
the argument principle 59
torsion-free module 59
total order 60
tree traversals 60
trie 63
unit vector 64
unstable fixed point 65
weak* convergence in normed linear space 65
well-ordering principle for natural numbers 65
00-01 – Instructional exposition (textbooks, tutorial papers, etc.) 66
dimension 66
toy theorem 67
00-XX – General 68
method of exhaustion 68
00A05 – General mathematics 69
Conway's chained arrow notation 69
Knuth's up arrow notation 70
arithmetic progression 70
arity 71
introducing 0th power 71
lemma 71
property 72
saddle point approximation 72
singleton 73
subsequence 73
surreal number 73
00A07 – Problem books 76
Nesbitt's inequality 76
proof of Nesbitt's inequality 76
00A20 – Dictionaries and other general reference works 78
completing the square 78
00A99 – Miscellaneous topics 80
QED 80
TFAE 80
WLOG 81
order of operations 81
01A20 – Greek, Roman 84
Roman numerals 84
01A55 – 19th century 85
Poincaré, Jules Henri 85
01A60 – 20th century 90
Bourbaki, Nicolas 90
Erdös Number 97
03-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 98
Burali-Forti paradox 98
Cantor's paradox 98
Russell's paradox 99
biconditional 99
bijection 100
cartesian product 100
chain 100
characteristic function 101
concentric circles 101
conjunction 102
disjoint 102
empty set 102
even number 103
fixed point 103
infinite 103
injective function 104
integer 104
inverse function 105
linearly ordered 106
operator 106
ordered pair 106
ordering relation 106
partition 107
pullback 107
set closed under an operation 108
signature of a permutation 109
subset 109
surjective 110
transposition 110
truth table 111
03-XX – Mathematical logic and foundations 112
standard enumeration 112
03B05 – Classical propositional logic 113
CNF 113
Proof that contrapositive statement is true using logical equivalence 113
contrapositive 114
disjunction 114
equivalent 114
implication 115
propositional logic 115
theory 116
transitive 116
truth function 117
03B10 – Classical first-order logic 118
∆1 bootstrapping 118
Boolean 119
Gödel numbering 120
Gödel's incompleteness theorems 120
Lindenbaum algebra 127
Lindström's theorem 128
Presburger arithmetic 129
R-minimal element 129
Skolemization 129
arithmetical hierarchy 129
arithmetical hierarchy is a proper hierarchy 130
atomic formula 131
creating an infinite model 131
criterion for consistency of sets of formulas 132
deductions are ∆1 132
example of Gödel numbering 134
example of well-founded induction 135
first order language 136
first order logic 137
first order theories 138
free and bound variables 138
generalized quantifier 139
logic 140
proof of compactness theorem for first order logic 141
proof of principle of transfinite induction 141
proof of the well-founded induction principle 141
quantifier 141
quantifier free 144
subformula 144
syntactic compactness theorem for first order logic 144
transfinite induction 144
universal relation 145
universal relations exist for each level of the arithmetical hierarchy 145
well-founded induction 146
well-founded induction on formulas 147
03B15 – Higher-order logic and type theory 143
Härtig's quantifier 143
Russell's theory of types 143
analytic hierarchy 145
game-theoretical quantifier 146
logical language 147
second order logic 148
03B40 – Combinatory logic and lambda-calculus 150
Church integer 150
combinatory logic 150
lambda calculus 151
03B48 – Probability and inductive logic 154
conditional probability 154
03B99 – Miscellaneous 155
Beth property 155
Hofstadter's MIU system 155
IF-logic 157
Tarski's result on the undefinability of Truth 160
axiom 161
compactness 164
consistent 164
interpolation property 164
sentence 165
03Bxx – General logic 166
Banach-Tarski paradox 166
03C05 – Equational classes, universal algebra 168
congruence 168
every congruence is the kernel of a homomorphism 168
homomorphic image of a Σ-structure is a Σ-structure 169
kernel 169
kernel of a homomorphism is a congruence 169
quotient structure 170
03C07 – Basic properties of first-order languages and structures 171
Models constructed from constants 171
Stone space 172
alphabet 173
axiomatizable theory 174
definable 174
definable type 175
downward Lowenheim-Skolem theorem 176
example of definable type 176
example of strongly minimal 177
first isomorphism theorem 177
language 178
length of a string 179
proof of homomorphic image of a Σ-structure is a Σ-structure 179
satisfaction relation 180
signature 181
strongly minimal 181
structure preserving mappings 181
structures 182
substructure 183
type 183
upward Lowenheim-Skolem theorem 183
03C15 – Denumerable structures 185
random graph (infinite) 185
03C35 – Categoricity and completeness of theories 187
κ-categorical 187
Vaught's test 187
proof of Vaught's test 187
03C50 – Models with special properties (saturated, rigid, etc.) 189
example of universal structure 189
homogeneous 191
universal structure 191
03C52 – Properties of classes of models 192
amalgamation property 192
03C64 – Model theory of ordered structures; o-minimality 193
infinitesimal 193
o-minimality 194
real closed fields 194
03C68 – Other classical first-order model theory 196
imaginaries 196
03C90 – Nonclassical models (Boolean-valued, sheaf, etc.) 198
Boolean valued model 198
03C99 – Miscellaneous 199
axiom of foundation 199
elementarily equivalent 199
elementary embedding 200
model 200
proof equivalence of formulation of foundation 201
03D10 – Turing machines and related notions 203
Turing machine 203
03D20 – Recursive functions and relations, subrecursive hierarchies 206
primitive recursive 206
03D25 – Recursively (computably) enumerable sets and degrees 207
recursively enumerable 207
03D75 – Abstract and axiomatic computability and recursion theory 208
Ackermann function 208
halting problem 209
03E04 – Ordered sets and their cofinalities; pcf theory 211
another definition of cofinality 211
cofinality 211
maximal element 212
partitions less than cofinality 213
well ordered set 213
pigeonhole principle 213
proof of pigeonhole principle 213
tree (set theoretic) 214
κ-complete 215
Cantor's diagonal argument 215
Fodor's lemma 216
Schroeder-Bernstein theorem 216
Veblen function 216
additively indecomposable 217
cardinal number 217
cardinal successor 217
cardinality 218
cardinality of a countable union 218
cardinality of the rationals 219
classes of ordinals and enumerating functions 219
club 219
club filter 220
countable 220
countably infinite 221
finite 221
fixed points of normal functions 221
height of an algebraic number 221
if A is infinite and B is a finite subset of A, then A \ B is infinite 222
limit cardinal 222
natural number 223
ordinal arithmetic 224
ordinal number 225
power set 225
proof of Fodor's lemma 225
proof of Schroeder-Bernstein theorem 225
proof of fixed points of normal functions 226
proof of the existence of transcendental numbers 226
proof of theorems in additively indecomposable 227
proof that the rationals are countable 228
stationary set 228
successor cardinal 229
uncountable 229
von Neumann integer 229
von Neumann ordinal 230
weakly compact cardinal 231
weakly compact cardinals and the tree property 231
Cantor's theorem 232
proof of Cantor's theorem 232
additive 232
antisymmetric 233
constant function 233
direct image 234
domain 234
dynkin system 234
equivalence class 235
fibre 235
filtration 236
finite character 236
fix (transformation actions) 236
function 237
functional 237
generalized cartesian product 238
graph 238
identity map 238
inclusion mapping 239
inductive set 239
invariant 240
inverse function theorem 240
inverse image 241
mapping 242
mapping of period n is a bijection 242
partial function 242
partial mapping 243
period of mapping 243
pi-system 244
proof of inverse function theorem 244
proper subset 246
range 246
reflexive 246
relation 246
restriction of a mapping 247
set difference 247
symmetric 247
symmetric difference 248
the inverse image commutes with set operations 248
transformation 249
transitive 250
transitive 250
transitive closure 250
Hausdorff's maximum principle 250
Kuratowski's lemma 251
Tukey's lemma 251
Zermelo's postulate 251
Zermelo's well-ordering theorem 251
Zorn's lemma 252
axiom of choice 252
equivalence of Hausdorff's maximum principle, Zorn's lemma and the well-ordering theorem 252
equivalence of Zorn's lemma and the axiom of choice 253
maximality principle 254
principle of finite induction 254
principle of finite induction proven from well-ordering principle 255
proof of Tukey's lemma 255
proof of Zermelo's well-ordering theorem 255
axiom of extensionality 256
axiom of infinity 256
axiom of pairing 257
axiom of power set 258
axiom of union 258
axiom schema of separation 259
de Morgan's laws 260
de Morgan's laws for sets (proof) 261
set theory 261
union 264
universe 264
von Neumann-Bernays-Gödel set theory 265
FS iterated forcing preserves chain condition 267
chain condition 268
composition of forcing notions 268
composition preserves chain condition 268
equivalence of forcing notions 269
forcing relation 270
forcings are equivalent if one is dense in the other 270
iterated forcing 272
iterated forcing and composition 273
name 273
partial order with chain condition does not collapse cardinals 274
proof of partial order with chain condition does not collapse cardinals 274
proof that forcing notions are equivalent to their composition 275
complete partial orders do not add small subsets 280
proof of complete partial orders do not add small subsets 280
◊ is equivalent to ♣ and continuum hypothesis 281
Levy collapse 281
proof of ◊ is equivalent to ♣ and continuum hypothesis 282
Martin's axiom 283
Martin's axiom and the continuum hypothesis 283
Martin's axiom is consistent 284
a shorter proof: Martin's axiom and the continuum hypothesis 287
continuum hypothesis 288
forcing 288
generalized continuum hypothesis 289
inaccessible cardinals 290
◊ 290
♣ 290
Dedekind infinite 291
Zermelo-Fraenkel axioms 291
class 291
complement 293
delta system 293
delta system lemma 293
diagonal intersection 293
intersection 294
multiset 294
proof of delta system lemma 294
rational number 295
saturated (set) 295
separation and doubletons axiom 295
set 296
03Exx – Set theory 299
intersection 299
03F03 – Proof theory, general 300
NJp 300
NKp 300
natural deduction 301
sequent 301
sound, complete 302
03F07 – Structure of proofs 303
induction 303
03F30 – First-order arithmetic and fragments 307
Elementary Functional Arithmetic 307
PA 308
Peano arithmetic 308
03F35 – Second- and higher-order arithmetic and fragments 310
ACA0 310
RCA0 310
Z2 310
comprehension axiom 311
induction axiom 311
03G05 – Boolean algebras 313
Boolean algebra 313
M. H. Stone's representation theorem 313
03G10 – Lattices and related structures 314
Boolean lattice 314
complete lattice 314
lattice 315
03G99 – Miscellaneous 316
Chu space 316
Chu transform 316
biextensional collapse 317
example of Chu space 317
property of a Chu space 318
05-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 319
example of pigeonhole principle 319
multi-index derivative of a power 319
multi-index notation 320
05A10 – Factorials, binomial coefficients, combinatorial functions 322
Catalan numbers 322
Levi-Civita permutation symbol 323
Pascal's rule (bit string proof) 325
Pascal's rule proof 326
Pascal's triangle 326
Upper and lower bounds to binomial coefficient 328
binomial coefficient 328
double factorial 329
factorial 329
falling factorial 330
inductive proof of binomial theorem 331
multinomial theorem 332
multinomial theorem (proof) 333
proof of upper and lower bounds to binomial coefficient 334
05A15 – Exact enumeration problems, generating functions 336
Stirling numbers of the first kind 336
Stirling numbers of the second kind 338
05A19 – Combinatorial identities 342
Pascal's rule 342
05A99 – Miscellaneous 343
principle of inclusion-exclusion 343
principle of inclusion-exclusion proof 344
05B15 – Orthogonal arrays, Latin squares, Room squares 346
example of Latin squares 346
graeco-latin squares 346
latin square 347
magic square 347
05B35 – Matroids, geometric lattices 348
matroid 348
polymatroid 353
05C05 – Trees 354
AVL tree 354
Aronszajn tree 354
Suslin tree 354
antichain 355
balanced tree 355
binary tree 355
branch 356
child node (of a tree) 356
complete binary tree 357
digital search tree 357
digital tree 358
example of Aronszajn tree 358
example of tree (set theoretic) 359
extended binary tree 359
external path length 360
internal node (of a tree) 360
leaf node (of a tree) 361
parent node (in a tree) 361
proof that ω has the tree property 362
root (of a tree) 362
tree 363
weight-balanced binary trees are ultrametric 364
weighted path length 366
05C10 – Topological graph theory, imbedding 367
Heawood number 367
Kuratowski's theorem 368
Szemerédi-Trotter theorem 368
crossing lemma 369
crossing number 369
graph topology 369
planar graph 370
proof of crossing lemma 370
05C12 – Distance in graphs 372
Hamming distance 372
05C15 – Coloring of graphs and hypergraphs 373
bipartite graph 373
chromatic number 374
chromatic number and girth 375
chromatic polynomial 375
colouring problem 376
complete bipartite graph 377
complete k-partite graph 378
four-color conjecture 378
k-partite graph 379
property B 380
05C20 – Directed graphs (digraphs), tournaments 381
cut 381
de Bruijn digraph 381
directed graph 382
flow 383
maximum flow/minimum cut theorem 384
tournament 385
05C25 – Graphs and groups 387
Cayley graph 387
05C38 – Paths and cycles 388
Euler path 388
Veblen's theorem 388
acyclic graph 389
bridges of Königsberg 389
cycle 390
girth 391
path 391
proof of Veblen's theorem 392
05C40 – Connectivity 393
k-connected graph 393
Thomassen's theorem on 3-connected graphs 393
Tutte's wheel theorem 394
connected graph 394
cutvertex 395
05C45 – Eulerian and Hamiltonian graphs 396
Bondy and Chvátal theorem 396
Dirac theorem 396
Euler circuit 397
Fleury's algorithm 397
Hamiltonian cycle 398
Hamiltonian graph 398
Hamiltonian path 398
Ore's theorem 398
Petersen graph 399
hypohamiltonian 399
traceable 399
05C60 – Isomorphism problems (reconstruction conjecture, etc.) 400
graph isomorphism 400
05C65 – Hypergraphs 402
Steiner system 402
finite plane 402
hypergraph 403
linear space 404
05C69 – Dominating sets, independent sets, cliques 405
Mantel's theorem 405
clique 405
proof of Mantel's theorem 405
05C70 – Factorization, matching, covering and packing 407
Petersen theorem 407
Tutte theorem 407
bipartite matching 407
edge covering 409
matching 409
maximal bipartite matching algorithm 410
maximal matching/minimal edge covering theorem 411
05C75 – Structural characterization of types of graphs 413
multigraph 413
pseudograph 413
05C80 – Random graphs 414
examples of probabilistic proofs 414
probabilistic method 415
05C90 – Applications 417
Hasse diagram 417
05C99 – Miscellaneous 419
Euler's polyhedron theorem 419
Poincaré formula 419
Turan's theorem 419
Wagner's theorem 420
block 420
bridge 420
complete graph 420
degree (of a vertex) 421
distance (in a graph) 421
edge-contraction 421
graph 422
graph minor theorem 422
graph theory 423
homeomorphism 424
loop 424
minor (of a graph) 424
neighborhood (of a vertex) 425
null graph 425
order (of a graph) 425
proof of Euler's polyhedron theorem 426
proof of Turan's theorem 427
realization 427
size (of a graph) 428
subdivision 428
subgraph 429
wheel graph 429
05D05 – Extremal set theory 431
LYM inequality 431
Sperner's theorem 432
05D10 – Ramsey theory 433
Erdös-Rado theorem 433
Ramsey's theorem 433
Ramsey's theorem 434
arrows 435
coloring 436
proof of Ramsey's theorem 437
05D15 – Transversal (matching) theory 438
Hall's marriage theorem 438
proof of Hall's marriage theorem 438
saturate 440
system of distinct representatives 440
05E05 – Symmetric functions 441
elementary symmetric polynomial 441
reduction algorithm for symmetric polynomials 441
06-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 443
equivalence relation 443
06-XX – Order, lattices, ordered algebraic structures 445
join 445
meet 445
06A06 – Partial order, general 446
directed set 446
infimum 446
sets that do not have an infimum 447
supremum 447
upper bound 448
06A99 – Miscellaneous 449
dense (in a poset) 449
partial order 449
poset 450
quasi-order 450
well quasi ordering 450
06B10 – Ideals, congruence relations 452
order in an algebra 452
06C05 – Modular lattices, Desarguesian lattices 453
modular lattice 453
06D99 – Miscellaneous 454
distributive 454
distributive lattice 454
06E99 – Miscellaneous 455
Boolean ring 455
08A40 – Operations, polynomials, primal algebras 456
coefficients of a polynomial 456
08A99 – Miscellaneous 457
binary operation 457
filtered algebra 457
11-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 459
Euler phi-function 459
Euler-Fermat theorem 460
Fermat's little theorem 460
Fermat's theorem proof 460
Goldbach's conjecture 460
Jordan's totient function 461
Legendre symbol 461
Pythagorean triplet 462
Wilson's theorem 462
arithmetic mean 462
ceiling 463
computation of powers using Fermat's little theorem 463
congruences 464
coprime 464
cube root 464
floor 465
geometric mean 465
googol 466
googolplex 467
greatest common divisor 467
group theoretic proof of Wilson's theorem 467
harmonic mean 467
mean 468
number field 468
pi 468
proof of Wilson's theorem 470
proof of fundamental theorem of arithmetic 471
root of unity 471
11-01 – Instructional exposition (textbooks, tutorial papers, etc.) 472
base 472
11-XX – Number theory 474
Lehmer's Conjecture 474
Sierpinski conjecture 474
prime triples conjecture 475
11A05 – Multiplicative structure; Euclidean algorithm; greatest common divisors 476
Bezout's lemma (number theory) 476
Euclid's algorithm 476
Euclid's lemma 478
Euclid's lemma proof 478
fundamental theorem of arithmetic 479
perfect number 479
smooth number 480
11A07 – Congruences; primitive roots; residue systems 481
Anton's congruence 481
Fermat's Little Theorem proof (Inductive) 482
Jacobi symbol 483
Shanks-Tonelli algorithm 483
Wieferich prime 483
Wilson's theorem for prime powers 484
factorial modulo prime powers 485
proof of Euler-Fermat theorem 485
proof of Lucas's theorem 486
11A15 – Power residues, reciprocity 487
Euler's criterion 487
Gauss' lemma 487
Zolotarev's lemma 489
cubic reciprocity law 491
proof of Euler's criterion 493
proof of quadratic reciprocity rule 494
quadratic character of 2 495
quadratic reciprocity for polynomials 496
quadratic reciprocity rule 497
quadratic residue 497
11A25 – Arithmetic functions; related numbers; inversion formulas 498
Dirichlet character 498
Liouville function 498
Mangoldt function 499
Mertens' first theorem 499
Moebius function 499
Moebius inversion 500
arithmetic function 502
multiplicative function 503
non-multiplicative function 505
totient 507
unit 507
11A41 – Primes 508
Chebyshev functions 508
Euclid's proof of the infinitude of primes 509
Mangoldt summatory function 509
Mersenne numbers 510
Thue's lemma 510
composite number 511
prime 511
prime counting function 511
prime difference function 512
prime number theorem 512
prime number theorem result 513
proof of Thue's Lemma 514
semiprime 515
sieve of Eratosthenes 516
test for primality of Mersenne numbers 516
11A51 – Factorization; primality 517
Fermat Numbers 517
Fermat compositeness test 517
Zsigmondy's theorem 518
divisibility 518
division algorithm for integers 519
proof of division algorithm for integers 519
square-free number 520
squarefull number 520
the prime power dividing a factorial 521
11A55 – Continued fractions 523
Stern-Brocot tree 523
continued fraction 524
11A63 – Radix representation; digital problems 527
Kummer's theorem 527
corollary of Kummer's theorem 528
11A67 – Other representations 529
Sierpinski Erdös egyptian fraction conjecture 529
adjacent fraction 529
any rational number is a sum of unit fractions 530
conjecture on fractions with odd denominators 532
unit fraction 532
11A99 – Miscellaneous 533
ABC conjecture 533
Suranyi theorem 533
irrational to an irrational power can be rational 534
triangular numbers 534
11B05 – Density, gaps, topology 536
Cauchy-Davenport theorem 536
Mann's theorem 536
Schnirelmann density 537
Sidon set 537
asymptotic density 538
discrete space 538
essential component 539
normal order 539
11B13 – Additive bases 541
Erdös-Turan conjecture 541
additive basis 542
asymptotic basis 542
base conversion 542
sumset 546
11B25 – Arithmetic progressions 547
Behrend's construction 547
Freiman's theorem 548
Szemerédi's theorem 548
multidimensional arithmetic progression 549
11B34 – Representation functions 550
Erdös-Fuchs theorem 550
11B37 – Recurrences 551
Collatz problem 551
recurrence relation 551
11B39 – Fibonacci and Lucas numbers and polynomials and generalizations 553
Fibonacci sequence 553
Hogatt's theorem 554
Lucas numbers 554
golden ratio 554
11B50 – Sequences (mod m) 556
Erdös-Ginzburg-Ziv theorem 556
11B57 – Farey sequences; the sequences ? 557
Farey sequence 557
11B65 – Binomial coefficients; factorials; q-identities 559
Lucas's Theorem 559
binomial theorem 559
11B68 – Bernoulli and Euler numbers and polynomials 561
Bernoulli number 561
Bernoulli periodic function 561
Bernoulli polynomial 562
generalized Bernoulli number 562
11B75 – Other combinatorial number theory 563
Erdös-Heilbronn conjecture 563
Freiman isomorphism 563
sum-free 564
11B83 – Special sequences and polynomials 565
Beatty sequence 565
Beatty's theorem 566
Fraenkel's partition theorem 566
Sierpinski numbers 567
palindrome 567
proof of Beatty's theorem 568
square-free sequence 569
superincreasing sequence 569
11B99 – Miscellaneous 570
Lychrel number 570
closed form 571 Schanuel’s conjecutre 598
11C08 – Polynomials 573 period 598
content of a polynomial 573 11G05 – Elliptic curves over global fields
cyclotomic polynomial 573 600
height of a polynomial 574 complex multiplication 600
length of a polynomial 574 11H06 – Lattices and convex bodies 602
proof of Eisenstein criterion 574 Minkowski’s theorem 602
proof that the cyclotomic polynomial is irreducible lattice in Rn 602
575 11H46 – Products of linear forms 604
11D09 – Quadratic and bilinear equations triple scalar product 604
577 11J04 – Homogeneous approximation to
Pell’s equation and simple continued fractions one number 605
577 Dirichlet’s approximation theorem 605
11D41 – Higher degree equations; Fermat’s 11J68 – Approximation to algebraic num-
equation 578 bers 606
Beal conjecture 578 Davenport-Schmidt theorem 606
Euler quartic conjecture 579 Liouville approximation theorem 606
Fermat’s last theorem 580 proof of Liouville approximation theorem 607
11D79 – Congruences in many variables 11J72 – Irrationality; linear independence
582 over a field 609
Chinese remainder theorem 582 nth root of 2 is irrational for n ≥ 3 (proof using
Chinese remainder theorem proof 583 Fermat’s last theorem) 609
11D85 – Representation problems 586 e is irrational (proof) 610
polygonal number 586 irrational 610
11D99 – Miscellaneous 588 square root of 2 is irrational 611
Diophantine equation 588 11J81 – Transcendence (general theory)
11E39 – Bilinear and Hermitian forms 590 612
Hermitian form 590 Fundamental Theorem of Transcendence 612
non-degenerate bilinear form 590 Gelfond’s theorem 612
positive definite form 591 four exponentials conjecture 612
symmetric bilinear form 591 six exponentials theorem 613
Clifford algebra 591 transcendental number 614
11Exx – Forms and linear algebraic groups 11K16 – Normal numbers, radix expan-
593 sions, etc. 615
quadratic function associated with a linear func- absolutely normal 615
tional 593 11K45 – Pseudo-random numbers; Monte
11F06 – Structure of modular groups and Carlo methods 617
generalizations; arithmetic groups 594 pseudorandom numbers 617
Taniyama-Shimura theorem 594 quasirandom numbers 618
11F30 – Fourier coefficients of automor- random numbers 619
phic forms 597 truly random numbers 619
Fourier coefficients 597 11L03 – Trigonometric and exponential sums,
11F67 – Special values of automorphic L- general 620
series, periods of modular forms, cohomol- Ramanujan sum 620
ogy, modular symbols 598 11L05 – Gauss and Kloosterman sums; gen-
eralizations 622 11P05 – Waring’s problem and variants
Gauss sum 622 648
Kloosterman sum 623 Lagrange’s four-square theorem 648
Landsberg-Schaar relation 623 Waring’s problem 648
derivation of Gauss sum up to a sign 624 proof of Lagrange’s four-square theorem 649
11L40 – Estimates on character sums 625 11P81 – Elementary theory of partitions
Pólya-Vinogradov inequality 625 pentagonal number theorem 651
11M06 – ζ(s) and L(s, χ) 627 pentagonal number theorem 651
Apéry’s constant 627 11R04 – Algebraic numbers; rings of alge-
Dedekind zeta function 627 braic integers 653
Dirichlet L-series 628 Dedekind domain 653
Riemann θ-function 629 Dirichlet’s unit theorem 653
Riemann Xi function 630 Eisenstein integers 654
Riemann omega function 630 Galois representation 654
Gaussian integer 658
functional equation for the Riemann Xi function
630 algebraic conjugates 659
algebraic integer 659
functional equation for the Riemann theta func-
tion 631 algebraic number 659
generalized Riemann hypothesis 631 algebraic number field 659
calculating the splitting of primes 660
proof of functional equation for the Riemann theta
function 631 characterization in terms of prime ideals 661
11M99 – Miscellaneous 633 ideal classes form an abelian group 661
Riemann zeta function 633 integral basis 661
formulae for zeta in the critical strip 636 integrally closed 662
transcendental root theorem 662
functional equation of the Riemann zeta function
638 11R06 – PV-numbers and generalizations;
other special algebraic numbers 663
value of the Riemann zeta function at s = 2 638
11N05 – Distribution of primes 640 Salem number 663
Bertrand’s conjecture 640 11R11 – Quadratic extensions 664
Brun’s constant 640 prime ideal decomposition in quadratic exten-
proof of Bertrand’s conjecture 640 sions of Q 664
twin prime conjecture 642 11R18 – Cyclotomic extensions 666
11N13 – Primes in progressions 643 Kronecker-Weber theorem 666
primes in progressions 648 examples of regular primes 667
prime ideal decomposition in cyclotomic exten-
11N32 – Primes represented by polynomi- sions of Q 668
als; other multiplicative structure of poly- regular prime 669
nomial values 644 11R27 – Units and factorization 670
Euler four-square identity 644 regulator 670
11N56 – Rate of growth of arithmetic func- 11R29 – Class numbers, class groups, dis-
tions 645 criminants 672
highly composite number 645 Existence of Hilbert Class Field 672
11N99 – Miscellaneous 646 class number formula 673
Chinese remainder theorem 646 discriminant 673
proof of chinese remainder theorem 646 ideal class 674
ray class group 675 of n 712
11R32 – Galois theory 676 12-00 – General reference works (hand-
Galois criterion for solvability of a polynomial by books, dictionaries, bibliographies, etc.) 714
radicals 676 monomial 714
11R34 – Galois cohomology 677 order and degree of polynomial 715
Hilbert Theorem 90 677 12-XX – Field theory and polynomials 716
11R37 – Class field theory 678 homogeneous polynomial 716
Artin map 678 subfield 716
Tchebotarev density theorem 679 12D05 – Polynomials: factorization 717
modulus 679 factor theorem 717
multiplicative congruence 680 proof of factor theorem 717
ray class field 680 proof of rational root theorem 718
11R56 – Adèle rings and groups 682 rational root theorem 719
adèle 682 rational root theorem 719
idèle 682 sextic equation 719
restricted direct product 683 (algebraic theorems) 720
11R99 – Miscellaneous 684 Cardano’s derivation of the cubic formula 720
Henselian field 684 Ferrari-Cardano derivation of the quartic formula
valuation 685 721
11S15 – Ramification and extension the- Galois-theoretic derivation of the cubic formula
ory 686 722
decomposition group 686 Galois-theoretic derivation of the quartic formula
examples of prime ideal decomposition in num- 724
ber fields 688 cubic formula 728
inertial degree 691 derivation of quadratic formula 728
ramification index 692 quadratic formula 729
unramified action 697 quartic formula 730
11S31 – Class field theory; p-adic formal reciprocal polynomial 730
groups 699 root 731
Hilbert symbol 699 variant of Cardano’s derivation 732
11S99 – Miscellaneous 700 12D99 – Miscellaneous 733
p-adic integers 700 Archimedean property 733
local field 701 complex 734
11Y05 – Factorization 703 complex conjugate 735
Pollard’s rho method 703 complex number 737
quadratic sieve 706 examples of totally real fields 738
11Y55 – Calculation of integer sequences fundamental theorem of algebra 739
709 imaginary 739
Kolakoski sequence 709 imaginary unit 739
11Z05 – Miscellaneous applications of num- indeterminate form 739
ber theory 711 inequalities for real numbers 740
τ function 711 interval 742
arithmetic derivative 711 modulus of complex number 743
example of arithmetic derivative 712 proof of fundamental theorem of algebra 744
proof that τ (n) is the number of positive divisors proof of the fundamental theorem of algebra 744
real and complex embeddings 744 Galois conjugate 773
real number 746 Galois extension 773
totally real and imaginary fields 747 Galois group 773
12E05 – Polynomials (irreducibility, etc.) absolute Galois group 774
748 cyclic extension 774
Gauss’s Lemma I 748 example of nonperfect field 774
Gauss’s Lemma II 749 fixed field 774
discriminant 749 infinite Galois theory 774
polynomial ring 751 normal closure 776
resolvent 751 normal extension 776
de Moivre identity 754 perfect field 777
monic 754 radical extension 777
Wedderburn’s Theorem 754 separable 777
proof of Wedderburn’s theorem 755 separable closure 778
second proof of Wedderburn’s theorem 756 12F20 – Transcendental extensions 779
finite field 757 transcendence degree 779
Frobenius automorphism 760 12F99 – Miscellaneous 780
characteristic 761 composite field 780
characterization of field 761 extension field 780
example of an infinite field of finite characteristic 12J15 – Ordered fields 782
762 ordered field 782
examples of fields 762 13-00 – General reference works (hand-
field 764 books, dictionaries, bibliographies, etc.) 783
field homomorphism 764 absolute value 783
prime subfield 765 associates 784
12F05 – Algebraic extensions 766 cancellation ring 784
a finite extension of fields is an algebraic exten- comaximal 784
sion 766 every prime ideal is radical 784
algebraic closure 767 module 785
algebraic extension 767 radical of an ideal 786
algebraically closed 767 ring 786
algebraically dependent 768 subring 787
existence of the minimal polynomial 768 tensor product 787
finite extension 769 13-XX – Commutative rings and algebras
minimal polynomial 769 789
norm 770 commutative ring 789
primitive element theorem 770 13A02 – Graded rings 790
splitting field 770 graded ring 790
the field extension R/Q is not finite 771 13A05 – Divisibility 791
trace 771 Eisenstein criterion 791
12F10 – Separable extensions, Galois the- 13A10 – Radical theory 792
ory 772 Hilbert’s Nullstellensatz 792
Abelian extension 772 nilradical 792
Fundamental Theorem of Galois Theory 772 radical of an integer 793
Galois closure 773 13A15 – Ideals; multiplicative ideal theory
794 13C15 – Dimension theory, depth, related
contracted ideal 794 rings (catenary, etc.) 814
existence of maximal ideals 794 Krull’s principal ideal theorem 814
extended ideal 795 13C99 – Miscellaneous 815
fractional ideal 796 Artin-Rees theorem 815
homogeneous ideal 797 Nakayama’s lemma 815
ideal 797 prime ideal 815
maximal ideal 797 proof of Nakayama’s lemma 816
principal ideal 798 proof of Nakayama’s lemma 817
the set of prime ideals of a commutative ring support 817
with identity 798 13E05 – Noetherian rings and modules 818
13A50 – Actions of groups on commuta- Hilbert basis theorem 818
tive rings; invariant theory 799 Noetherian module 818
Schwarz (1975) theorem 799 proof of Hilbert basis theorem 819
invariant polynomial 800 finitely generated modules over a principal ideal
13A99 – Miscellaneous 801 domain 819
Lagrange’s identity 801 13F07 – Euclidean rings and generaliza-
characteristic 802 tions 821
cyclic ring 802 Euclidean domain 821
proof of Euler four-square identity 803 Euclidean valuation 821
proof that every subring of a cyclic ring is a cyclic proof of Bezout’s Theorem 822
ring 804 proof that an Euclidean domain is a PID 822
proof that every subring of a cyclic ring is an 13F10 – Principal ideal rings 823
ideal 804 Smith normal form 823
zero ring 805 13F25 – Formal power series rings 825
13B02 – Extension theory 806 formal power series 825
algebraic 806 13F30 – Valuation rings 831
module-finite 806 discrete valuation 831
13B05 – Galois theory 807 discrete valuation ring 831
algebraic 807 13G05 – Integral domains 833
13B21 – Integral dependence 808 Dedekind-Hasse valuation 833
integral 808 PID 834
13B22 – Integral closure of rings and ide- UFD 834
als ; integrally closed rings, related rings a finite integral domain is a field 835
(Japanese, etc.) 809 an artinian integral domain is a field 835
integral closure 809 example of PID 835
13B30 – Quotients and localization 810 field of quotients 836
fraction field 810 integral domain 836
localization 810 irreducible 837
multiplicative set 811 motivation for Euclidean domains 837
13C10 – Projective and free modules and zero divisor 838
ideals 812 13H05 – Regular local rings 839
example of free module 812 regular local ring 839
13C12 – Torsion modules and ideals 813 13H99 – Miscellaneous 840
torsion element 813 local ring 840
semi-local ring 841 height of a prime ideal 866
13J10 – Complete rings, completion 842 invertible sheaf 866
completion 842 locally free 867
13J25 – Ordered rings 844 normal irreducible varieties are nonsingular in
ordered ring 844 codimension 1 867
13J99 – Miscellaneous 845 sheaf of meromorphic functions 867
topological ring 845 very ample 867
13N15 – Derivations 846 14C20 – Divisors, linear systems, invert-
derivation 846 ible sheaves 869
13P10 – Polynomial ideals, Gröbner bases divisor 869
847 Rational and birational maps 870
Gröbner basis 847 general type 870
14-00 – General reference works (hand- 14F05 – Vector bundles, sheaves, related
books, dictionaries, bibliographies, etc.) 849 constructions 871
Picard group 849 direct image (functor) 871
affine space 849 14F20 – Étale and other Grothendieck topolo-
affine variety 849 gies and cohomologies 872
dual isogeny 850 site 872
finite morphism 850 14F25 – Classical real and complex coho-
isogeny 851 mology 873
line bundle 851 Serre duality 873
nonsingular variety 852 sheaf cohomology 874
projective space 852 14G05 – Rational points 875
projective variety 854 Hasse principle 875
quasi-finite morphism 854 14H37 – Automorphisms 876
14A10 – Varieties and morphisms 855 Frobenius morphism 876
Zariski topology 855 14H45 – Special curves and curves of low
algebraic map 856 genus 878
algebraic sets and polynomial ideals 856 Fermat’s spiral 878
noetherian topological space 857 archimedean spiral 878
regular map 857 folium of Descartes 879
structure sheaf 858 spiral 879
14A15 – Schemes and morphisms 859 14H50 – Plane and space curves 880
closed immersion 859 torsion (space curve) 880
coherent sheaf 859 14H52 – Elliptic curves 881
fibre product 860 Birch and Swinnerton-Dyer conjecture 881
prime spectrum 860 Hasse’s bound for elliptic curves over finite fields
scheme 863 882
separated scheme 864 L-series of an elliptic curve 882
singular set 864 Mazur’s theorem on torsion of elliptic curves 884
14A99 – Miscellaneous 865 Mordell curve 884
Cartier divisor 865 Nagell-Lutz theorem 885
General position 865 Selmer group 886
Serre’s twisting theorem 866 bad reduction 887
ample 866 conductor of an elliptic curve 890
elliptic curve 890 diagonally dominant matrix 920
height function 894 eigenvalue (of a matrix) 921
j-invariant 895 eigenvalue problem 922
rank of an elliptic curve 896 eigenvalues of orthogonal matrices 924
supersingular 897 eigenvector 925
the torsion subgroup of an elliptic curve injects exactly determined 926
in the reduction of the curve 897 free vector space over a set 926
14H99 – Miscellaneous 900 in a vector space, λv = 0 if and only if λ = 0 or
Riemann-Roch theorem 900 v is the zero vector 928
genus 900 invariant subspace 929
projective curve 901 least squares 929
proof of Riemann-Roch theorem 901 linear algebra 930
14L17 – Affine algebraic groups, hyperal- linear least squares 932
gebra constructions 902 linear manifold 934
affine algebraic group 902 matrix exponential 934
algebraic torus 902 matrix operations 935
14M05 – Varieties defined by ring con- nilpotent matrix 938
ditions (factorial, Cohen-Macaulay, semi- nilpotent transformation 938
normal) 903 non-zero vector 939
normal 903 off-diagonal entry 940
14M15 – Grassmannians, Schubert vari- orthogonal matrices 940
eties, flag manifolds 904 orthogonal vectors 941
Borel-Bott-Weil theorem 904 overdetermined 941
flag variety 905 partitioned matrix 941
14R15 – Jacobian problem 906 pentadiagonal matrix 942
Jacobian conjecture 906 proof of Cayley-Hamilton theorem 942
15-00 – General reference works (hand- proof of Schur decomposition 943
books, dictionaries, bibliographies, etc.) 907 singular value decomposition 944
Cholesky decomposition 907 skew-symmetric matrix 945
Hadamard matrix 908 square matrix 946
Hessenberg matrix 909 strictly upper triangular matrix 946
If A ∈ Mn(k) and A is supertriangular then symmetric matrix 947
A^n = 0 910 theorem for normal triangular matrices 947
Jacobi determinant 910 triangular matrix 948
Jacobi’s Theorem 912 tridiagonal matrix 949
Kronecker product 912 under determined 950
LU decomposition 913 unit triangular matrix 950
Peetre’s inequality 914 unitary 951
Schur decomposition 915 vector space 952
antipodal 916 vector subspace 953
conjugate transpose 916 zero map 954
corollary of Schur decomposition 917 zero vector in a vector space is unique 955
covector 918 zero vector space 955
diagonal matrix 918 15-01 – Instructional exposition (textbooks,
diagonalization 920 tutorial papers, etc.) 956
circulant matrix 956 15A06 – Linear equations 989
matrix 957 Gaussian elimination 989
15-XX – Linear and multilinear algebra; finite-dimensional linear problem 991
matrix theory 960 homogeneous linear problem 992
linearly dependent functions 960 linear problem 993
15A03 – Vector spaces, linear dependence, reduced row echelon form 993
rank 961 row echelon form 994
Sylvester’s law 961 under-determined polynomial interpolation 994
basis 961 15A09 – Matrix inversion, generalized in-
complementary subspace 962 verses 996
dimension 963 matrix adjoint 996
every vector space has a basis 964 matrix inverse 997
flag 964 15A12 – Conditioning of matrices 1000
frame 965 singular 1000
linear combination 968 15A15 – Determinants, permanents, other
linear independence 968 special matrix functions 1001
list vector 968 Cayley-Hamilton theorem 1001
nullity 969 Cramer’s rule 1001
orthonormal basis 970 cofactor expansion 1002
physical vector 970 determinant 1003
proof of rank-nullity theorem 972 determinant as a multilinear mapping 1005
rank 973 determinants of some matrices of special form
rank-nullity theorem 973 1006
similar matrix 974 example of Cramer’s rule 1006
span 975 proof of Cramer’s rule 1008
theorem for the direct sum of finite dimensional proof of cofactor expansion 1008
vector spaces 976 resolvent matrix 1009
vector 976 15A18 – Eigenvalues, singular values, and
15A04 – Linear transformations, semilin- eigenvectors 1010
ear transformations 980 Jordan canonical form theorem 1010
admissibility 980 Lagrange multiplier method 1011
conductor of a vector 980 Perron-Frobenius theorem 1011
cyclic decomposition theorem 981 characteristic equation 1012
cyclic subspace 981 eigenvalue 1012
dimension theorem for symplectic complement eigenvalue 1013
(proof) 981 15A21 – Canonical forms, reductions, clas-
dual homomorphism 982 sification 1015
dual homomorphism of the derivative 983 companion matrix 1015
image of a linear transformation 984 eigenvalues of an involution 1015
invertible linear transformation 984 linear involution 1016
kernel of a linear transformation 985 normal matrix 1017
linear transformation 985 projection 1018
minimal polynomial (endomorphism) 986 quadratic form 1019
symplectic complement 987 15A23 – Factorization of matrices 1021
trace 988 QR decomposition 1021
15A30 – Algebraic systems of matrices 1023 inner product space 1051
ideals in matrix algebras 1023 proof of Cauchy-Schwarz inequality 1052
15A36 – Matrices of integers 1025 self-dual 1052
permutation matrix 1025 skew-symmetric bilinear form 1053
15A39 – Linear inequalities 1026 spectral theorem 1053
Farkas lemma 1026 15A66 – Clifford algebras, spinors 1056
15A42 – Inequalities involving eigenvalues geometric algebra 1056
and eigenvectors 1027 15A69 – Multilinear algebra, tensor prod-
Gershgorin’s circle theorem 1027 ucts 1058
Gershgorin’s circle theorem result 1027 Einstein summation convention 1058
Schur’s inequality 1028 multi-linear 1061
15A48 – Positive matrices and their gen- multi-linear 1061
eralizations; cones of matrices 1029 outer multiplication 1061
negative definite 1029 tensor 1062
negative semidefinite 1029 tensor algebra 1065
positive definite 1030 tensor array 1065
positive semidefinite 1030 tensor product (vector spaces) 1067
primitive matrix 1031 tensor transformations 1069
reducible matrix 1031 15A72 – Vector and tensor algebra, theory
15A51 – Stochastic matrices 1032 of invariants 1072
Birkhoff-von Neumann theorem 1032 bac-cab rule 1072
proof of Birkhoff-von Neumann theorem 1032 cross product 1072
15A57 – Other types of matrices (Hermi- euclidean vector 1073
tian, skew-Hermitian, etc.) 1035 rotational invariance of cross product 1074
Hermitian matrix 1035 15A75 – Exterior algebra, Grassmann al-
direct sum of Hermitian and skew-Hermitian ma- gebras 1076
trices 1036 contraction 1076
identity matrix 1037 exterior algebra 1077
skew-Hermitian matrix 1037 15A99 – Miscellaneous topics 1081
transpose 1038 Kronecker delta 1081
15A60 – Norms of matrices, numerical range, dual space 1081
applications of functional analysis to ma- example of trace of a matrix 1083
trix theory 1041 generalized Kronecker delta symbol 1083
Frobenius matrix norm 1041 linear functional 1084
matrix p-norm 1042 modules are a generalization of vector spaces 1084
self consistent matrix norm 1043 proof of properties of trace of a matrix 1085
15A63 – Quadratic and bilinear forms, in- quasipositive matrix 1086
ner products 1044 trace of a matrix 1086
Cauchy-Schwarz inequality 1044
adjoint endomorphism 1045 Volume 2
anti-symmetric 1046
bilinear map 1046 16-00 – General reference works (hand-
dot product 1049 books, dictionaries, bibliographies, etc.) 1088
every orthonormal set is linearly independent 1050 direct product of modules 1088
inner product 1051 direct sum 1089
exact sequence 1089 simple module 1107
quotient ring 1090 superfluous submodule 1107
16D10 – General module theory 1091 uniform module 1108
annihilator 1091 16E05 – Syzygies, resolutions, complexes
annihilator is an ideal 1091 1109
artinian 1092 n-chain 1109
composition series 1092 chain complex 1109
conjugate module 1093 flat resolution 1110
modular law 1093 free resolution 1110
module 1093 injective resolution 1110
proof of modular law 1094 projective resolution 1110
zero module 1094 short exact sequence 1111
16D20 – Bimodules 1095 split short exact sequence 1111
bimodule 1095 von Neumann regular 1111
16D25 – Ideals 1096 16K20 – Finite-dimensional 1112
associated prime 1096 quaternion algebra 1112
nilpotent ideal 1096 16K50 – Brauer groups 1113
primitive ideal 1096 Brauer group 1113
product of ideals 1097 16K99 – Miscellaneous 1114
proper ideal 1097 division ring 1114
semiprime ideal 1097 16N20 – Jacobson radical, quasimultipli-
zero ideal 1098 cation 1115
16D40 – Free, projective, and flat modules Jacobson radical 1115
and ideals 1099 a ring modulo its Jacobson radical is semiprimi-
finitely generated projective module 1099 tive 1116
flat module 1099 examples of semiprimitive rings 1116
free module 1100 proof of Characterizations of the Jacobson radi-
free module 1100 cal 1117
projective cover 1100 properties of the Jacobson radical 1118
projective module 1101 quasi-regularity 1119
16D50 – Injective modules, self-injective semiprimitive ring 1120
rings 1102 16N40 – Nil and nilpotent radicals, sets,
injective hull 1102 ideals, rings 1121
injective module 1102 Koethe conjecture 1121
16D60 – Simple and semisimple modules, nil and nilpotent ideals 1121
primitive rings and ideals 1104 16N60 – Prime and semiprime rings 1123
central simple algebra 1104 prime ring 1123
completely reducible 1104 16N80 – General radicals and rings 1124
simple ring 1105 prime radical 1124
16D80 – Other classes of modules and ide- radical theory 1124
als 1106 16P40 – Noetherian rings and modules 1126
essential submodule 1106 Noetherian ring 1126
faithful module 1106 noetherian 1126
minimal prime ideal 1107 16P60 – Chain conditions on annihilators
module of finite rank 1107 and summands: Goldie-type conditions ,
Krull dimension 1128 bialgebra 1148
Goldie ring 1128 coalgebra 1148
uniform dimension 1128 coinvariant 1149
16S10 – Rings determined by universal prop-comodule 1149
erties (free algebras, coproducts, adjunc- comodule algebra 1149
tion of inverses, etc.) 1130 comodule coalgebra 1150
Ore domain 1130 module algebra 1150
16S34 – Group rings , Laurent polynomial module coalgebra 1150
rings 1131 16W50 – Graded rings and modules 1151
support 1131 graded algebra 1151
16S36 – Ordinary and skew polynomial rings graded module 1151
and semigroup rings 1132 supercommutative 1151
Gaussian polynomials 1132 16W55 – “Super” (or “skew”) structure
q skew derivation 1133 1153
q skew polynomial ring 1133 super tensor product 1153
sigma derivation 1133 superalgebra 1153
sigma, delta constant 1133 supernumber 1154
skew derivation 1133 16W99 – Miscellaneous 1155
skew polynomial ring 1134 Hamiltonian quaternions 1155
16S99 – Miscellaneous 1135 16Y30 – Near-rings 1158
algebra 1135 near-ring 1158
algebra (module) 1135 17A01 – General theory 1159
16U10 – Integral domains 1137 commutator bracket 1159
Prüfer domain 1137 17B05 – Structure theory 1161
valuation domain 1137 Killing form 1161
16U20 – Ore rings, multiplicative sets, Ore Levi’s theorem 1161
localization 1139 nilradical 1161
Goldie’s Theorem 1139 radical 1162
Ore condition 1139 17B10 – Representations, algebraic theory
Ore’s theorem 1140 (weights) 1163
classical ring of quotients 1140 Ado’s theorem 1163
saturated 1141 Lie algebra representation 1163
16U70 – Center, normalizer (invariant el- adjoint representation 1164
ements) 1142 examples of non-matrix Lie groups 1165
center (rings) 1142 isotropy representation 1165
16U99 – Miscellaneous 1143 17B15 – Representations, analytic theory
anti-idempotent 1143 1166
16W20 – Automorphisms and endomor- invariant form (Lie algebras) 1166
phisms 1144 17B20 – Simple, semisimple, reductive (su-
ring of endomorphisms 1144 per)algebras (roots) 1167
16W30 – Coalgebras, bialgebras, Hopf al- Borel subalgebra 1167
gebras ; rings, modules, etc. on which Borel subgroup 1167
these act 1146 Cartan matrix 1168
Hopf algebra 1146 Cartan subalgebra 1168
almost cocommutative bialgebra 1147 Cartan’s criterion 1168
Casimir operator 1168 monic 1194
Dynkin diagram 1169 natural equivalence 1195
Verma module 1169 representable functor 1195
Weyl chamber 1170 supplemental axioms for an Abelian category 1195
Weyl group 1170 18A05 – Definitions, generalizations 1197
Weyl’s theorem 1170 autofunctor 1197
classification of finite-dimensional representations automorphism 1197
of semi-simple Lie algebras 1171 category 1198
cohomology of semi-simple Lie algebras 1171 category example (arrow category) 1199
nilpotent cone 1171 commutative diagram 1199
parabolic subgroup 1172 double dual embedding 1200
pictures of Dynkin diagrams 1172 dual category 1201
positive root 1175 duality principle 1201
rank 1175 endofunctor 1202
root lattice 1175 examples of initial objects, terminal objects and
root system 1176 zero objects 1202
simple and semi-simple Lie algebras 1177 forgetful functor 1204
simple root 1178 isomorphism 1205
weight (Lie algebras) 1178 natural transformation 1205
weight lattice 1178 types of homomorphisms 1205
17B30 – Solvable, nilpotent (super)algebras zero object 1206
1179 18A22 – Special properties of functors (faith-
Engel’s theorem 1179 ful, full, etc.) 1208
Lie’s theorem 1182 exact functor 1208
solvable Lie algebra 1183 18A25 – Functor categories, comma cate-
17B35 – Universal enveloping (super)algebras gories 1210
1184 Yoneda embedding 1210
Poincaré-Birkhoff-Witt theorem 1184 18A30 – Limits and colimits (products, sums,
universal enveloping algebra 1185 directed limits, pushouts, fiber products,
17B56 – Cohomology of Lie (super)algebras equalizers, kernels, ends and coends, etc.)
1187 1211
Lie algebra cohomology 1187 categorical direct product 1211
17B67 – Kac-Moody (super)algebras (struc- categorical direct sum 1211
ture and representation theory) 1188 kernel 1212
Kac-Moody algebra 1188 18A40 – Adjoint functors (universal con-
generalized Cartan matrix 1188 structions, reflective subcategories, Kan ex-
17B99 – Miscellaneous 1190 tensions, etc.) 1213
Jacobi identity interpretations 1190 adjoint functor 1213
Lie algebra 1190 equivalence of categories 1214
real form 1192 18B40 – Groupoids, semigroupoids, semi-
18-00 – General reference works (hand- groups, groups (viewed as categories) 1215
books, dictionaries, bibliographies, etc.) 1193 groupoid (category theoretic) 1215
Grothendieck spectral sequence 1193 18E10 – Exact categories, abelian cate-
category of sets 1194 gories 1216
functor 1194 abelian category 1216
exact sequence 1217 associative 1245
derived category 1218 canonical projection 1246
enough injectives 1218 centralizer 1246
18F20 – Presheaves and sheaves 1219 commutative 1247
locally ringed space 1219 examples of groups 1247
presheaf 1220 group 1250
sheaf 1220 quotient group 1250
sheafification 1225 20-02 – Research exposition (monographs,
stalk 1226 survey articles) 1252
18F30 – Grothendieck groups 1228 length function 1252
Grothendieck group 1228 20-XX – Group theory and generalizations
18G10 – Resolutions; derived functors 1229 1253
derived functor 1229 free product with amalgamated subgroup 1253
18G15 – Ext and Tor, generalizations, Künneth nonabelian group 1254
formula 1231 20A05 – Axiomatics and elementary prop-
Ext 1231 erties 1255
18G30 – Simplicial sets, simplicial objects Feit-Thompson theorem 1255
(in a category) 1232 Proof: The orbit of any element of a group is a
nerve 1232 subgroup 1255
simplicial category 1232 center 1256
simplicial object 1233 characteristic subgroup 1256
18G35 – Chain complexes 1235 class function 1257
5-lemma 1235 conjugacy class 1258
9-lemma 1236 conjugacy class formula 1258
Snake lemma 1236 conjugate stabilizer subgroups 1258
chain homotopy 1237 coset 1259
chain map 1237 cyclic group 1259
homology (chain complex) 1237 derived subgroup 1260
18G40 – Spectral sequences, hypercoho- equivariant 1260
mology 1238 examples of finite simple groups 1261
spectral sequence 1238 finitely generated group 1262
19-00 – General reference works (hand- first isomorphism theorem 1262
fourth isomorphism theorem 1262
books, dictionaries, bibliographies, etc.) 1239
Algebraic K-theory 1239 generator 1263
K-theory 1240 group actions and homomorphisms 1263
examples of algebraic K-theory groups 1241 group homomorphism 1265
19K33 – EXT and K-homology 1242 homogeneous space 1265
Fredholm module 1242 identity element 1268
K-homology 1243 inner automorphism 1268
19K99 – Miscellaneous 1244 kernel 1269
examples of K-theory groups 1244 maximal 1269
20-00 – General reference works (hand- normal subgroup 1269
normality of subgroups is not transitive 1269
books, dictionaries, bibliographies, etc.) 1245
alternating group is a normal subgroup of the normalizer 1270
symmetric group 1245 order (of a group) 1271
presentation of a group 1271 their modules 1294
proof of first isomorphism theorem 1272 group ring 1294
proof of second isomorphism theorem 1273 20C15 – Ordinary representations and char-
proof that all cyclic groups are abelian 1274 acters 1295
proof that all cyclic groups of the same order are Maschke’s theorem 1295
isomorphic to each other 1274 a representation which is not completely reducible
proof that all subgroups of a cyclic group are 1295
cyclic 1274 orthogonality relations 1296
regular group action 1275 20C30 – Representations of finite symmet-
second isomorphism theorem 1275 ric groups 1299
simple group 1276 example of immanent 1299
solvable group 1276 immanent 1299
subgroup 1276 permanent 1299
third isomorphism theorem 1277 20C99 – Miscellaneous 1301
20A99 – Miscellaneous 1279 Frobenius reciprocity 1301
Cayley table 1279 Schur’s lemma 1301
proper subgroup 1280 character 1302
quaternion group 1280 group representation 1303
20B05 – General theory for finite groups induced representation 1303
1282 regular representation 1304
cycle notation 1282 restriction representation 1304
permutation group 1283 20D05 – Classification of simple and non-
20B15 – Primitive groups 1284 solvable groups 1305
primitive transitive permutation group 1284 Burnside p-q theorem 1305
20B20 – Multiply transitive finite groups classification of semisimple groups 1305
1286 semisimple group 1305
Jordan’s theorem (multiply transitive groups) 1286 20D08 – Simple groups: sporadic groups
multiply transitive 1286 1307
sharply multiply transitive 1287 Janko groups 1307
20B25 – Finite automorphism groups of al- 20D10 – Solvable groups, theory of for-
gebraic, geometric, or combinatorial struc- mations, Schunck classes, Fitting classes,
tures 1288 π-length, ranks 1308
diamond theory 1288 Čuhinin’s Theorem 1308
20B30 – Symmetric groups 1289 separable 1308
symmetric group 1289 supersolvable group 1309
symmetric group 1289 20D15 – Nilpotent groups, p-groups 1310
20B35 – Subgroups of symmetric groups Burnside basis theorem 1310
1290 20D20 – Sylow subgroups, Sylow proper-
Cayley’s theorem 1290 ties, π-groups, π-structure 1311
20B99 – Miscellaneous 1291 π-groups and π′-groups 1311
(p, q) shuffle 1291 p-subgroup 1311
Frobenius group 1291 Burnside normal complement theorem 1312
permutation 1292 Frattini argument 1312
proof of Cayley’s theorem 1292 Sylow p-subgroup 1312
20C05 – Group rings of finite groups and Sylow theorems 1312
Sylow’s first theorem 1313 rem 1329
Sylow’s third theorem 1313 semidirect product of groups 1330
application of Sylow’s theorems to groups of or- wreath product 1333
der pq 1313 Jordan-Hölder decomposition theorem 1334
p-primary component 1314 simplicity of the alternating groups 1334
proof of Frattini argument 1314 abelian groups of order 120 1337
proof of Sylow theorems 1314 fundamental theorem of finitely generated abelian
subgroups containing the normalizers of Sylow groups 1337
subgroups normalize themselves 1316 conjugacy class 1338
20D25 – Special subgroups (Frattini, Fit- Frattini subgroup 1338
ting, etc.) 1317 non-generator 1338
Fitting’s theorem 1317 20Exx – Structure and classification of in-
characteristically simple group 1317 finite or finite groups 1339
the Frattini subgroup is nilpotent 1317 faithful group action 1339
20D30 – Series and lattices of subgroups 20F18 – Nilpotent groups 1340
1319 classification of finite nilpotent groups 1340
maximal condition 1319 nilpotent group 1340
minimal condition 1319 20F22 – Other classes of groups defined by
subnormal series 1320 subgroup chains 1342
20D35 – Subnormal subgroups 1321 inverse limit 1342
subnormal subgroup 1321 20F28 – Automorphism groups of groups
20D99 – Miscellaneous 1322 1344
Cauchy’s theorem 1322 outer automorphism group 1344
Lagrange’s theorem 1322 20F36 – Braid groups; Artin groups 1345
exponent 1322 braid group 1345
fully invariant subgroup 1323 20F55 – Reflection and Coxeter groups 1347
proof of Cauchy’s theorem 1323 cycle 1347
proof of Lagrange’s theorem 1324 dihedral group 1348
proof of the converse of Lagrange's theorem for finite cyclic groups 1324
20F65 – Geometric group theory 1349
groups that act freely on trees are free 1349
proof that expG divides |G| 1324 20F99 – Miscellaneous 1350
proof that |g| divides expG 1325 perfect group 1350
proof that every group of prime order is cyclic 1325
20G15 – Linear algebraic groups over arbitrary fields 1351
20E05 – Free nonabelian groups 1326 Nagao’s theorem 1351
Nielsen-Schreier theorem 1326 computation of the order of GL(n, Fq ) 1351
Schreier index formula 1326 computation of the order of GL(n, Fq) 1351
free group 1326
proof of Nielsen-Schreier theorem and Schreier index formula 1327
order of the general linear group over a finite field 1352
special linear group 1352
Jordan-Hölder decomposition 1328
profinite group 1328
20G20 – Linear algebraic groups over the reals, the complexes, the quaternions 1353
extension 1329 orthogonal group 1353
holomorph 1329
proof of the Jordan-Hölder decomposition theorem 1329
20G25 – Linear algebraic groups over local fields and their integers 1354
Ihara’s theorem 1354 stabilizer 1378
20G40 – Linear algebraic groups over finite fields 1355
20M99 – Miscellaneous 1379
a semilattice is a commutative band 1379
SL2(F3) 1355 adjoining an identity to a semigroup 1379
20J06 – Cohomology of groups 1356 band 1380
group cohomology 1356 bicyclic semigroup 1380
stronger Hilbert theorem 90 1357 congruence 1381
20J15 – Category of groups 1359 cyclic semigroup 1381
variety of groups 1359 idempotent 1382
20K01 – Finite abelian groups 1360 null semigroup 1383
Schinzel’s theorem 1360 semigroup 1383
20K10 – Torsion groups, primary groups and generalized primary groups 1361
semilattice 1383
subsemigroup, submonoid, and subgroup 1384
torsion 1361 zero elements 1384
20K25 – Direct sums, direct products, etc. 1362
20N02 – Sets with a single binary operation (groupoids) 1386
direct product of groups 1362 groupoid 1386
20K99 – Miscellaneous 1363 idempotency 1386
Klein 4-group 1363 left identity and right identity 1387
divisible group 1364 20N05 – Loops, quasigroups 1388
example of divisible group 1364 Moufang loop 1388
locally cyclic group 1364 loop and quasigroup 1389
20Kxx – Abelian groups 1366
abelian group 1366
22-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1390
20M10 – General structure theory 1367
existence of maximal semilattice decomposition 1367
fixed-point subspace 1390
22-XX – Topological groups, Lie groups 1391
semilattice decomposition of a semigroup 1368 Cantor space 1391
simple semigroup 1368
20M12 – Ideal theory 1370
22A05 – Structure of general topological groups 1392
Rees factor 1370 topological group 1392
ideal 1370 22C05 – Compact groups 1393
20M14 – Commutative semigroups 1372 n-torus 1393
Archimedean semigroup 1372 reductive 1393
commutative semigroup 1372
20M20 – Semigroups of transformations, etc. 1373
22D05 – General properties and structure of locally compact groups 1394
Γ-simple 1394
semigroup of transformations 1373
20M30 – Representation of semigroups; actions of semigroups on sets 1375
22D15 – Group algebras of locally compact groups 1395
group C∗-algebra 1395
counting theorem 1375
example of group action 1375
group action 1376
orbit 1377
22E10 – General properties and structure of complex Lie groups 1396
existence and uniqueness of compact real form 1396
proof of counting theorem 1377 maximal torus 1397
Lie group 1397 even/odd function 1426
complexification 1399 example of chain rule 1427
Hilbert-Weyl theorem 1400
the connection between Lie groups and Lie algebras 1401
example of increasing/decreasing/monotone function 1428
extended mean-value theorem 1428
26-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1402
increasing/decreasing/monotone function 1428
intermediate value theorem 1429
derivative notation 1402 limit 1429
fundamental theorems of calculus 1403 mean value theorem 1430
logarithm 1404 mean-value theorem 1430
proof of the first fundamental theorem of calculus 1405
proof of the second fundamental theorem of calculus 1405
monotonicity criterion 1431
nabla 1431
one-sided limit 1432
product rule 1432
root-mean-square 1406 proof of Darboux’s theorem 1433
square 1406
26-XX – Real functions 1408
proof of Fermat's Theorem (stationary points) 1434
abelian function 1408 proof of Rolle’s theorem 1434
full-width at half maximum 1408 proof of Taylor’s Theorem 1435
26A03 – Foundations: limits and generalizations, elementary topology of the line 1410
proof of binomial formula 1436
proof of chain rule 1436
proof of extended mean-value theorem 1437
Cauchy sequence 1410 proof of intermediate value theorem 1437
Dedekind cuts 1410 proof of mean value theorem 1438
binomial proof of positive integer power rule 1413 proof of monotonicity criterion 1439
exponential 1414 proof of quotient rule 1439
interleave sequence 1415 quotient rule 1440
limit inferior 1415 signum function 1440
limit superior 1416 26A09 – Elementary functions 1443
power rule 1417 definitions in trigonometry 1443
properties of the exponential 1417 hyperbolic functions 1444
squeeze rule 1418
26A06 – One-variable calculus 1420
Darboux's theorem (analysis) 1420
26A12 – Rate of growth of functions, orders of infinity, slowly varying functions 1446
Fermat’s Theorem (stationary points) 1420 Landau notation 1446
Heaviside step function 1421
Leibniz' rule 1421
Rolle's theorem 1422
26A15 – Continuity and related questions (modulus of continuity, semicontinuity, discontinuities, etc.) 1448
binomial formula 1422 Dirichlet’s function 1448
chain rule 1422 semi-continuous 1448
complex Rolle’s theorem 1423 semicontinuous 1449
complex mean-value theorem 1423 uniformly continuous 1450
definite integral 1424 26A16 – Lipschitz (Hölder) classes 1451
derivative of even/odd function (proof) 1425 Lipschitz condition 1451
direct sum of even/odd functions (example) 1425 Lipschitz condition and differentiability 1452
Lipschitz condition and differentiability result 1453 implicit function theorem 1481
26A18 – Iteration 1454 proof of implicit function theorem 1482
iteration 1454 26B12 – Calculus of vector functions 1484
periodic point 1454 Clairaut’s theorem 1484
26A24 – Differentiation (functions of one variable): general theory, generalized derivatives, mean-value theorems 1455
Fubini's Theorem 1484
Generalised N-dimensional Riemann Sum 1485
Generalized N-dimensional Riemann Integral 1485
Leibniz notation 1455 Helmholtz equation 1486
derivative 1456 Hessian matrix 1487
l'Hôpital's rule 1460 Jordan Content of an N-cell 1487
proof of De l'Hôpital's rule 1461 Laplace equation 1487
related rates 1462 chain rule (several variables) 1488
26A27 – Nondifferentiability (nondifferentiable functions, points of nondifferentiability), discontinuous derivatives 1464
divergence 1489
extremum 1490
irrotational field 1490
Weierstrass function 1464 partial derivative 1491
26A36 – Antidifferentiation 1465 plateau 1492
antiderivative 1465 proof of Green’s theorem 1492
integration by parts 1465
integrations by parts for the Lebesgue integral 1466
relations between Hessian matrix and local extrema 1493
solenoidal field 1494
26A42 – Integrals of Riemann, Stieltjes and Lebesgue type 1468
Riemann sum 1468
26B15 – Integration: length, area, volume 1495
arc length 1495
Riemann-Stieltjes integral 1469
continuous functions are Riemann integrable 1469
generalized Riemann integral 1469
26B20 – Integral formulas (Stokes, Gauss, Green, etc.) 1497
Green's theorem 1497
proof of Continuous functions are Riemann integrable 1470
26B25 – Convexity, generalizations 1499
convex function 1499
26A51 – Convexity, generalizations 1471 extremal value of convex/concave functions 1500
concave function 1471
26Axx – Functions of one variable 1472
26B30 – Absolutely continuous functions, functions of bounded variation 1502
function centroid 1472 absolutely continuous function 1502
26B05 – Continuity and differentiation questions 1473
total variation 1503
26B99 – Miscellaneous 1505
C0∞(U) is not empty 1473
derivation of zeroth weighted power mean 1505
Rademacher’s Theorem 1474 weighted power mean 1506
smooth functions with compact support 1475 26C15 – Rational functions 1507
26B10 – Implicit function theorems, Jacobians, transformations with several variables 1477
rational function 1507
26C99 – Miscellaneous 1508
Laguerre Polynomial 1508
Jacobian matrix 1477
directional derivative 1477
26D05 – Inequalities for trigonometric functions and polynomials 1509
gradient 1478 Weierstrass product inequality 1509
implicit differentiation 1481 proof of Jordan’s Inequality 1509
26D10 – Inequalities involving derivatives and differential and integral operators 1511
Hahn-Kolmogorov theorem 1536
measure 1536
Gronwall’s lemma 1511 outer measure 1536
proof of Gronwall’s lemma 1511 properties for measure 1538
26D15 – Inequalities for sums, series and integrals 1513
28A12 – Contents, measures, outer measures, capacities 1540
Carleman’s inequality 1513 Hahn decomposition theorem 1540
Chebyshev’s inequality 1513 Jordan decomposition 1540
MacLaurin’s Inequality 1514 Lebesgue decomposition theorem 1541
Minkowski inequality 1514 Lebesgue outer measure 1541
Muirhead’s theorem 1515 absolutely continuous 1542
Schur’s inequality 1515 counting measure 1543
Young’s inequality 1515 measurable set 1543
arithmetic-geometric-harmonic means inequality 1516
outer regular 1543
signed measure 1543
general means inequality 1516 singular measure 1544
power mean 1517
proof of Chebyshev's inequality 1517
28A15 – Abstract differentiation theory, differentiation of set functions 1545
proof of Minkowski inequality 1518 Hardy-Littlewood maximal theorem 1545
proof of arithmetic-geometric-harmonic means inequality 1519
Lebesgue differentiation theorem 1545
Radon-Nikodym theorem 1546
proof of general means inequality 1521 integral depending on a parameter 1547
proof of rearrangement inequality 1522
rearrangement inequality 1523
26D99 – Miscellaneous 1524
28A20 – Measurable and nonmeasurable functions, sequences of measurable functions, modes of convergence 1549
Bernoulli’s inequality 1524 Egorov’s theorem 1549
proof of Bernoulli’s inequality 1524 Fatou’s lemma 1549
26E35 – Nonstandard analysis 1526 Fatou-Lebesgue theorem 1550
hyperreal 1526 dominated convergence theorem 1550
e is not a quadratic irrational 1527 measurable function 1550
zero of a function 1528 monotone convergence theorem 1551
28-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1530
proof of Egorov's theorem 1551
proof of Fatou's lemma 1552
extended real numbers 1530 proof of Fatou-Lebesgue theorem 1552
28-XX – Measure and integration 1532 proof of dominated convergence theorem 1553
Riemann integral 1532 proof of monotone convergence theorem 1553
martingale 1532
28A05 – Classes of sets (Borel fields, σ-rings, etc.), measurable sets, Suslin sets, analytic sets 1534
28A25 – Integration with respect to measures and other set functions 1555
L∞(X, dµ) 1555
Hardy-Littlewood maximal operator 1555
Borel σ-algebra 1534 Lebesgue integral 1556
28A10 – Real- or complex-valued set functions 1535
σ-finite 1535
28A60 – Measures on Boolean rings, measure algebras 1558
σ-algebra 1558
Argand diagram 1535 σ-algebra 1558
algebra 1559 Runge’s theorem 1582
measurable set (for outer measure) 1559 Weierstrass M-test 1583
28A75 – Length, area, volume, other geometric measure theory 1561
annulus 1583
conformally equivalent 1583
Lebesgue density theorem 1561 contour integral 1584
28A80 – Fractals 1562 orientation 1585
Cantor set 1562 proof of Weierstrass M-test 1585
Hausdorff dimension 1565 unit disk 1586
Koch curve 1566 upper half plane 1586
Sierpinski gasket 1567 winding number and fundamental group 1586
fractal 1567
28Axx – Classical measure theory 1569
30B10 – Power series (including lacunary series) 1587
Vitali’s Theorem 1569 Euler relation 1587
proof of Vitali’s Theorem 1569 analytic 1588
28B15 – Set functions, measures and integrals with values in ordered spaces 1571
Lp-space 1571
locally integrable function 1572
existence of power series 1588
infinitely-differentiable function that is not analytic 1590
power series 1591
28C05 – Integration theory via linear functionals (Radon measures, Daniell integrals, etc.), representing set functions and measures 1573
Haar integral 1573
proof of radius of convergence 1592
radius of convergence 1593
30B50 – Dirichlet series and other series expansions, exponential series 1594
Dirichlet series 1594
28C10 – Set functions and measures on topological groups, Haar measures, invariant measures 1575
Haar measure 1575
28C20 – Set functions and measures and integrals in infinite-dimensional spaces (Wiener measure, Gaussian measure, etc.) 1577
essential supremum 1577
30C15 – Zeros of polynomials, rational functions, and other analytic functions (e.g. zeros of functions with bounded Dirichlet integral) 1596
Mason-Stothers theorem 1596
zeroes of analytic functions are isolated 1596
30C20 – Conformal mappings of special domains 1598
28D05 – Measure-preserving transformations 1578
measure-preserving 1578
automorphisms of unit disk 1598
unit disk upper half plane conformal equivalence theorem 1598
30-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1579
30C35 – General theory of conformal mappings 1599
domain 1579 proof of conformal mapping theorem 1599
region 1579
regular region 1580
topology of the complex plane 1580
30C80 – Maximum principle; Schwarz's lemma, Lindelöf principle, analogues and generalizations; subordination 1601
30-XX – Functions of a complex variable 1581
Schwarz lemma 1601
maximum principle 1601
z0 is a pole of f 1581 proof of Schwarz lemma 1602
30A99 – Miscellaneous 1582
Riemann mapping theorem 1582
30D20 – Entire functions, general theory 1603
Liouville's theorem 1603
Morera's theorem 1603
proof of Möbius circle transformation theorem 1622
entire 1604
holomorphic 1604
proof of Simultaneous converging or diverging of product and sum theorem 1623
proof of Liouville's theorem 1604
30D30 – Meromorphic functions, general theory 1606
proof of absolute convergence implies convergence for an infinite product 1624
proof of closed curve theorem 1624
Casorati-Weierstrass theorem 1606
Mittag-Leffler's theorem 1606
proof of conformal Möbius circle map theorem 1624
Riemann's removable singularity theorem 1607
essential singularity 1607
simultaneous converging or diverging of product and sum theorem 1625
meromorphic 1607 Cauchy-Riemann equations 1625
pole 1607
proof of Casorati-Weierstrass theorem 1608
proof of Riemann's removable singularity theorem 1608
Cauchy-Riemann equations (polar coordinates) 1626
proof of the Cauchy-Riemann equations 1626
removable singularity 1627
residue 1609 30F40 – Kleinian groups 1629
simple pole 1610 Klein 4-group 1629
30E20 – Integration, integrals of Cauchy type, integral representations of analytic functions 1611
Cauchy integral formula 1611
31A05 – Harmonic, subharmonic, superharmonic functions 1630
a harmonic function on a graph which is bounded below and nonconstant 1630
Cauchy integral theorem 1612 example of harmonic functions on graphs 1630
Cauchy residue theorem 1613 examples of harmonic functions on Rn 1631
Gauss’ mean value theorem 1614 harmonic function 1632
Möbius circle transformation theorem 1614
Möbius transformation cross-ratio preservation theorem 1614
31B05 – Harmonic, subharmonic, superharmonic functions 1633
Laplacian 1633
Rouché's theorem 1614
absolute convergence implies convergence for an infinite product 1615
32A05 – Power series, series of functions 1634
exponential function 1634
absolute convergence of infinite product 1615 32C15 – Complex spaces 1637
closed curve theorem 1615 Riemann sphere 1637
conformal Möbius circle map theorem 1615 32F99 – Miscellaneous 1638
conformal mapping 1616 star-shaped region 1638
conformal mapping theorem 1616
convergence/divergence for an infinite product 1616
32H02 – Holomorphic mappings, (holomorphic) embeddings and related questions 1639
Bloch's theorem 1639
example of conformal mapping 1616 Hartog’s theorem 1639
examples of infinite products 1617
link between infinite products and sums 1617
32H25 – Picard-type theorems and generalizations 1640
proof of Cauchy integral formula 1618 Picard’s theorem 1640
proof of Cauchy residue theorem 1619 little Picard theorem 1640
proof of Gauss’ mean value theorem 1620 33-XX – Special functions 1641
proof of Goursat’s theorem 1620 beta function 1641
33B10 – Exponential and trigonometric functions 1642
natural logarithm 1642
34A05 – Explicit solutions and reductions 1669
separation of variables 1669
33B15 – Gamma, beta and polygamma functions 1643
Bohr-Mollerup theorem 1643
gamma function 1643
34A12 – Initial value problems, existence, uniqueness, continuous dependence and continuation of solutions 1672
initial value problem 1672
proof of Bohr-Mollerup theorem 1645
33B30 – Higher logarithm functions 1647
Lambert W function 1647
34A30 – Linear equations and systems, general 1674
Chebyshev equation 1674
33B99 – Miscellaneous 1648 Chebyshev equation 1674
natural log base 1648 34A99 – Miscellaneous 1676
33D45 – Basic orthogonal polynomials and functions (Askey-Wilson polynomials, etc.) 1649
orthogonal polynomials 1649
33E05 – Elliptic functions and integrals 1651
autonomous system 1676
34B24 – Sturm-Liouville theory 1677
eigenfunction 1677
34C05 – Location of integral curves, singular points, limit cycles 1678
Hopf bifurcation theorem 1678
Weierstrass sigma function 1651 Poincaré-Bendixson theorem 1679
elliptic function 1652 omega limit set 1679
elliptic integrals and Jacobi elliptic functions 1652
examples of elliptic functions 1654
modular discriminant 1654
34-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1656
Liapunov function 1656
34C07 – Theory of limit cycles of polynomial and analytic vector fields (existence, uniqueness, bounds, Hilbert's 16th problem and ramif 1680
Hilbert's 16th problem for quadratic vector fields 1680
Lorenz equation 1657 34C23 – Bifurcation 1682
Wronskian determinant 1659 equivariant branching lemma 1682
dependence on initial conditions of solutions of ordinary differential equations 1660
34C25 – Periodic solutions 1683
Bendixson's negative criterion 1683
differential equation 1661 Dulac’s criteria 1683
existence and uniqueness of solution of ordinary differential equations 1662
maximal interval of existence of ordinary differential equations 1663
proof of Bendixson's negative criterion 1684
34C99 – Miscellaneous 1685
Hartman-Grobman theorem 1685
equilibrium point 1685
method of undetermined coefficients 1663 stable manifold theorem 1686
natural symmetry of the Lorenz equation 1664
symmetry of a solution of an ordinary differential equation 1665
symmetry of an ordinary differential equation 1665
34D20 – Lyapunov stability 1687
Lyapunov stable 1687
neutrally stable fixed point 1687
stable fixed point 1687
34L05 – General spectral theory 1688
34-01 – Instructional exposition (textbooks, tutorial papers, etc.) 1667
second order linear differential equation with constant coefficients 1667
Gelfand spectral radius theorem 1688
34L15 – Estimation of eigenvalues, upper and lower bounds 1689
Rayleigh quotient 1689
34L40 – Particular operators (Dirac, one-dimensional Schrödinger, etc.) 1690
Dirac delta function 1690
construction of Dirac delta function 1691
37C20 – Generic properties, structural stability 1712
Kupka-Smale theorem 1712
Pugh's general density theorem 1712
35-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1692
differential operator 1692
structural stability 1713
37C25 – Fixed points, periodic points, fixed-point index theory 1714
hyperbolic fixed point 1714
35J05 – Laplace equation, reduced wave equation (Helmholtz), Poisson equation 1694
Poisson's equation 1694
37C29 – Homoclinic and heteroclinic orbits 1715
35L05 – Wave equation 1695 heteroclinic 1715
wave equation 1695 homoclinic 1715
35Q53 – KdV-like equations (Korteweg-de Vries, Burgers, sine-Gordon, sinh-Gordon, etc.) 1697
37C75 – Stability theory 1716
attracting fixed point 1716
stable manifold 1716
Korteweg-de Vries equation 1697
35Q99 – Miscellaneous 1698
37C80 – Symmetries, equivariant dynamical systems 1718
heat equation 1698 Γ-equivariant 1718
37-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1699
37D05 – Hyperbolic orbits and sets 1719
hyperbolic isomorphism 1719
37A30 – Ergodic theorems, spectral theory, Markov operators 1700
37D20 – Uniformly hyperbolic systems (expanding, Anosov, Axiom A, etc.) 1720
ergodic 1700 Anosov diffeomorphism 1720
fundamental theorem of demography 1700 Axiom A 1721
proof of fundamental theorem of demography 1701 hyperbolic set 1721
37B05 – Transformations and group actions with special properties (minimality, distality, proximality, etc.) 1703
discontinuous action 1703
37D99 – Miscellaneous 1722
Kupka-Smale 1722
37E05 – Maps of the interval (piecewise continuous, continuous, smooth) 1723
37B20 – Notions of recurrence 1704 Sharkovskii’s theorem 1723
nonwandering set 1704
37B99 – Miscellaneous 1705
37G15 – Bifurcations of limit cycles and periodic orbits 1724
ω-limit set 1705 Feigenbaum constant 1724
asymptotically stable 1706 Feigenbaum fractal 1725
expansive 1706 equivariant Hopf theorem 1726
the only compact metric spaces that admit a positively expansive homeomorphism are discrete spaces 1707
37G40 – Symmetries, equivariant bifurcation theory 1728
Poénaru (1976) theorem 1728
topological conjugation 1708 bifurcation problem with symmetry group 1728
topologically transitive 1709 trace formula 1729
uniform expansivity 1709 37G99 – Miscellaneous 1730
37C10 – Vector fields, flows, ordinary differential equations 1710
37H20 – Bifurcation theory 1732
bifurcation 1732
flow 1710 bifurcation 1732
globally attracting fixed point 1711 39B05 – General 1733
functional equation 1733 proof of Abel’s test for convergence 1754
39B62 – Functional inequalities, including subadditivity, convexity, etc. 1734
proof of Bolzano-Weierstrass Theorem 1754
proof of Cauchy's root test 1756
Jensen's inequality 1734
proof of Jensen's inequality 1735
proof of Leibniz's theorem (using Dirichlet's convergence test) 1756
proof of arithmetic-geometric-harmonic means inequality 1735
proof of absolute convergence theorem 1756
proof of alternating series test 1757
subadditivity 1736 proof of comparison test 1757
superadditivity 1736 proof of integral test 1758
40-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1738
proof of integral test 1758
proof of ratio test 1759
Cauchy product 1738
Cesàro mean 1739
40A10 – Convergence and divergence of integrals 1760
alternating series 1739 improper integral 1760
alternating series test 1739
monotonic 1740
40A25 – Approximation to limiting values (summation of series, etc.) 1761
monotonically decreasing 1740 Euler’s constant 1761
monotonically increasing 1741
monotonically nondecreasing 1741
40A30 – Convergence and divergence of series and sequences of functions 1763
monotonically nonincreasing 1741 Abel’s limit theorem 1763
sequence 1742 Löwner partial ordering 1763
series 1742 Löwner’s theorem 1764
40A05 – Convergence and divergence of series and sequences 1743
pointwise convergence 1764
uniform convergence 1765
Abel’s lemma 1743 pointwise convergence 1764
Abel’s test for convergence 1744 uniform convergence 1765
Baroni's Theorem 1744
Bolzano-Weierstrass theorem 1744
Cauchy criterion for convergence 1744
40G05 – Cesàro, Euler, Nörlund and Hausdorff methods 1766
Cesàro summability 1766
Cauchy's root test 1745
Dirichlet's convergence test 1745
40G10 – Abel, Borel and power series methods 1768
Proof of Baroni’s Theorem 1746 Abel summability 1768
Proof of Stolz-Cesàro theorem 1747 proof of Tauber's convergence theorem 1770
Stolz-Cesàro theorem 1748 41A05 – Interpolation 1772
absolute convergence theorem 1748 41A05 – Interpolation 1772
comparison test 1748 Lagrange Interpolation formula 1772
convergent sequence 1749 Simpson’s 3/8 rule 1772
convergent series 1749 trapezoidal rule 1773
determining series convergence 1749
example of integral test 1750
41A25 – Rate of convergence, degree of approximation 1775
geometric series 1750 superconvergence 1775
harmonic number 1751
harmonic series 1752
41A58 – Series expansions (e.g. Taylor, Lidstone series, but not Fourier series) 1776
integral test 1753 Taylor series 1776
proof of Abel’s lemma (by induction) 1754 Taylor’s Theorem 1778
41A60 – Asymptotic approximations, asymptotic expansions (steepest descent, etc.) 1779
Stirling's approximation 1779
symmetric set 1803
46A30 – Open mapping and closed graph theorems; completeness (including B-, Br-completeness) 1805
42-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1781
closed graph theorem 1805
open mapping theorem 1805
countable basis 1781 46A99 – Miscellaneous 1806
discrete cosine transform 1782 Heine-Cantor theorem 1806
42-01 – Instructional exposition (textbooks, tutorial papers, etc.) 1784
Laplace transform 1784
42A05 – Trigonometric polynomials, inequalities, extremal problems 1785
proof of Heine-Cantor theorem 1806
topological vector space 1807
46B20 – Geometry and structure of normed linear spaces 1808
lim_{p→∞} ||x||_p = ||x||_∞ 1808
Chebyshev polynomial 1785 Hahn-Banach theorem 1809
42A16 – Fourier coefficients, Fourier series of functions with special properties, special Fourier series 1787
proof of Hahn-Banach theorem 1810
seminorm 1811
vector norm 1813
Riemann-Lebesgue lemma 1787
example of Fourier series 1788
46B50 – Compactness in Banach (or normed) spaces 1815
42A20 – Convergence and absolute convergence of Fourier and trigonometric series 1789
Schauder fixed point theorem 1815
proof of Schauder fixed point theorem 1815
46B99 – Miscellaneous 1817
Dirichlet conditions 1789 ℓp 1817
42A38 – Fourier and Fourier-Stieltjes transforms and other transforms of Fourier type 1790
Banach space 1818
an inner product defines a norm 1818
continuous linear mapping 1818
Fourier transform 1790 equivalent norms 1819
42A99 – Miscellaneous 1792 normed vector space 1820
Poisson summation formula 1792
42B05 – Fourier series and coefficients 1793
46Bxx – Normed linear spaces and Banach spaces; Banach lattices 1821
Parseval equality 1793 vector p-norm 1821
Wirtinger's inequality 1793
43A07 – Means on groups, semigroups, etc.; amenable groups 1795
amenable group 1795
46C05 – Hilbert and pre-Hilbert spaces: geometry and topology (including spaces with semidefinite inner product) 1822
Bessel inequality 1822
44A35 – Convolution 1796 Hilbert module 1822
convolution 1796 Hilbert space 1823
46-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1799
balanced set 1799
bounded function 1800
proof of Bessel inequality 1823
46C15 – Characterizations of Hilbert spaces 1825
classification of separable Hilbert spaces 1825
bounded set (in a topological vector space) 1801
cone 1802
46E15 – Banach spaces of continuous, differentiable or analytic functions 1826
locally convex topological vector space 1803 Ascoli-Arzelà theorem 1826
sequential characterization of boundedness 1803 Stone-Weierstrass theorem 1826
proof of Ascoli-Arzelà theorem 1827
Hölder inequality 1827
47A53 – (Semi-) Fredholm operators; index theories 1853
Young Inequality 1828 Fredholm index 1853
conjugate index 1828 Fredholm operator 1853
proof of Hölder inequality 1828
proof of Young Inequality 1829
vector field 1829
46F05 – Topological linear spaces of test functions, distributions and ultradistributions 1830
47A56 – Functions whose values are linear operators (operator and matrix valued functions, etc., including analytic and meromorphic ones 1855
Taylor's formula for matrix functions 1855
47A60 – Functional calculus 1856
Tf is a distribution of zeroth order 1830 Beltrami identity 1856
p.v.(1/x) is a distribution of first order 1831 Euler-Lagrange differential equation 1857
Cauchy principal part integral 1832 calculus of variations 1857
delta distribution 1833
distribution 1833
equivalence of conditions 1835
every locally integrable function is a distribution 1836
47B15 – Hermitian and normal operators (spectral measures, functional calculus, etc.) 1862
self-adjoint operator 1862
47G30 – Pseudodifferential operators 1863
localization for distributions 1836 Dini derivative 1863
operations on distributions 1837 47H10 – Fixed-point theorems 1864
smooth distribution 1839 Brouwer fixed point in one dimension 1864
space of rapidly decreasing functions 1840
support of distribution 1841
any topological space with the fixed point property is connected 1865
46H05 – General theory of topological algebras 1843
fixed point property 1866
proof of Brouwer fixed point theorem 1867
Banach algebra 1843
46L05 – General theory of C∗-algebras 1844
47L07 – Convex sets and cones of operators 1868
C∗-algebra 1844 convex hull of S is open if S is open 1868
Gelfand-Naimark representation theorem 1844
state 1844
47L25 – Operator spaces (= matricially normed spaces) 1869
46L85 – Noncommutative topology 1846 normed spaces) 1869
Gelfand-Naimark theorem 1846 operator norm 1869
Serre-Swan theorem 1846 47S99 – Miscellaneous 1870
46T12 – Measure (Gaussian, cylindrical, etc.) and integrals (Feynman, path, Fresnel, etc.) on manifolds 1847
path integral 1847
Drazin inverse 1870
49K10 – Free problems in two or more independent variables 1871
Kantorovitch's theorem 1871
47A05 – General (adjoints, conjugates, products, inverses, domains, ranges, etc.) 1849
Baker-Campbell-Hausdorff formula(e) 1849
49M15 – Methods of Newton-Raphson, Galerkin and Ritz types 1873
Newton's method 1873
adjoint 1850
closed operator 1850
properties of the adjoint operator 1851
51-00 – General reference works (handbooks, dictionaries, bibliographies, etc.) 1877
Apollonius theorem 1877
47A35 – Ergodic theory 1852 Apollonius’ circle 1877
ergodic theorem 1852 Brahmagupta’s formula 1878
Brianchon theorem 1878 isosceles triangle 1899
Brocard theorem 1878 legs 1899
Carnot circles 1879 medial triangle 1899
Erdős-Anning Theorem 1879 median 1900
Euler Line 1879 midpoint 1900
Gergonne point 1879 nine-point circle 1900
Gergonne triangle 1880 orthic triangle 1901
Heron’s formula 1880 orthocenter 1901
Lemoine circle 1880 parallelogram 1902
Lemoine point 1880 parallelogram law 1902
Miquel point 1881 pedal triangle 1902
Mollweide’s equations 1881 pentagon 1903
Morley’s theorem 1881 polygon 1903
Newton’s line 1882 proof of Apollonius theorem 1904
Newton-Gauss line 1882 proof of Apollonius theorem 1904
Pascal’s mystic hexagram 1882 proof of Brahmagupta’s formula 1905
Ptolemy's theorem 1882 proof of Erdős-Anning Theorem 1906
Pythagorean theorem 1883 proof of Heron’s formula 1906
Schooten theorem 1883 proof of Mollweide’s equations 1907
Simson’s line 1884 proof of Ptolemy’s inequality 1908
Stewart’s theorem 1884 proof of Ptolemy’s theorem 1909
Thales’ theorem 1884 proof of Pythagorean theorem 1910
alternate proof of parallelogram law 1885 proof of Pythagorean theorem 1910
alternative proof of the sines law 1885 proof of Simson’s line 1911
angle bisector 1887 proof of Stewart’s theorem 1912
angle sum identity 1888 proof of Thales’ theorem 1913
annulus 1889 proof of butterfly theorem 1913
butterfly theorem 1889 proof of double angle identity 1914
centroid 1889 proof of parallelogram law 1915
chord 1890 proof of tangents law 1915
circle 1890 quadrilateral 1916
collinear 1893 radius 1916
complete quadrilateral 1893 rectangle 1916
concurrent 1893 regular polygon 1917
cosines law 1894 regular polyhedron 1917
cyclic quadrilateral 1894 rhombus 1918
derivation of cosines law 1894 right triangle 1919
diameter 1895 sector of a circle 1919
double angle identity 1896 sines law 1919
equilateral triangle 1896 sines law proof 1920
fundamental theorem on isogonal lines 1897 some proofs for triangle theorems 1920
height 1897 square 1921
hexagon 1897 tangents law 1921
hypotenuse 1898 triangle 1921
isogonal conjugate 1898 triangle center 1922
isoperimetric inequality 1951
proof of Hadwiger-Finsler inequality 1951
51-01 – Instructional exposition (textbooks, tutorial papers, etc.) 1924
geometry 1924 51M20 – Polyhedra and polytopes; regu-
51-XX – Geometry 1927 lar figures, division of spaces 1953
non-Euclidean geometry 1927 polyhedron 1953
parallel postulate 1927 51M99 – Miscellaneous 1954
51A05 – General theory and projective ge- Euler line proof 1954
ometries 1928 SSA 1954
Ceva’s theorem 1928 cevian 1955
Menelaus’ theorem 1928 congruence 1955
Pappus’s theorem 1929 incenter 1956
proof of Ceva’s theorem 1929 incircle 1956
proof of Menelaus’ theorem 1930 symmedian 1957
proof of Pappus’s theorem 1931 51N05 – Descriptive geometry 1958
proof of Pascal’s mystic hexagram 1932 curve 1958
51A30 – Desarguesian and Pappian geome- piecewise smooth 1960
tries 1934 rectifiable 1960
Desargues’ theorem 1934 51N20 – Euclidean analytic geometry 1961
proof of Desargues’ theorem 1934 Steiner’s theorem 1961
51A99 – Miscellaneous 1936 Van Aubel theorem 1961
Pick’s theorem 1936 conic section 1961
proof of Pick’s theorem 1936 proof of Steiner’s theorem 1963
51F99 – Miscellaneous 1939 proof of Van Aubel theorem 1964
Weitzenböck’s Inequality 1939 51M04 – Elementary problems in Euclidean
51M04 – Elementary problems in Euclidean three theorems on parabolas 1966
geometries 1940 52A01 – Axiomatic and generalized con-
Napoleon’s theorem 1940 vexity 1969
corollary of Morley’s theorem 1941 convex combination 1969
pivot theorem 1941 52A07 – Convex sets in topological vector
proof of Morley’s theorem 1941 spaces 1970
proof of pivot theorem 1943 Fréchet space 1970
51M05 – Euclidean geometries (general) 52A20 – Convex sets in n dimensions (in-
and generalizations 1944 cluding convex hypersurfaces) 1973
area of the n-sphere 1944 Carathéodory’s theorem 1973
geometry of the sphere 1945 52A35 – Helly-type theorems and geomet-
sphere 1945 ric transversal theory 1974
spherical coordinates 1947 Helly’s theorem 1974
volume of the n-sphere 1947 52A99 – Miscellaneous 1975
51M10 – Hyperbolic and elliptic geome- convex set 1975
tries (general) and generalizations 1949 52C07 – Lattices and convex bodies in n
Lobachevsky’s formula 1949 dimensions 1976
51M16 – Inequalities and extremum prob- Radon’s lemma 1976
lems 1950 52C35 – Arrangements of points, flats, hy-
Brunn-Minkowski inequality 1950 perplanes 1978
Hadwiger-Finsler inequality 1950 Sylvester’s theorem 1978
53-00 – General reference works (hand- symplectic manifold 2009
books, dictionaries, bibliographies, etc.) 1979 symplectic matrix 2009
Lie derivative 1979 symplectic vector field 2010
closed differential forms on a simply connected domain 1979
symplectic vector space 2010
53D10 – Contact manifolds, general 2011
exact (differential form) 1980 contact manifold 2011
manifold 1980 53D20 – Momentum maps; symplectic re-
metric tensor 1983 duction 2012
proof of closed differential forms on a simply connected domain 1983
momentum map 2012
54-00 – General reference works (hand-
pullback of a k-form 1985 books, dictionaries, bibliographies, etc.) 2013
tangent space 1985 Krull dimension 2013
53-01 – Instructional exposition (textbooks, Niemytzki plane 2013
tutorial papers, etc.) 1988 Sorgenfrey line 2014
curl 1988 boundary (in topology) 2014
53A04 – Curves in Euclidean space 1990 closed set 2014
Frenet frame 1990 coarser 2015
Serret-Frenet equations 1991 compact-open topology 2015
curvature (space curve) 1992 completely normal 2015
fundamental theorem of space curves 1993 continuous proper map 2016
helix 1993 derived set 2016
space curve 1994 diameter 2016
53A45 – Vector and tensor analysis 1996 every second countable space is separable 2016
closed (differential form) 1996 first axiom of countability 2017
53B05 – Linear and affine connections 1997 homotopy groups 2017
Levi-Civita connection 1997 indiscrete topology 2018
connection 1997 interior 2018
vector field along a curve 2001 invariant forms on representations of compact
53B21 – Methods of Riemannian geome- groups 2018
try 2002 ladder connected 2019
Hodge star operator 2002 local base 2020
Riemannian manifold 2002 loop 2020
53B99 – Miscellaneous 2004 loop space 2020
germ of smooth functions 2004 metrizable 2020
53C17 – Sub-Riemannian geometry 2005 neighborhood system 2021
Sub-Riemannian manifold 2005 paracompact topological space 2021
53D05 – Symplectic manifolds, general 2006 pointed topological space 2021
Darboux’s Theorem (symplectic geometry) 2006 proper map 2021
Moser’s theorem 2006 quasi-compact 2022
almost complex structure 2007 regularly open 2022
coadjoint orbit 2007 separated 2022
examples of symplectic manifolds 2007 support of function 2023
hamiltonian vector field 2008 topological invariant 2024
isotropic submanifold 2008 topological space 2024
lagrangian submanifold 2009 topology 2025
triangle inequality 2025 2045
universal covering space 2026 Klein bottle 2045
54A05 – Topological spaces and general- Möbius strip 2046
izations (closure spaces, etc.) 2027 cell attachment 2047
quotient space 2047
characterization of connected compact metric spaces 2027 torus 2048
closure axioms 2027 54B17 – Adjunction spaces and similar con-
neighborhood 2028 structions 2049
open set 2028 adjunction space 2049
54A20 – Convergence in general topology 54B40 – Presheaves and sheaves 2050
(sequences, filters, limits, convergence spaces, direct image 2050
etc.) 2030 54B99 – Miscellaneous 2051
Banach fixed point theorem 2030 cofinite and cocountable topology 2051
Dini’s theorem 2031 cone 2051
another proof of Dini’s theorem 2031 join 2052
continuous convergence 2032 order topology 2052
contractive maps are uniformly continuous 2033 suspension 2053
net 2033 54C05 – Continuous maps 2054
proof of Banach fixed point theorem 2034 Inverse Function Theorem (topological spaces)
proof of Dini’s theorem 2035 2054
theorem about continuous convergence 2035 continuity of composition of functions 2054
ultrafilter 2035 continuous 2055
ultranet 2036 discontinuous 2055
54A99 – Miscellaneous 2037 homeomorphism 2057
basis 2037 proof of Inverse Function Theorem (topological
box topology 2037 spaces) 2057
closure 2038 restriction of a continuous mapping is continu-
cover 2038 ous 2057
dense 2039 54C10 – Special maps on topological spaces
examples of filters 2039 (open, closed, perfect, etc.) 2059
filter 2039 densely defined 2059
limit point 2040 open mapping 2059
nowhere dense 2040 54C15 – Retraction 2060
perfect set 2041 retract 2060
properties of the closure operator 2041 54C70 – Entropy 2061
subbasis 2041 differential entropy 2061
54B05 – Subspaces 2042 54C99 – Miscellaneous 2062
irreducible 2042 Borsuk-Ulam theorem 2062
irreducible component 2042 ham sandwich theorem 2062
subspace topology 2042 proof of Borsuk-Ulam theorem 2062
54B10 – Product spaces 2043 54D05 – Connected and locally connected
product topology 2043 spaces (general aspects) 2064
product topology preserves the Hausdorff property 2044
Jordan curve theorem 2064
clopen subset 2064
54B15 – Quotient spaces, decompositions connected component 2065
connected set 2065 Lindelöf 2081
connected set in a topological space 2066 countably compact 2081
connected space 2066 locally finite 2081
connectedness is preserved under a continuous 54D30 – Compactness 2082
map 2066 Y is compact if and only if every open cover of
cut-point 2067 Y has a finite subcover 2082
example of a connected space that is not path- Heine-Borel theorem 2083
connected 2067 Tychonoff’s theorem 2083
example of a semilocally simply connected space a space is compact if and only if the space has
which is not locally simply connected 2068 the finite intersection property 2083
example of a space that is not semilocally simply closed set in a compact space is compact 2084
connected 2068 closed subsets of a compact set are compact 2084
locally connected 2069 compact 2085
locally simply connected 2069 compactness is preserved under a continuous map
path component 2069 2085
path connected 2070 examples of compact spaces 2086
products of connected spaces 2070 finite intersection property 2088
proof that a path connected space is connected limit point compact 2088
2070 point and a compact set in a Hausdorff space
quasicomponent 2070 have disjoint open neighborhoods. 2088
semilocally simply connected 2071 proof of Heine-Borel theorem 2089
54D10 – Lower separation axioms (T0 –T3 , properties of compact spaces 2091
etc.) 2072 relatively compact 2092
T0 space 2072 sequentially compact 2092
T1 space 2072 two disjoint compact sets in a Hausdorff space
T2 space 2072 have disjoint open neighborhoods. 2092
T3 space 2073 54D35 – Extensions of spaces (compacti-
a compact set in a Hausdorff space is closed 2073 fications, supercompactifications, comple-
proof of a compact set in a Hausdorff space is closed 2074
Alexandrov one-point compactification 2094
compactification 2094
regular 2074 compactification 2094
regular space 2074 54D45 – Local compactness, σ-compactness
separation axioms 2075 2095
topological space is T1 if and only if every singleton is closed 2076
examples of locally compact and not locally compact spaces 2095
locally compact 2096
54D65 – Separability 2097
separable 2097
54D15 – Higher separation axioms (completely regular, normal, perfectly or collectionwise normal, etc.) 2077
Tietze extension theorem 2077 separable 2097
Tychonoff 2077 54D70 – Base properties 2098
Urysohn’s lemma 2078 second countable 2098
normal 2078 54D99 – Miscellaneous 2099
proof of Urysohn’s lemma 2078 Lindelöf theorem 2099
54D20 – Noncompact covering properties first countable 2099
(paracompact, Lindelöf, etc.) 2081 proof of Lindelöf theorem 2099
totally disconnected space 2100 extremally disconnected 2117
54E15 – Uniform structures and general- 54G20 – Counterexamples 2118
izations 2101 Sierpinski space 2118
topology induced by uniform structure 2101 long line 2118
uniform space 2101 55-00 – General reference works (hand-
uniform structure of a metric space 2102 books, dictionaries, bibliographies, etc.) 2120
uniform structure of a topological group 2102 Universal Coefficient Theorem 2120
ε-net 2103 invariance of dimension 2121
Euclidean distance 2103 55M05 – Duality 2122
Hausdorff metric 2104 Poincaré duality 2122
Urysohn metrization theorem 2104 55M20 – Fixed points and coincidences 2123
ball 2104 Sperner’s lemma 2123
bounded 2105 55M25 – Degree, winding number 2125
city-block metric 2105 degree (map of spheres) 2125
completely metrizable 2105 winding number 2126
distance to a set 2106 55M99 – Miscellaneous 2127
equibounded 2106 genus of topological surface 2127
isometry 2106 55N10 – Singular theory 2128
metric space 2107 Betti number 2128
non-reversible metric 2107 Mayer-Vietoris sequence 2128
open ball 2108 cellular homology 2128
some structures on Rn 2108 homology (topological space) 2129
totally bounded 2110 homology of RP3 . 2131
ultrametric 2110 long exact sequence (of homology groups) 2132
Lebesgue number lemma 2111 relative homology groups 2133
proof of Lebesgue number lemma 2111 55N99 – Miscellaneous 2134
complete 2111 suspension isomorphism 2134
completeness principle 2112 55P05 – Homotopy extension properties,
uniformly equicontinuous 2112 cofibrations 2135
Baire category theorem 2112 cofibration 2135
Baire space 2113 homotopy extension property 2135
equivalent statement of Baire category theorem 2113
55P10 – Homotopy equivalences 2136
Whitehead theorem 2136
generic 2114 weak homotopy equivalence 2136
meager 2114 55P15 – Classification of homotopy type
proof for one equivalent statement of Baire category theorem 2114
55P15 – Classification of homotopy type 2137
proof of Baire category theorem 2115 55P20 – Eilenberg-Mac Lane spaces 2138
residual 2115 Eilenberg-Mac Lane space 2138
six consequences of Baire category theorem 2116 55P99 – Miscellaneous 2139
Hahn-Mazurkiewicz theorem 2116 fundamental groupoid 2139
Vitali covering 2116 55Pxx – Homotopy theory 2141
compactly generated 2116 nullhomotopic map 2141
54G05 – Extremally disconnected spaces, 55Q05 – Homotopy groups, general; sets
F -spaces, etc. 2117 of homotopy classes 2142
Van Kampen’s theorem 2142 57M99 – Miscellaneous 2174
category of pointed topological spaces 2143 Dehn surgery 2174
deformation retraction 2143 57N16 – Geometric structures on mani-
fundamental group 2144 folds 2175
homotopy of maps 2144 self-intersections of a curve 2175
homotopy of paths 2145 57N70 – Cobordism and concordance 2176
long exact sequence (locally trivial bundle) 2145 h-cobordism 2176
55Q52 – Homotopy groups of special spaces Smale’s h-cobordism theorem 2176
2146 cobordism 2176
contractible 2146 57N99 – Miscellaneous 2178
55R05 – Fiber spaces 2147 orientation 2178
classification of covering spaces 2147 57R22 – Topology of vector bundles and
covering space 2148 fiber bundles 2180
deck transformation 2148 hairy ball theorem 2180
lifting of maps 2150 57R35 – Differentiable mappings 2182
lifting theorem 2151 Sard’s theorem 2182
monodromy 2151 differentiable function 2182
properly discontinuous action 2153 57R42 – Immersions 2184
regular covering 2153 immersion 2184
55R10 – Fiber bundles 2155 57R60 – Homotopy spheres, Poincaré con-
associated bundle construction 2155 jecture 2185
bundle map 2156 Poincaré conjecture 2185
fiber bundle 2156 The Poincaré dodecahedral space 2185
locally trivial bundle 2157 homology sphere 2186
principal bundle 2157 57R99 – Miscellaneous 2187
pullback bundle 2158 transversality 2187
reduction of structure group 2158 57S25 – Groups acting on specific mani-
section of a fiber bundle 2160 folds 2189
some examples of universal bundles 2161
universal bundle 2161
Isomorphism of the group PSL2(C) with the group of Möbius transformations 2189
55R25 – Sphere bundles and vector bun- 58A05 – Differentiable manifolds, founda-
dles 2163 tions 2190
Hopf bundle 2163 partition of unity 2190
vector bundle 2163 58A10 – Differential forms 2191
55U10 – Simplicial sets and complexes 2164 differential form 2191
simplicial complex 2164 58A32 – Natural bundles 2194
57-00 – General reference works (hand- conormal bundle 2194
books, dictionaries, bibliographies, etc.) 2167 cotangent bundle 2194
connected sum 2167 normal bundle 2195
57-XX – Manifolds and cell complexes 2168 tangent bundle 2195
CW complex 2168
58C35 – Integration on manifolds; measures on manifolds 2196
57M25 – Knots and links in S³ 2170
connected sum 2170 general Stokes theorem 2196
knot theory 2170 proof of general Stokes theorem 2196
unknot 2173 58C40 – Spectral theory; eigenvalue prob-
lems 2199 binomial distribution 2219
spectral radius 2199 convergence in distribution 2220
58E05 – Abstract critical point theory (Morse theory, Ljusternik-Schnirelman (Lyusternik-Shnirelman) theory, etc.) 2200
density function 2221
distribution function 2221
geometric distribution 2222
Morse complex 2200 relative entropy 2223
Morse function 2200 Paul Lévy continuity theorem 2224
Morse lemma 2201 characteristic function 2225
centralizer 2201 Kolmogorov’s inequality 2226
60-00 – General reference works (hand- discrete density function 2226
books, dictionaries, bibliographies, etc.) 2202 probability distribution function 2227
Bayes’ theorem 2202 60F05 – Central limit and other weak the-
Bernoulli random variable 2202 orems 2229
Gamma random variable 2203 Lindeberg’s central limit theorem 2229
beta random variable 2204 60F15 – Strong theorems 2231
chi-squared random variable 2205 Kolmogorov’s strong law of large numbers 2231
continuous density function 2205 strong law of large numbers 2231
expected value 2206 60G05 – Foundations of stochastic processes
geometric random variable 2207 2233
proof of Bayes’ Theorem 2207 stochastic process 2233
random variable 2208 60G99 – Miscellaneous 2234
uniform (continuous) random variable 2208 stochastic matrix 2234
uniform (discrete) random variable 2209 60J10 – Markov chains with discrete pa-
60A05 – Axioms; other general questions rameter 2235
2210 Markov chain 2235
example of pairwise independent events that are 62-00 – General reference works (hand-
not totally independent 2210 books, dictionaries, bibliographies, etc.) 2236
independent 2210 covariance 2236
random event 2211 moment 2237
60A10 – Probabilistic measure theory 2212 variance 2237
Cauchy random variable 2212 62E15 – Exact distribution theory 2239
almost surely 2212 Pareto random variable 2239
60A99 – Miscellaneous 2214 exponential random variable 2240
Borel-Cantelli lemma 2214 hypergeometric random variable 2240
Chebyshev’s inequality 2214 negative hypergeometric random variable 2241
Markov’s inequality 2215
cumulative distribution function 2215
negative hypergeometric random variable, example of 2242
limit superior of sets 2215 proof of expected value of the hypergeometric
proof of Chebyshev’s inequality 2216 distribution 2243
proof of Markov’s inequality 2216 proof of variance of the hypergeometric distribu-
60E05 – Distributions: general theory 2217 tion 2243
Cramér-Wold theorem 2217 proof that normal distribution is a distribution
Helly-Bray theorem 2217 2245
Scheffé’s theorem 2218 65-00 – General reference works (hand-
Zipf’s law 2218 books, dictionaries, bibliographies, etc.) 2246
normal equations 2246 heapsort 2278
principal components analysis 2247 in-place sorting algorithm 2279
pseudoinverse 2248 insertion sort 2279
65-01 – Instructional exposition (textbooks, lower bound for sorting 2281
tutorial papers, etc.) 2250 quicksort 2282
cubic spline interpolation 2250 sorting problem 2283
65B15 – Euler-Maclaurin formula 2252 68P20 – Information storage and retrieval
Euler-Maclaurin summation formula 2252 2285
proof of Euler-Maclaurin summation formula 2252 Browsing service 2285
65C05 – Monte Carlo methods 2254 Digital Library Index 2285
Monte Carlo methods 2254 Digital Library Scenario 2285
65D32 – Quadrature and cubature formulas 2256
Digital Library Space 2286
Digital Library Searching Service 2286
Simpson’s rule 2256 Service, activity, task, or procedure 2286
65F25 – Orthogonalization 2257 StructuredStream 2286
Givens rotation 2257 collection 2286
Gram-Schmidt orthogonalization 2258 digital library stream 2287
Householder transformation 2259 digital object 2287
orthonormal 2261 good hash table primes 2287
65F35 – Matrix norms, conditioning, scal- hashing 2289
ing 2262 metadata format 2293
Hilbert matrix 2262 system state 2293
Pascal matrix 2262 transition event 2294
Toeplitz matrix 2263 68P30 – Coding and information theory
matrix condition number 2264 (compaction, compression, models of com-
matrix norm 2264 munication, encoding schemes, etc.) 2295
pivoting 2265 Huffman coding 2295
65R10 – Integral transforms 2266 Huffman’s algorithm 2297
integral transform 2266 arithmetic encoding 2299
65T50 – Discrete and fast Fourier trans- binary Gray code 2300
forms 2267 entropy encoding 2301
Vandermonde matrix 2267 68Q01 – General 2302
discrete Fourier transform 2268 currying 2302
68M20 – Performance evaluation; queue- higher-order function 2303
ing; scheduling 2270 68Q05 – Models of computation (Turing
Amdahl’s Law 2270 machines, etc.) 2304
efficiency 2270 Cook reduction 2304
proof of Amdahl’s Law 2271 Levin reduction 2305
68P05 – Data structures 2272 Turing computable 2305
heap insertion algorithm 2272 computable number 2305
heap removal algorithm 2273 deterministic finite automaton 2306
68P10 – Searching and sorting 2275 non-deterministic Turing machine 2307
binary search 2275 non-deterministic finite automaton 2307
bubblesort 2276 non-deterministic pushdown automaton 2309
heap 2277 oracle 2310
self-reducible 2311 Kleene algebra 2331
universal Turing machine 2311 Kleene star 2331
68Q10 – Modes of computation (nondeter- monad 2332
ministic, parallel, interactive, probabilis- 68R05 – Combinatorics 2333
tic, etc.) 2312 switching lemma 2333
deterministic Turing machine 2312 68R10 – Graph theory 2334
random Turing machine 2313 Floyd’s algorithm 2334
68Q15 – Complexity classes (hierarchies, relations among complexity classes, etc.) 2315
digital library structural metadata specification 2334
digital library structure 2335
NP-complete 2315 digital library substructure 2335
complexity class 2315 68T10 – Pattern recognition, speech recog-
constructible 2317 nition 2336
counting complexity class 2317 Hough transform 2336
polynomial hierarchy 2317 68U10 – Image processing 2340
polynomial hierarchy is a hierarchy 2318 aliasing 2340
time complexity 2318 68W01 – General 2341
68Q25 – Analysis of algorithms and prob- Horner’s rule 2341
lem complexity 2320 68W30 – Symbolic computation and alge-
counting problem 2320 braic computation 2343
decision problem 2320 algebraic computation 2343
promise problem 2321 68W40 – Analysis of algorithms 2344
range problem 2321 speedup 2344
search problem 2321 74A05 – Kinematics of deformation 2345
68Q30 – Algorithmic information theory body 2345
(Kolmogorov complexity, etc.) 2323 deformation 2345
Kolmogorov complexity 2323 76D05 – Navier-Stokes equations 2346
Kolmogorov complexity function 2323 Navier-Stokes equations 2346
Kolmogorov complexity upper bounds 2324 81S40 – Path integrals 2347
computationally indistinguishable 2324 Feynman path integral 2347
distribution ensemble 2325 90C05 – Linear programming 2349
hard core 2325 linear programming 2349
invariance theorem 2325 simplex algorithm 2350
natural numbers identified with binary strings 2326
91A05 – 2-person games 2351
examples of normal form games 2351
one-way function 2326 normal form game 2352
pseudorandom 2327 91A10 – Noncooperative games 2353
pseudorandom generator 2327 dominant strategy 2353
support 2327 91A18 – Games in extensive form 2354
68Q45 – Formal languages and automata 2328
extensive form game 2354
91A99 – Miscellaneous 2355
automaton 2328 Nash equilibrium 2355
context-free language 2329 Pareto dominant 2355
68Q70 – Algebraic theory of languages and common knowledge 2356
automata 2331 complete information 2356
example of Nash equilibrium 2357
game 2357
game theory 2358
strategy 2358
utility 2359
92B05 – General biology and biomathematics 2360
Lotka-Volterra system 2360
93A10 – General systems 2362
transfer function 2362
93B99 – Miscellaneous 2363
passivity 2363
93D99 – Miscellaneous 2365
Hurwitz matrix 2365
94A12 – Signal theory (characterization, reconstruction, etc.) 2366
rms error 2366
94A17 – Measures of information, entropy 2367
conditional entropy 2367
gaussian maximizes entropy for given covariance 2368
mutual information 2368
proof of gaussian maximizes entropy for given covariance 2369
94A20 – Sampling theory 2371
sampling theorem 2371
94A60 – Cryptography 2372
Diffie-Hellman key exchange 2372
elliptic curve discrete logarithm problem 2373
94A99 – Miscellaneous 2374
Heaps’ law 2374
History 2375
GNU Free Documentation License
Preamble
The purpose of this License is to make a manual, textbook, or other written document “free”
in the sense of freedom: to assure everyone the effective freedom to copy and redistribute
it, with or without modifying it, either commercially or noncommercially. Secondarily, this
License preserves for the author and publisher a way to get credit for their work, while not
being considered responsible for modifications made by others.
This License is a kind of “copyleft”, which means that derivative works of the document must
themselves be free in the same sense. It complements the GNU General Public License, which
is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free
software needs free documentation: a free program should come with manuals providing the
same freedoms that the software does. But this License is not limited to software manuals; it
can be used for any textual work, regardless of subject matter or whether it is published as a
printed book. We recommend this License principally for works whose purpose is instruction
or reference.
This License applies to any manual or other work that contains a notice placed by the copy-
right holder saying it can be distributed under the terms of this License. The “Document”,
below, refers to any such manual or work. Any member of the public is a licensee, and is
addressed as “you”.
A “Modified Version” of the Document means any work containing the Document or a
portion of it, either copied verbatim, or with modifications and/or translated into another
language.
The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being
those of Invariant Sections, in the notice that says that the Document is released under this
License.
The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or
Back-Cover Texts, in the notice that says that the Document is released under this License.
Examples of suitable formats for Transparent copies include plain ASCII without markup,
Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD,
and standard-conforming simple HTML designed for human modification. Opaque formats
include PostScript, PDF, proprietary formats that can be read and edited only by proprietary
word processors, SGML or XML for which the DTD and/or processing tools are not generally
available, and the machine-generated HTML produced by some word processors for output
purposes only.
The “Title Page” means, for a printed book, the title page itself, plus such following pages
as are needed to hold, legibly, the material this License requires to appear in the title page.
For works in formats which do not have any title page as such, “Title Page” means the text
near the most prominent appearance of the work’s title, preceding the beginning of the body
of the text.
Verbatim Copying
You may copy and distribute the Document in any medium, either commercially or non-
commercially, provided that this License, the copyright notices, and the license notice saying
this License applies to the Document are reproduced in all copies, and that you add no
other conditions whatsoever to those of this License. You may not use technical measures
to obstruct or control the reading or further copying of the copies you make or distribute.
However, you may accept compensation in exchange for copies. If you distribute a large
enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly
display copies.
Copying in Quantity
If you publish printed copies of the Document numbering more than 100, and the Document’s
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly
and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover
Texts on the back cover. Both covers must also clearly and legibly identify you as the
publisher of these copies. The front cover must present the full title with all words of the
title equally prominent and visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve the title of the Document
and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the
first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto
adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you
must either include a machine-readable Transparent copy along with each Opaque copy, or
state in or with each Opaque copy a publicly-accessible computer-network location containing
a complete Transparent copy of the Document, free of added material, which the general
network-using public has access to download anonymously at no charge using public-standard
network protocols. If you use the latter option, you must take reasonably prudent steps, when
you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy
will remain thus accessible at the stated location until at least one year after the last time
you distribute an Opaque copy (directly or through your agents or retailers) of that edition
to the public.
It is requested, but not required, that you contact the authors of the Document well before
redistributing any large number of copies, to give them a chance to provide you with an
updated version of the Document.
Modifications
You may copy and distribute a Modified Version of the Document under the conditions of
sections 2 and 3 above, provided that you release the Modified Version under precisely this
License, with the Modified Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy of it. In addition, you
must do these things in the Modified Version:
• Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any, be
listed in the History section of the Document). You may use the same title as a previous
version if the original publisher of that version gives permission.
• List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five of
the principal authors of the Document (all of its principal authors, if it has less than
five).
• State on the Title page the name of the publisher of the Modified Version, as the
publisher.
• Preserve all the copyright notices of the Document.
• Add an appropriate copyright notice for your modifications adjacent to the other copy-
right notices.
• Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form
shown in the Addendum below.
• Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document’s license notice.
• Include an unaltered copy of this License.
• Preserve the section entitled “History”, and its title, and add to it an item stating
at least the title, year, new authors, and publisher of the Modified Version as given
on the Title Page. If there is no section entitled “History” in the Document, create
one stating the title, year, authors, and publisher of the Document as given on its
Title Page, then add an item describing the Modified Version as stated in the previous
sentence.
• Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in the
Document for previous versions it was based on. These may be placed in the “History”
section. You may omit a network location for a work that was published at least four
years before the Document itself, or if the original publisher of the version it refers to
gives permission.
• In any section entitled “Acknowledgements” or “Dedications”, preserve the section’s
title, and preserve in the section all the substance and tone of each of the contributor
acknowledgements and/or dedications given therein.
• Preserve all the Invariant Sections of the Document, unaltered in their text and in
their titles. Section numbers or the equivalent are not considered part of the section
titles.
• Delete any section entitled “Endorsements”. Such a section may not be included in
the Modified Version.
• Do not retitle any existing section as “Endorsements” or to conflict in title with any
Invariant Section.
If the Modified Version includes new front-matter sections or appendices that qualify as
Secondary Sections and contain no material copied from the Document, you may at your
option designate some or all of these sections as invariant. To do this, add their titles to
the list of Invariant Sections in the Modified Version’s license notice. These titles must be
distinct from any other section titles.
You may add a section entitled “Endorsements”, provided it contains nothing but endorse-
ments of your Modified Version by various parties – for example, statements of peer review
or that the text has been approved by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25
words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version.
Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already includes a cover
text for the same cover, previously added by you or by arrangement made by the same entity
you are acting on behalf of, you may not add another; but you may replace the old one, on
explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to
use their names for publicity for or to assert or imply endorsement of any Modified Version.
Combining Documents
You may combine the Document with other documents released under this License, under
the terms defined in section 4 above for modified versions, provided that you include in the
combination all of the Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its license notice.
The combined work need only contain one copy of this License, and multiple identical In-
variant Sections may be replaced with a single copy. If there are multiple Invariant Sections
with the same name but different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original author or publisher of that
section if known, or else a unique number. Make the same adjustment to the section titles
in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled “History” in the various original
documents, forming one section entitled “History”; likewise combine any sections entitled
“Acknowledgements”, and any sections entitled “Dedications”. You must delete all sections
entitled “Endorsements.”
Collections of Documents
You may make a collection consisting of the Document and other documents released under
this License, and replace the individual copies of this License in the various documents with
a single copy that is included in the collection, provided that you follow the rules of this
License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually
under this License, provided you insert a copy of this License into the extracted document,
and follow this License in all other respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separate and independent doc-
uments or works, in or on a volume of a storage or distribution medium, does not as a whole
count as a Modified Version of the Document, provided no compilation copyright is claimed
for the compilation. Such a compilation is called an “aggregate”, and this License does not
apply to the other self-contained works thus compiled with the Document, on account of
their being thus compiled, if they are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then
if the Document is less than one quarter of the entire aggregate, the Document’s Cover Texts
may be placed on covers that surround only the Document within the aggregate. Otherwise
they must appear on covers around the whole aggregate.
Translation
Translation is considered a kind of modification, so you may distribute translations of the
Document under the terms of section 4. Replacing Invariant Sections with translations
requires special permission from their copyright holders, but you may include translations of
some or all Invariant Sections in addition to the original versions of these Invariant Sections.
You may include a translation of this License provided that you also include the original
English version of this License. In case of a disagreement between the translation and the
original English version of this License, the original English version will prevail.
Termination
You may not copy, modify, sublicense, or distribute the Document except as expressly pro-
vided for under this License. Any other attempt to copy, modify, sublicense or distribute the
Document is void, and will automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this License will not have their
licenses terminated so long as such parties remain in full compliance.
Future Revisions of this License
The Free Software Foundation may publish new, revised versions of the GNU Free Doc-
umentation License from time to time. Such new versions will be similar in spirit to
the present version, but may differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document
specifies that a particular numbered version of this License “or any later version” applies
to it, you have the option of following the terms and conditions either of that specified
version or of any later version that has been published (not as a draft) by the Free Software
Foundation. If the Document does not specify a version number of this License, you may
choose any version ever published (not as a draft) by the Free Software Foundation.
To use this License in a document you have written, include a copy of the License in the
document and put the following copyright and license notices just after the title page:
Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute
and/or modify this document under the terms of the GNU Free Documentation
License, Version 1.1 or any later version published by the Free Software Foun-
dation; with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. A
copy of the license is included in the section entitled “GNU Free Documentation
License”.
If you have no Invariant Sections, write “with no Invariant Sections” instead of saying which
ones are invariant. If you have no Front-Cover Texts, write “no Front-Cover Texts” instead
of “Front-Cover Texts being LIST”; likewise for Back-Cover Texts.
Chapter 1
UNCLA – Unclassified
A Golomb ruler of length n is a ruler with only a subset {0 = a1 , a2 , . . . , an = n} of the
integer markings {0, 1, 2, . . . , n} that appear on a regular ruler. The defining criterion of
this subset is that there exists an m such that any positive integer k ≤ m can be expressed
uniquely as a difference k = ai − aj for some i, j. Such a ruler is referred to as an m-Golomb
ruler.
A 4-Golomb ruler of length 7 is given by {0, 1, 3, 7}. To verify this, we need to show that
every number 1, 2, 3, 4 can be expressed uniquely as a difference of two numbers in the
above set:
1 = 1 − 0
2 = 3 − 1
3 = 3 − 0
4 = 7 − 3
An optimal Golomb ruler is one where, for a fixed number of marks, the value of an (the
length) is minimized.
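The uniqueness condition is easy to test by machine. The following sketch (the helper name and the brute-force approach are illustrative, not from the entry) computes the largest such m for a given set of marks:

```python
from itertools import combinations

def golomb_m(marks):
    # count how often each positive difference occurs among the marks
    diffs = {}
    for a, b in combinations(sorted(marks), 2):
        diffs[b - a] = diffs.get(b - a, 0) + 1
    # largest m such that every k in 1..m occurs exactly once
    m = 0
    while diffs.get(m + 1, 0) == 1:
        m += 1
    return m

print(golomb_m({0, 1, 3, 7}))  # 4 -- and 5 is not a difference at all
```

Applied to {0, 1, 3, 7} it returns 4: the differences 1, 2, 3, 4 each arise exactly once, while 5 is not a difference of any pair.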
A Hesse configuration is a set P of nine non-collinear points in the projective plane over
a field K such that any line through two points of P contains exactly three points of P .
Then there are 12 such lines through P . A Hesse configuration exists if and only if the field
K contains a primitive third root of unity. For such K the projective automorphism group
PGL(3, K) acts transitively on all possible Hesse configurations.
The configuration P with its intersection structure of 12 lines is isomorphic to the affine
plane A = F 2 , where F is the field with three elements.
The group Γ ⊂ PGL(3, K) of all symmetries that map P onto itself has order 216 and it
is isomorphic to the group of affine transformations of A that have determinant 1. The
stabilizer in Γ of any of the 12 lines through P is a cyclic subgroup of order three and Γ is
generated by these subgroups.
If K is algebraically closed and the characteristic of K is not 2 or 3 then the nine inflection
points of an elliptic curve E over K form a Hesse configuration.
Lagrange’s theorem
1: G group
2: H ≤ G
3: [G : H] index of H in G
4: |G| = |H|[G : H]
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
1.5 Laurent series
A Laurent series centered at a is a series of the form
f (z) = Σ_{k=−∞}^{∞} ck (z − a)^k
where ck , a, z ∈ C.
One can prove that the above series converges everywhere inside the set
D := {z ∈ C | R1 < |z − a| < R2 }
where
R1 := lim sup_{k→∞} |c−k |^{1/k}
and
R2 := 1/ lim sup_{k→∞} |ck |^{1/k} .
The series hence defines a function whose domain is the set of points in C on which the series
converges. This function is analytic inside the annulus D, and conversely, every analytic
function on an annulus is equal to some (unique) Laurent series.
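For a concrete illustration (the coefficients are an example chosen here, not from the entry), take ck = 2^{−k} for k ≥ 0 and c−k = 3^{−k} for k ≥ 1; the formulas then give R1 = 1/3 and R2 = 2, which can be checked numerically:

```python
# For these coefficients the k-th roots are (essentially) constant in k,
# so the maximum over a long range serves as the lim sup.
K = 200
R1 = max((3.0 ** -k) ** (1.0 / k) for k in range(1, K))        # -> 1/3
R2 = 1.0 / max((2.0 ** -k) ** (1.0 / k) for k in range(1, K))  # -> 2.0
print(round(R1, 6), round(R2, 6))  # 0.333333 2.0
```

So this series converges on the annulus 1/3 < |z − a| < 2.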
1.6 Lebesgue measure
A set S ⊆ R^n is said to be (Lebesgue) measurable if, for every set A,
m∗ (A) = m∗ (A ∩ S) + m∗ (A ∩ S ′ )
where m∗ (S) is the Lebesgue outer measure of S and S ′ is the complement of S. If S is
measurable, then we define the Lebesgue measure of S to be m(S) = m∗ (S).
1.7 Leray spectral sequence
The Leray spectral sequence is a special case of the Grothendieck spectral sequence regarding
composition of functors.
A Möbius transformation is a function of the form
f (z) = (az + b)/(cz + d)
where a, b, c, d ∈ C and ad − bc ≠ 0.
It can be shown that the inverse of a Möbius transformation, and the composition of two
Möbius transformations, are again of the same form, and so the Möbius transformations
form a group under composition.
The geometric interpretation of the Möbius group is that it is the group of automorphisms
of the Riemann sphere.
Any Möbius map can be composed from the elementary transformations - dilations, trans-
lations and inversions. If we define a line to be a circle passing through ∞ then it can be
shown that a Möbius transformation maps circles to circles, by looking at each elementary
transformation.
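The group structure can be made concrete by identifying a Möbius transformation with the coefficient matrix (a, b; c, d): composition is then matrix multiplication and the inverse comes from the adjugate. A small sketch (the particular coefficient tuples are illustrative):

```python
# Represent f(z) = (az + b)/(cz + d) by the coefficient tuple (a, b, c, d).
def mobius(m, z):
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

def compose(m1, m2):
    # matrix product: apply m2 first, then m1
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2)

def inverse(m):
    a, b, c, d = m
    return (d, -b, -c, a)  # adjugate matrix; valid since ad - bc != 0

f = (1, 2, 0, 1)  # z -> z + 2 (a translation)
g = (2, 0, 0, 1)  # z -> 2z (a dilation)
z = 3 + 4j
print(mobius(compose(f, g), z) == mobius(f, mobius(g, z)))  # True
print(mobius(compose(f, inverse(f)), z) == z)               # True
```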
If E is an elliptic curve defined over a number field K, then the group of points with
coordinates in K is a finitely generated abelian group.
1.10 Plateau’s Problem
Plateau’s problem is the problem of finding the surface with minimal area among all surfaces
which have the same prescribed boundary.
This problem is named after the Belgian physicist Joseph Plateau (1801-1883), who exper-
imented with soap films. In fact, if you take a wire (which represents a closed curve in
three-dimensional space) and dip it in a solution of soapy water, you obtain a soapy sur-
face which has the wire as boundary. It turns out that this surface has the minimal area
among all surfaces with the same boundary, so the soap film is a solution to Plateau’s
problem.
Jesse Douglas (1897-1965) solved the problem by proving the existence of such minimal
surfaces. The solution to the problem is achieved by finding a harmonic and conformal
parameterization of the surface.
The extension of the problem to higher dimensions (i.e. for k-dimensional surfaces in n-
dimensional space) turns out to be much more difficult to study. Moreover, while the solutions
to the original problem are always regular it turns out that the solutions to the extended
problem may have singularities if n ≥ 8. To solve the extended problem the theory of
currents (Federer and Fleming) has been developed.
The density function of a Poisson random variable X with parameter λ is
fX (x) = e^{−λ} λ^x / x! ,  x ∈ {0, 1, 2, . . .}
Parameters:
• λ > 0
Syntax:
X ∼ Poisson(λ)
Notes:
1. X is often used to describe the occurrence of rare events. It is a very commonly used
distribution in all fields of statistics.
2. E[X] = λ
3. Var[X] = λ
4. MX (t) = e^{λ(e^t − 1)}
Definition (Discrete) Let (Ω, F, µ) be a discrete probability space, and let X be a discrete
random variable on Ω.
3. If a choice be broken down into two successive choices, the original H should be the
weighted sum of the individual values of H.
As an example of this last property, consider losing your luggage down a chute which feeds
three carousels, A, B and C. Assume that the baggage handling system is constructed such
that the probability of your luggage ending up on carousel A is 1/2, on B is 1/3, and on C is
1/6. These probabilities specify the pi . There are two ways to think about your uncertainty
about where your luggage will end up.
If you’re not as lost as your luggage, then you may be interested in the following. . .
Theorem The only H satisfying the three above assumptions is of the form:
H = −k Σ_{i=1}^{n} pi log pi
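The luggage example illustrates property 3 directly: computing H for (1/2, 1/3, 1/6) in one step, or as the choice A-versus-{B, C} followed (half the time) by the choice B-versus-C, gives the same value. A sketch with k = 1 and base-2 logarithms (both are conventions chosen here, not fixed by the theorem):

```python
import math

def H(ps, k=1.0):
    return -k * sum(p * math.log2(p) for p in ps if p > 0)

direct = H([1/2, 1/3, 1/6])
# Two-stage choice: A versus {B, C} (probabilities 1/2, 1/2), then, with
# weight 1/2, B versus C (conditional probabilities 2/3, 1/3):
staged = H([1/2, 1/2]) + 0.5 * H([2/3, 1/3])

print(round(direct, 6), round(staged, 6))  # both ≈ 1.459148
```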
We wish to obtain a generally finite measure as the “bin size” goes to zero. In the discrete
case, the bin size is the (implicit) width of each of the n (finite or infinite) bins/buckets/states
whose probabilities are the pn . As we generalize to the continuous domain, we must make
this width explicit.
To do this, start with a continuous function f discretized as shown in the figure:
As the figure indicates, by the mean-value theorem there exists a value xi in each bin such
that
f (xi )∆ = ∫_{i∆}^{(i+1)∆} f (x) dx    (1.12.2)
and thus the integral of the function f can be approximated (in the Riemannian sense) by
∫_{−∞}^{∞} f (x) dx = lim_{∆→0} Σ_{i=−∞}^{∞} f (xi )∆    (1.12.3)
where this limit and “bin size goes to zero” are equivalent.
We will denote
H^∆ := − Σ_{i=−∞}^{∞} ∆f (xi ) log ∆f (xi )    (1.12.4)
As ∆ → 0, we have
Σ_{i=−∞}^{∞} f (xi )∆ → ∫ f (x) dx = 1    (1.12.7)
and
Σ_{i=−∞}^{∞} ∆f (xi ) log f (xi ) → ∫ f (x) log f (x) dx    (1.12.8)
1.13 Shapiro inequality
Let G be a finite group and p be a prime that divides |G|. We can then write |G| = p^k m for
some positive integer k so that p does not divide m.
The first Sylow theorem states that any group of order p^k m has a Sylow p-subgroup.
Specifically, it concerns a substitution that reduces finding the roots of the polynomial
p = T^n + a1 T^{n−1} + ... + an = Π_{i=1}^{n} (T − ri ) ∈ k[T ]
Historically, the transformation was applied to reduce the general quintic equation to simpler
resolvents. Examples due to Hermite and Klein are respectively: the principal resolvent
K(X) := X 5 + a0 X 2 + a1 X + a3
and the Bring-Jerrard form
K(X) := X 5 + a1 X + a2
Tschirnhaus transformations are also used when computing Galois groups to remove repeated
roots in resolvent polynomials. Almost any transformation will work but it is extremely hard
to find an efficient algorithm that can be proved to work.
∫_0^{π/2} sin^{2n+1} x dx = (2 · 4 · · · · · 2n) / (3 · 5 · · · · · (2n + 1))
and
π/2 = Π_{n=1}^{∞} 4n^2 /(4n^2 − 1) = (2 · 2 · 4 · 4 · · ·)/(1 · 3 · 3 · 5 · · ·)
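The infinite product (Wallis's product) converges quite slowly; a partial product over the first 10^5 factors already suggests the limit π/2 (a numerical illustration only):

```python
import math

prod = 1.0
for n in range(1, 100001):
    prod *= 4.0 * n * n / (4.0 * n * n - 1.0)  # the n-th Wallis factor

print(round(2 * prod, 4), round(math.pi, 4))  # both 3.1416
```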
A collection S of subsets of a set X (that is, a subset of the power set of X) satisfies
the ascending chain condition or ACC if there does not exist an infinite ascending chain
s1 ⊂ s2 ⊂ · · · of subsets from S.
1.18 bounded
Let X be a subset of R. We say that X is bounded when there exists a real number M such
that |x| < M for all x ∈ X. When X is an interval, we speak of a bounded interval.
This can be generalized first to R^n . We say that X ⊂ R^n is bounded if there is a real number
M such that ∥x∥ < M for all x ∈ X, where ∥ · ∥ is the Euclidean norm.
When we consider balls, we speak of bounded balls.
This condition is equivalent to the statement: there is a real number T such that ∥x − y∥ < T
for all x, y ∈ X.
A further generalization to any metric space V says that X ⊂ V is bounded when there is a
real number M such that d(x, y) < M for all x, y ∈ X, where d is the metric (distance
function) on V .
Definition [1]
1. Suppose X and Y are normed vector spaces with norms ∥·∥X and ∥·∥Y . Further,
suppose T is a linear map T : X → Y . If there is a C ≥ 0 such that
∥T x∥Y ≤ C ∥x∥X
for all x ∈ X, then T is called a bounded operator.
2. Let X and Y be as above, and let T : X → Y be a bounded operator. Then the norm
of T is defined as the real number
∥T ∥ = sup{ ∥T x∥Y / ∥x∥X | x ∈ X \ {0} }.
In the special case when X is the zero vector space, any linear map T : X → Y is the
zero map since T (0) = T (0 · 0) = 0 · T (0) = 0. In this case, we define ∥T ∥ = 0.
TODO:
3. Give alternative expressions for norm of T . (supremum taken over unit ball)
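In finite dimensions with Euclidean norms, the supremum in part 2 can be approximated by sampling the unit sphere; the matrix below is an arbitrary example, and the dense grid search is a crude illustration rather than a practical method (the exact value is the largest singular value of the matrix):

```python
import math

# An arbitrary 2x2 example; its operator norm is its largest singular
# value, which is sqrt(45) here.
T = [[3.0, 0.0],
     [4.0, 5.0]]

def apply(T, x):
    return [sum(T[i][j] * x[j] for j in range(2)) for i in range(2)]

norm = max(
    math.hypot(*apply(T, [math.cos(t), math.sin(t)]))
    for t in (2 * math.pi * k / 100000 for k in range(100000))
)
print(round(norm, 4))  # ≈ 6.7082 = sqrt(45)
```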
REFERENCES
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons,
1978.
2. G. Bachman, L. Narici, Functional analysis, Academic Press, 1966.
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
2: (z1 , z2 ) 6= (0, 0)
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
Let X be a set, (Y, ρ) a metric space and {fn } a sequence of functions from X to Y , and
f : X → Y another function.
1.22 descending chain condition
A collection S of subsets of a set X (that is, a subset of the power set of X) satisfies
the descending chain condition or DCC if there does not exist an infinite descending chain
s1 ⊃ s2 ⊃ · · · of subsets from S.
In the simplest case, the result states that every image of a two-colored “Diamond” figure (like
the figure in Plato’s Meno dialogue) under the action of the symmetric group of degree 4 has
some ordinary or color-interchange symmetry. The theorem generalizes to graphic designs
on 2x2x2, 4x4, and 4x4x4 arrays. It is of interest because it relates classical (Euclidean)
symmetries to underlying group actions that come from finite rather than from classical
geometry. The group actions in the 4x4 case of the theorem throw some light on the R. T.
Curtis “miracle octad generator” approach to the large Mathieu group.
1: V finite-dimensional vector space
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
1: X module over R
2: Y ⊂ X
3: X generated by Y
4: Y finite
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
1.26 fraction
A fraction is a rational number expressed in the form n/d (also written with n atop d), where
n is designated the numerator and d the denominator. The slash between them is known as
a solidus when the fraction is expressed as n/d.
The rules for manipulating fractions are
a/b = (ka)/(kb)
a/b + c/d = (ad + bc)/(bd)
a/b − c/d = (ad − bc)/(bd)
a/b × c/d = (ac)/(bd)
a/b ÷ c/d = (ad)/(bc).
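These rules are exactly what exact rational arithmetic implements; Python's fractions module can be used to spot-check them (the particular values 1/2 and 2/3 are arbitrary):

```python
from fractions import Fraction

a, b = Fraction(1, 2), Fraction(2, 3)
print(a + b == Fraction(1 * 3 + 2 * 2, 2 * 3))  # (ad + bc)/(bd): True
print(a - b == Fraction(1 * 3 - 2 * 2, 2 * 3))  # (ad - bc)/(bd): True
print(a * b == Fraction(1 * 2, 2 * 3))          # (ac)/(bd): True
print(a / b == Fraction(1 * 3, 2 * 2))          # (ad)/(bc): True
print(Fraction(5 * 1, 5 * 2) == a)              # (ka)/(kb) = a/b: True
```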
1: ({h : X → X | h covering transformation}, ◦)
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
1.28 idempotent
idempotent
1: R ring
2: r ∈ R
3: r 2 = r
fact: if r is idempotent, then 1 − r is idempotent
1: R ring
2: r ∈ R
3: r idempotent
4: 1 − r idempotent
1: R ring
2: r ∈ R
3: r idempotent
4: rR is a ring
1: R ring
2: r ∈ R
3: r idempotent
4: ∀s ∈ rR : rs = sr = s
1: R ring
2: r ∈ R
3: r idempotent
4: R ≅ rR × (1 − r)R
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
1.29 isolated
isolated singularity
1: f : U ⊂ C → C ∪ {∞}
2: z0 ∈ U
3: f analytic on U \ {z0 }
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
isomorphic groups
2: f : X1 → X2 isomorphism
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
1.32 joint continuous density function
Let X1 , X2 , ..., Xn be n random variables all defined on the same probability space. The joint
continuous density function of X1 , X2 , ..., Xn , denoted by fX1 ,X2 ,...,Xn (x1 , x2 , ..., xn ), is the
function
1. ∫_{(−∞,−∞,...,−∞)}^{(x1 ,x2 ,...,xn )} fX1 ,X2 ,...,Xn (u1 , u2 , ..., un ) du1 du2 ...dun = FX1 ,X2 ,...,Xn (x1 , x2 , ..., xn )
2. ∫ fX1 ,X2 ,...,Xn (u1 , u2 , ..., un ) du1 du2 ...dun = 1, where the integral is taken over all of R^n
As in the single variable case, fX1 ,X2 ,...,Xn does not represent the probability that each of the
random variables takes on each of the values.
Let X1 , X2 , ..., Xn be n random variables all defined on the same probability space. The joint
cumulative distribution function of X1 , X2 , ..., Xn , denoted by FX1 ,X2 ,...,Xn (x1 , x2 , ..., xn ),
is the following function:
1. lim_{(x1 ,...,xn )→(−∞,...,−∞)} FX1 ,X2 ,...,Xn (x1 , ..., xn ) = 0 and
lim_{(x1 ,...,xn )→(∞,...,∞)} FX1 ,X2 ,...,Xn (x1 , ..., xn ) = 1
3. FX1 ,X2 ,...,Xn (x1 , ..., xn ) is continuous from the right in each variable.
The way to evaluate FX1 ,X2 ,...,Xn (x1 , ..., xn ) is the following:
FX1 ,X2 ,...,Xn (x1 , ..., xn ) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} fX1 ,X2 ,...,Xn (u1 , ..., un ) du1 ...dun
(if F is continuous) or
FX1 ,X2 ,...,Xn (x1 , ..., xn ) = Σ_{i1 ≤x1 ,...,in ≤xn} fX1 ,X2 ,...,Xn (i1 , ..., in )
(if F is discrete),
Let X1 , X2 , ..., Xn be n random variables all defined on the same probability space. The
joint discrete density function of X1 , X2 , ..., Xn , denoted by fX1 ,X2 ,...,Xn (x1 , x2 , ..., xn ), is
the following function:
As in the single variable case, sometimes it’s expressed as pX1 ,X2 ,...,Xn (x1 , x2 , ..., xn ) to mark
the difference between this function and the continuous joint density function.
Version: 3 Owner: Riemann Author(s): Riemann
We are said to be using left function notation if we write functions to the left of their
arguments. That is, if α : X → Y is a function and x ∈ X, then αx is the image of x under
α.
lift of a submanifold
1: X, Y topological manifolds
2: Z ⊂ Y submanifold
3: g : Z → Y inclusion
4: g̃ lift of g
5: i(g̃)
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
If x0 ∈ X, we say that f is continuous at x0 if for any ε > 0 there exists a positive δ such that
|f (x) − f (x0 )| < ε
whenever
|x − x0 | < δ.
lipschitz function
1: f : R → C
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
The density function of a lognormal random variable with parameters µ and σ² is
fX (x) = (1 / (x √(2πσ²))) e^{−(ln x − µ)² / (2σ²)} ,  x > 0
Parameters:
• µ ∈ R
• σ² > 0
Syntax:
X ∼ LogN(µ, σ²)
Notes:
1. X is a random variable such that ln(X) is a normal random variable with mean µ and
variance σ².
2. E[X] = e^{µ+σ²/2}
3. Var[X] = e^{2µ+σ²} (e^{σ²} − 1)
Let S be a set with an ordering relation ≤, and let T be a subset of S. A lowest upper bound
of T is an upper bound x of T with the property that x ≤ y for every upper bound y of T .
Greatest lower bound is defined similarly: a greatest lower bound of T is a lower bound x of
T with the property that x ≥ y for every lower bound y of T .
Given random variables X1 , X2 , ..., Xn and a subset I ⊂ {1, 2, ..., n}, the marginal distri-
bution of the random variables Xi : i ∈ I is the following:
f{Xi :i∈I} (x) = Σ_{xj : j∉I} fX1 ,...,Xn (x1 , ..., xn )  or
f{Xi :i∈I} (x) = ∫ fX1 ,...,Xn (u1 , ..., un ) Π_{j∉I} duj ,
summing if the variables are discrete and integrating if the variables are continuous.
That is, the marginal distribution of a set of random variables X1 , ..., Xn can be obtained by
summing (or integrating) the joint distribution over all values of the other variables.
The most common marginal distribution is the individual marginal distribution (i.e., the
marginal distribution of ONE random variable).
measure zero
2: A ∈ M
3: µ(A) = 0
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
Given a graph G with weighted edges, a minimum spanning tree is a spanning tree with
minimum weight, where the weight of a spanning tree is the sum of the weights of its edges.
There may be more than one minimum spanning tree for a graph, since it is the weight of
the spanning tree that must be minimum.
For example, here is a graph G of weighted edges and a minimum spanning tree T for that
graph. The edges of T are drawn as solid lines, while edges in G but not in T are drawn as
dotted lines.
(Figure: the graph G with its edge weights and a minimum spanning tree T ; tree edges
solid, remaining edges dotted.)
Prim’s algorithm or Kruskal’s algorithm can compute the minimum spanning tree of a graph.
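A compact sketch of Kruskal's algorithm with a union-find structure; the edge list below is an arbitrary example, not the graph of the figure:

```python
def kruskal(n, edges):
    # edges: list of (weight, u, v) tuples over vertices 0..n-1
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    tree, weight = [], 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge creates no cycle
            parent[ru] = rv
            tree.append((u, v, w))
            weight += w
    return weight, tree

edges = [(4, 0, 1), (2, 1, 2), (7, 0, 2), (3, 2, 3), (5, 1, 3)]
print(kruskal(4, edges)[0])  # total weight of a minimum spanning tree: 9
```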
Given a list of weights, W := {w1 , w2 , . . . , wn }, the minimum weighted path length is the
minimum of the weighted path length of all extended binary trees that have n external nodes
with weights taken from W . There may be multiple possible trees that give this minimum
path length, and quite often finding this tree is more important than determining the path
length.
Example
Let W := {1, 2, 3, 3, 4}. The minimum weighted path length is 29. A tree that gives this
weighted path length is shown below.
Applications
Constructing a tree of minimum weighted path length for a given set of weights has several
applications, particularly dealing with optimization problems. A simple and elegant algo-
rithm for constructing such a tree is Huffman’s algorithm. Such a tree can give the most
optimal algorithm for merging n sorted sequences (optimal merge). It can also provide a
means of compressing data (Huffman coding), as well as lead to optimal searches.
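Huffman's algorithm computes the minimum weighted path length directly: repeatedly merge the two smallest weights, and sum the merge costs. A sketch verifying the value 29 for W = {1, 2, 3, 3, 4}:

```python
import heapq

def min_weighted_path_length(weights):
    # Repeatedly merge the two smallest weights; the sum of all merge
    # costs equals the minimum weighted path length of the tree.
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

print(min_weighted_path_length([1, 2, 3, 3, 4]))  # 29, as in the example
```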
1.46 mod 2 intersection number
1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊂ Y closed submanifold
5: f : X → Y smooth
7: f transversal to Z
8: |f −1 (Z)| (mod 2)
1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊂ Y closed submanifold
5: f : X → Y smooth
7: g homotopic to f
8: g transversal to Z
9: |g −1 (Z)| (mod 2)
fact: a homotopic transversal map exists
1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊂ Y closed submanifold
5: f : X → Y smooth
7: ∃g homotopic to f : g transversal to Z
fact: two homotopic transversal maps have the same mod 2 inter-
section number
1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊂ Y closed submanifold
5: f1 , f2 : X → Y smooth
6: f1 homotopic to f2
7: I2 (f1 , Z) = I2 (f2 , Z)
2: Y manifold
3: Z ⊂ Y submanifold
5: g : ∂X → Y
6: g can be extended to X
7: I2 (g, Z) = 0
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
It can be shown that if the moment generating function of X is defined on an interval around
the origin, then
E[X^k ] = M_X^{(k)} (t)|_{t=0}
In other words, the kth-derivative of the moment generating function evaluated at zero is
the kth moment of X.
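This can be checked numerically with finite differences; here for the Poisson moment generating function MX (t) = e^{λ(e^t − 1)} with λ = 2.5 (an arbitrary choice), whose first two moments are λ and λ + λ²:

```python
import math

lam = 2.5  # an arbitrary Poisson parameter
M = lambda t: math.exp(lam * (math.exp(t) - 1))  # Poisson MGF

h = 1e-5
first_moment = (M(h) - M(-h)) / (2 * h)           # approximates M'(0)
second_moment = (M(h) - 2 * M(0) + M(-h)) / h**2  # approximates M''(0)

print(round(first_moment, 4))   # 2.5  = lam
print(round(second_moment, 4))  # 8.75 = lam + lam^2
```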
1.48 monoid
A monoid is a semigroup G which contains an identity element; that is, there exists an
element e ∈ G such that e · a = a · e = a for all a ∈ G.
1.50 multidimensional Gaussian integral
It is easy to see that N(0, K) = exp(−(1/2) x^T K^{−1} x). How can we normalize N(0, K)?
K^{−1} is real and symmetric (since (K^{−1} )^T = (K^T )^{−1} = K^{−1} ). For convenience, let A = K^{−1} .
We can decompose A into A = TΛT^{−1} , where T is an orthogonal (T^T T = I) matrix of
the eigenvectors of A and Λ is a diagonal matrix of the eigenvalues of A. Then
∫ · · · ∫ e^{−(1/2) x^T Ax} d^n x = ∫ · · · ∫ e^{−(1/2) x^T TΛT^{−1} x} d^n x.    (1.50.2)
Substituting y = T^T x,
∫ · · · ∫ e^{−(1/2) x^T TΛT^{−1} x} d^n x = ∫ · · · ∫ e^{−(1/2) x^T TΛT^T x} d^n x    (1.50.3)
= ∫ · · · ∫ e^{−(1/2) y^T Λy} |J| d^n y    (1.50.4)
where |J| is the determinant of the Jacobian matrix Jmn = ∂xm /∂yn . In this case, J = T and
thus |J| = 1.
Now we’re in business, because Λ is diagonal and thus the integral may be separated into
the product of n independent Gaussians, each of which we can integrate separately using the
well-known formula
∫ e^{−(1/2) a t^2 } dt = (2π/a)^{1/2} .    (1.50.6)
∫ · · · ∫ e^{−(1/2) y^T Λy} d^n y = Π_{k=1}^{n} ∫ e^{−(1/2) λk yk^2 } dyk    (1.50.7)
= Π_{k=1}^{n} (2π/λk )^{1/2}    (1.50.8)
= ((2π)^n / Π_{k=1}^{n} λk )^{1/2}    (1.50.9)
= ((2π)^n / |Λ|)^{1/2}    (1.50.10)
Now, we have |A| = |TΛT^{−1} | = |T| |Λ| |T^{−1} | = |T| |Λ| |T|^{−1} = |Λ|, so this becomes
∫ · · · ∫ e^{−(1/2) x^T Ax} d^n x = ((2π)^n / |A|)^{1/2} .    (1.50.12)
Substituting A = K^{−1} ,
∫ · · · ∫ e^{−(1/2) x^T K^{−1} x} d^n x = ((2π)^n / |K^{−1} |)^{1/2} = ((2π)^n |K|)^{1/2} ,    (1.50.13)
as promised.
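The identity (1.50.13) is easy to test numerically in n = 2 with a midpoint-rule grid (the covariance matrix and the grid parameters below are arbitrary choices for illustration):

```python
import math

# K, the cutoff L and the number of grid steps are illustrative choices.
K = [[2.0, 0.5],
     [0.5, 1.0]]
detK = K[0][0] * K[1][1] - K[0][1] * K[1][0]
Kinv = [[ K[1][1] / detK, -K[0][1] / detK],
        [-K[1][0] / detK,  K[0][0] / detK]]

L, steps = 12.0, 480  # integrate over [-L, L]^2 with a midpoint grid
h = 2 * L / steps
s = 0.0
for i in range(steps):
    x = -L + (i + 0.5) * h
    for j in range(steps):
        y = -L + (j + 0.5) * h
        q = Kinv[0][0] * x * x + 2 * Kinv[0][1] * x * y + Kinv[1][1] * y * y
        s += math.exp(-0.5 * q) * h * h

print(round(s, 4), round(2 * math.pi * math.sqrt(detK), 4))  # both ≈ 8.3119
```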
1.51 multiindex
multiindex
1.52 near operators
We start our discussion on the Campanato theory of near operators with some preliminary
tools.
Indeed, dF is zero if and only if x0 = x00 (since F is injective); dF is obviously symmetric and
the triangle inequality follows from the triangle inequality of d.
A particular case of the previous statement is when F is onto (and thus a bijection) and
(Y, d) is complete.
Definition 1. Let X be a set and Y be a metric space. Let F, G be two maps from X to
Y . We say that G is a perturbation of F if there exists a constant k > 0 such that for each
x′ , x′′ ∈ X one has:
d(G(x′ ), G(x′′ )) ≤ k d(F (x′ ), F (x′′ ))
Definition 2. In the same hypothesis as in the previous definition, we say that G is a small
perturbation of F if it is a perturbation of constant k < 1.
We can now prove this generalization of the Banach-Caccioppoli fixed point theorem:
Theorem 1. Let X be a set and (Y, d) be a complete metric space. Let F, G be two mappings
from X to Y such that:
1. F is bijective;
2. G is a small perturbation of F .
The hypothesis (1) ensures that the metric space (X, dF ) is complete. If we now consider
the function T : X → X defined by
T (x) = F^{−1} (G(x))
we have
d(F (T (x′ )), F (T (x′′ ))) = d(G(x′ ), G(x′′ )) ≤ k d(F (x′ ), F (x′′ ))
where k ∈ (0, 1) is the constant of the small perturbation; note that, by the definition of dF
and applying F ◦ F^{−1} to the first side, the last equation can be rewritten as
dF (T (x′ ), T (x′′ )) ≤ k dF (x′ , x′′ );
in other words, since k < 1, T is a contraction in the complete metric space (X, dF ); therefore
(by the classical Banach-Caccioppoli fixed point theorem) T has a unique fixed point: there
exists u ∈ X such that T (u) = u; by definition of T this is equivalent to G(u) = F (u), and
the proof is hence complete.
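A one-dimensional illustration of the theorem (the specific F and G are chosen here for concreteness): with F(x) = 2x and G(x) = cos x, G is a small perturbation of F with k = 1/2, and iterating T = F⁻¹ ∘ G finds the unique u with G(u) = F(u):

```python
import math

# X = Y = R with the usual metric; F(x) = 2x is bijective and
# G(x) = cos(x) is a small perturbation of F, since
# |cos x' - cos x''| <= |x' - x''| = (1/2)|F(x') - F(x'')|, i.e. k = 1/2.
F_inv = lambda y: y / 2.0
G = math.cos

u = 0.0
for _ in range(100):
    u = F_inv(G(u))  # iterate the contraction T = F^{-1} o G

print(abs(G(u) - 2 * u) < 1e-12)  # True: G(u) = F(u)
```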
Remark 2. The hypothesis of the theorem can be generalized as follows: let X be a set and
Y a metric space (not necessarily complete); let F, G be two mappings from X to Y such
that F is injective, F (X) is complete and G(X) ⊆ F (X); then there exists u ∈ X such that
G(u) = F (u).
We can use theorem 1 to prove a result that applies to perturbations which are not necessarily
small (i.e. for which the constant k can be greater than one). To prove it, we must assume
some supplemental structure on the metric of Y : in particular, we have to assume that the
metric d is invariant by dilations, that is, that d(αy ′ , αy ′′ ) = α d(y ′ , y ′′ ) for each y ′ , y ′′ ∈ Y .
The most common case of such a metric is when the metric is deduced from a norm (i.e. when
Y is a normed space, and in particular a Banach space). The result follows immediately:
Corollary 1. Let X be a set and (Y, d) be a complete metric space with a metric d invariant
by dilations. Let F, G be two mappings from X to Y such that F is bijective and G is a
perturbation of F , with constant K > 0.
Then, for each M > K there exists a unique uM ∈ X such that G(uM ) = M F (uM ).
The proof is an immediate consequence of theorem 1 given that the map G̃(u) = G(u)/M
is a small perturbation of F (a property which is ensured by the dilation invariance of the
metric d).
We also have the following
Corollary 2. Let X be a set and (Y, d) be a complete, compact metric space with a metric
d invariant by dilations. Let F, G be two mappings from X to Y such that F is bijective and
G is a perturbation of F , with constant K > 0.
Let (an ) be a decreasing sequence of real numbers greater than one, converging to one
(an ↓ 1) and let Mn = an K for each n ∈ N. We can apply corollary 1 to each Mn , obtaining
a sequence un of elements of X for which one has
G(un ) = Mn F (un ). (1.52.1)
We can now introduce the concept of near operators and discuss some of their properties.
A historical remark: Campanato initially introduced the concept in Hilbert spaces; subse-
quently, it was remarked that most of the theory could more generally be applied to Banach
spaces; indeed, it was also proven that the basic definition can be generalized to make part
of the theory available in the more general environment of metric vector spaces.
We will here discuss the theory in the case of Banach spaces, with only a couple of exceptions:
to see some of the extra properties that are available in Hilbert spaces and to discuss a
generalization of the Lax-Milgram theorem to metric vector spaces.
Definition 3. Let X be a set and Y a Banach space. Let A, B be two operators from X
to Y . We say that A is near B if and only if there exist two constants α > 0 and k ∈ (0, 1)
such that, for each x′ , x′′ ∈ X one has
∥B(x′ ) − B(x′′ ) − α(A(x′ ) − A(x′′ ))∥ ≤ k ∥B(x′ ) − B(x′′ )∥
In other words, A is near B if B − αA is a small perturbation of B for an appropriate value
of α.
Observe that in general the property is not symmetric: if A is near B, it is not necessarily
true that B is near A; as we will briefly see, this can only be proven if α < 1/2, or in the
case that Y is a Hilbert space, by using an equivalent condition that will be discussed later
on. Yet it is possible to define a topology with some interesting properties on the space of
operators, by using the concept of nearness to form a base.
The core point of the nearness between operators is that it allows us to “transfer” many
important properties from B to A; in other words, if B satisfies certain properties, and A is
near B, then A satisfies the same properties. To prove this, and to enumerate some of these
“nearness-invariant” properties, we will emerge a few important facts.
In what follows, unless differently specified, we will always assume that X is a set, Y is a
Banach space and A, B are two operators from X to Y .
Lemma 1. If A is near B then there exist two positive constants M1 , M2 such that
∥B(x′ ) − B(x′′ )∥ ≤ M1 ∥A(x′ ) − A(x′′ )∥  and  ∥A(x′ ) − A(x′′ )∥ ≤ M2 ∥B(x′ ) − B(x′′ )∥.
We have:
∥B(x′ ) − B(x′′ )∥ ≤ ∥B(x′ ) − B(x′′ ) − α(A(x′ ) − A(x′′ ))∥ + α ∥A(x′ ) − A(x′′ )∥
≤ k ∥B(x′ ) − B(x′′ )∥ + α ∥A(x′ ) − A(x′′ )∥
and hence
∥B(x′ ) − B(x′′ )∥ ≤ (α/(1 − k)) ∥A(x′ ) − A(x′′ )∥
which is the first inequality with M1 = α/(1 − k) (which is positive since k < 1).
But also
∥A(x′ ) − A(x′′ )∥ ≤ (1/α) ∥B(x′ ) − B(x′′ ) − α(A(x′ ) − A(x′′ ))∥ + (1/α) ∥B(x′ ) − B(x′′ )∥
≤ (k/α) ∥B(x′ ) − B(x′′ )∥ + (1/α) ∥B(x′ ) − B(x′′ )∥
and hence
∥A(x′ ) − A(x′′ )∥ ≤ ((1 + k)/α) ∥B(x′ ) − B(x′′ )∥
which is the second inequality with M2 = (1 + k)/α.
The most important corollary of the previous lemma is the following
Corollary 3. If A is near B then two points of X have the same image under A if and only
if they have the same image under B.
We can express the previous concept in the following formal way: for each y in B(X) there
exists z in Y such that A(B −1 (y)) = {z}, and conversely. In yet other words: each fiber of A
is a fiber (for a different point) of B, and conversely.
Also observe that TB and TA are continuous. This follows from the fact that for each x ∈ X
one has
TA (B(x)) = A(x), TB (A(x)) = B(x)
and that the lemma ensures that given a sequence (xn ) in X, the sequence (B(xn )) converges
to B(x0 ) if and only if (A(xn )) converges to A(x0 ).
We can now list some invariant properties of operators with respect to nearness. The
properties are given in the form “if and only if” because each operator is near itself (therefore
ensuring the “only if” part).
4. A map has dense range iff it is near a map with dense range.
Another important property that follows from the lemma is that if there exists y ∈ Y such that
A⁻¹(y) ∩ B⁻¹(y) ≠ ∅, then A⁻¹(y) = B⁻¹(y): intersecting fibers are equal. (Campanato
only stated this property for the case y = 0 and called it “the kernel property”; I prefer to
call it the “fiber persistence” property.)
In this section we will show that the concept of nearness between operators can indeed be
connected to a topological understanding of the set of maps from X to Y .
Let M be the set of maps between X and Y. For each F ∈ M and for each k ∈ (0, 1) we let
Uk(F) be the set of all maps G ∈ M such that F − G is a small perturbation of F with constant
k. In other words, G ∈ Uk(F) iff G is near F with constants 1, k.
The set U(F ) = {Uk (F ) | 0 < k < 1} satisfies the axioms of the set of fundamental
neighbourhoods. Indeed:
1. F belongs to each Uk (F );
3. for each Uk (F ) there exist Uh (F ) such that for each G ∈ Uh (F ) there exist Uj (G) ⊆
Uk (F ).
This last property (permanence of neighbourhoods) is somewhat less trivial, so we shall now
prove it.
Let Uk(F) be given.
We then want h + j(1 + h) ≤ k, that is j ≤ (k − h)/(1 + h); the condition 0 < j < 1 is always
satisfied on the right side, and the left side gives us h < k.
It is important to observe that the topology generated this way is not a Hausdorff topology:
indeed, it is not possible to separate F and F + y (where F ∈ M and y is a constant element
of Y ). On the other hand, the subset of all maps with a fixed value at a fixed point
(F (x0 ) = y0 ) is a Hausdorff subspace.
Another important characteristic of the topology is that the set H of invertible operators
from X to Y is open in M (because a map is invertible iff it is near an invertible map). This
is not true in the topology of uniform convergence, as is easily seen by choosing X = Y = R
and the sequence with generic element Fn (x) = x3 − x/n: the sequence converges (in the
uniform convergence topology) to F(x) = x³, which is invertible, but none of the Fn is
invertible. Hence F is an element of H that is not an interior point of H, and so H is not open in the topology of uniform convergence.
[TODO]
fX(x) = (r+x−1 choose x) p^r (1 − p)^x ,  x ∈ {0, 1, . . .}
Parameters:
• r > 0
• p ∈ [0, 1]
syntax:
X ∼ NegBin(r, p)
Notes:
1. If r ∈ N, X represents the number of failed Bernoulli trials before the rth success.
Note that if r = 1 the variable is a geometric random variable.
2. E[X] = r(1 − p)/p
3. Var[X] = r(1 − p)/p²
4. MX(t) = (p/(1 − (1 − p)e^t))^r
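As a quick numerical sanity check of the pmf and the moments above (our own sketch; it assumes an integer r so that a binomial coefficient applies):

```python
from math import comb

def negbin_pmf(x, r, p):
    """P(X = x): probability of x failures before the r-th success (integer r)."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

r, p = 4, 0.3
pmf = [negbin_pmf(x, r, p) for x in range(2000)]  # tail beyond 2000 is negligible
mean = sum(x * q for x, q in enumerate(pmf))
var = sum(x * x * q for x, q in enumerate(pmf)) - mean**2
assert abs(sum(pmf) - 1) < 1e-9
assert abs(mean - r * (1 - p) / p) < 1e-6
assert abs(var - r * (1 - p) / p**2) < 1e-6
```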
fX(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} ,  x ∈ R
Parameters:
? µ∈R
? σ2 > 0
syntax:
X ∼ N(µ, σ 2 )
Notes:
1. Probably the most frequently used distribution. fX (x) will look like a bell-shaped
function, hence justifying the synonym bell distribution.
2. When µ = 0 and σ 2 = 1 the distribution is called standard normal
3. The cumulative distribution function of X is often called Φ(x).
4. E[X] = µ
5. V ar[X] = σ 2
6. MX(t) = e^{tµ + t²σ²/2}
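Note 3 can be made concrete: Φ has no elementary closed form, but it is expressible through the error function. A small sketch (the function name is ours):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Phi((x - mu) / sigma), computed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

assert abs(normal_cdf(0) - 0.5) < 1e-12
assert abs(normal_cdf(1.96) - 0.975) < 1e-3
```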
1.55 normalizer of a subset of a group
1: X group
2: Y ⊂ X subset
3: {x ∈ X | xY x⁻¹ = Y }
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
There are two often-used definitions of the nth root. The first discussed deals with real numbers
only; the second deals with complex numbers.
The nth root of a non-negative real number x, written as ⁿ√x, can be defined as the real
number y such that yⁿ = x. This notation is normally, but not always, used when n is a
natural number. This definition could also be written as: for x ≥ 0 in R, y = ⁿ√x ⇔ yⁿ = x.
Example: ⁴√81 = 3 because 3⁴ = 3 × 3 × 3 × 3 = 81.
Example: ⁵√(x⁵ + 5x⁴ + 10x³ + 10x² + 5x + 1) = x + 1 because (x + 1)⁵ = (x² + 2x + 1)²(x + 1) =
x⁵ + 5x⁴ + 10x³ + 10x² + 5x + 1. (See the binomial theorem and Pascal’s Triangle.)
In this definition, ⁿ√x is undefined when x < 0 and n is even. When n is odd and x < 0,
ⁿ√x < 0. Examples: ³√−1 = −1, but ⁴√−1 is undefined for this definition.
A more generalized definition: The nth roots of a complex number t = x + yi = (x, yi) = (r, θ)
are all the complex numbers z₁, z₂, . . . , zₙ ∈ C that satisfy the condition zₖⁿ = t. n such
complex numbers always exist.
One of the more popular methods of finding these roots is through geometry and trigonometry.
The complex numbers are treated as a plane using Cartesian coordinates with an x axis
and a yi axis. (Remember, in the context of complex numbers, i = √−1.) These rectangular
coordinates (x, yi) are then translated to polar coordinates (r, θ), where r = ²ⁿ√(x² + y²)
(according to the previous definition of nth root), θ = π/2 if x = 0, and θ = arctan(y/x) if x ≠ 0.
(See the Pythagorean theorem.)
Then the nth roots of t are the vertices of a regular polygon having n sides, centered at
(0, 0i), and having (r, θ) as calculated above as one of its vertices.
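The polygon construction just described can be phrased directly in code; a sketch using Python's cmath (the function name nth_roots is ours):

```python
import cmath

def nth_roots(t, n):
    """The n complex nth roots of t: vertices of a regular n-gon of radius |t|**(1/n)."""
    r = abs(t) ** (1.0 / n)       # modulus of every root
    theta = cmath.phase(t)        # argument of t
    return [cmath.rect(r, (theta + 2 * cmath.pi * k) / n) for k in range(n)]

roots = nth_roots(8, 3)
assert any(abs(z - 2) < 1e-9 for z in roots)      # the real cube root of 8
assert all(abs(z**3 - 8) < 1e-6 for z in roots)   # each is a genuine cube root
```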
Example: Consider ³√8. 8 can also be written as 8 + 0i, or in polar form as (8, 0). By our method, we
now have an equilateral triangle centered at (0, 0) and having one vertex at (2, 0). Knowing
that a complete circle consists of 2π radians, and knowing that all angles are equal in an
equilateral triangle, we can deduce that the other two vertices lie at polar coordinates (2, 2π/3)
and (2, 4π/3). Translating back into rectangular coordinates, we have:
³√8 = 2
³√8 = 2(cos 2π/3 + i sin 2π/3) = 2(−1/2 + i √3/2) = −1 + i√3
³√8 = 2(cos 4π/3 + i sin 4π/3) = 2(−1/2 − i √3/2) = −1 − i√3
Example: Consider ⁴√−16. We can rewrite this as ⁴√−1 × ⁴√16 = 2√i.
We can find √i by using a formula for multiplying complex numbers in polar coordinates:
(r₁, θ₁) × (r₂, θ₂) = (r₁r₂, θ₁ + θ₂). So 0 + i = (r², 2θ). Therefore, r = √(0² + 1²) = 1 and
θ = π/4. So √i = (1, π/4), and doubling that we get (2, π/4).
Now we have a square centered at polar coordinates (0, 0) with one corner at (2, π/4). Adding
π/2 to the angle repeatedly gives us the remainder of the corners: (2, 3π/4), (2, 5π/4), (2, 7π/4).
Translating these to rectangular coordinates works as in the previous example.
So the four solutions to ⁴√−16 are √2 + i√2, −√2 + i√2, −√2 − i√2, and √2 − i√2.
Example: Consider ³√(1 + i). As in the previous examples, our first step is to convert 1 + 1i
into polar coordinates. We get r = √(1² + 1²) = √2 and θ = arctan 1 = π/4, giving a polar
coordinate of (√2, π/4). Now we take the cube root of this complex number: (√2, π/4) = (r³, 3θ).
We get coordinates (⁶√2, π/12). This point is one vertex of an equilateral triangle centered at
(0, 0). The other two vertices of the triangle are derived from adding 2π/3 to θ. We know this
because lines from the center of an equilateral triangle to each of the corners will form three
equal angles of width 2π/3 about the center, and because all three vertices of an equilateral
triangle will be the same distance from the center.
So the other vertices in polar coordinates are (⁶√2, 3π/4) and (⁶√2, 17π/12). Most people would
just use a calculator to compute the sines and cosines of these angles, but they can be
interpolated using these handy identities:
cos 2t = 1 − 2 sin² t  (use this to calculate sin(π/12) from sin(π/3) = √3/2)
sin(a + b) = sin(a) cos(b) + cos(a) sin(b)  (use a = 3π/4 and b = 2π/3)
The process of calculating these values is left as an exercise to the reader in the interest of
space. The rectangular coordinates, the cube roots of 1 + i, are:
(⁶√2, π/12) = ⁶√2 cos(π/12) + i ⁶√2 sin(π/12), where sin(π/12) = √((1 − √3/2)/2)
(⁶√2, 3π/4) = −⁶√2 · √2/2 + i ⁶√2 · √2/2
(⁶√2, 17π/12) = ⁶√2 (√2 − √6)/4 − i ⁶√2 (√2 + √6)/4
Let (X, ρ) be a metric space and x₀ ∈ X. Let r be a positive number. The set
B(x₀, r) := {x ∈ X : ρ(x, x₀) < r}
is called the ball with center x₀ and radius r. On some spaces like C or R² this is also known
as an open disk, and when the space is R, it is known as an open interval (all three spaces with the
standard metric).
If R is a ring, then we may construct the opposite ring Rop which has the same underlying
abelian group structure, but with multiplication in the opposite order: the product of r1 and
r2 in Rop is r2 r1 .
If M is a left R-module, then it can be made into a right Rop-module, where a module
element m, when multiplied on the right by an element r of Rop, yields the element rm given
by the left R-module action on M. Similarly, right R-modules can be made into left
Rop-modules.
Given a group G acting on a set X, define Gx to be the orbit of x and G_x to be the
stabilizer of x. For each x ∈ X the correspondence g(x) ↦ gG_x is a bijection between Gx
and the set of left cosets of G_x.
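The counting consequence of this bijection, |G| = |Gx| · |G_x|, can be checked on a small example with S₃ acting on {0, 1, 2} (our own illustration):

```python
from itertools import permutations

# the symmetric group S_3 acting on {0, 1, 2}; a permutation g sends x to g[x]
G = list(permutations(range(3)))
x = 0
orbit = {g[x] for g in G}
stabilizer = [g for g in G if g[x] == x]
# orbit-stabilizer: cosets of the stabilizer correspond to points of the orbit
assert len(G) == len(orbit) * len(stabilizer)
```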
1.61 orthogonal
• orthogonal matrices
• orthogonal polynomials
• orthogonal vectors
In general, two objects are orthogonal if they do not “coincide” in some sense. Sometimes
orthogonal means roughly the same thing as “perpendicular”.
1: A set
3: X ≤ S_A
4: (X, ◦)
1: A set
2: a ∈ A
3: X permutation group on A
4: σ ∈ X
1: A set
2: a ∈ A
3: X permutation group on A
4: ⋂_{σ∈X} σStabX(a) = 1
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
When R = Z the prime elements as formulated above are simply prime numbers.
Let (E1 , B1 (E1 )) and (E2 , B2 (E2 )) be two measurable spaces, with measures µ1 and µ2 . Let
B1 × B2 be the sigma algebra on E1 × E2 generated by subsets of the form B1 × B2 , where
B1 ∈ B1 (E1 ) and B2 ∈ B2 (E2 ).
The product measure µ1 × µ2 is defined to be the unique measure on the measurable space
(E1 × E2, B1 × B2) satisfying the property (µ1 × µ2)(B1 × B2) = µ1(B1) µ2(B2) for all
B1 ∈ B1(E1) and B2 ∈ B2(E2).
projective line
example
1: ` = {[X, Y, Z, W ] ∈ RP³ | Z = W = 0}
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
projective plane
1: ∼: S2 × S2 → {0, 1}
2: x ∼ y ⇔ y = −x
3: p : S2 → S2 /∼
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
We will find local extremes of the function f(x) where ∇f = 0. This can be proved by
contradiction: if ∇f ≠ 0, then
∃ε₀ > 0, ∀ε, 0 < ε < ε₀ : f(x − ε∇f) < f(x) < f(x + ε∇f)
but then f(x) is not a local extreme.
Any vector x ∈ Rⁿ can have one component perpendicular to the subset Sᵢ (for visualization,
think n = 3 and let Sᵢ be a flat surface). ∇gᵢ will be perpendicular to Sᵢ, because:
∃ε₀ > 0, ∀ε, 0 < ε < ε₀ : gᵢ(x − ε∇gᵢ) < gᵢ(x) < gᵢ(x + ε∇gᵢ)
But gᵢ(x) = 0, so any vector x + ε∇gᵢ must be outside Sᵢ, and also outside S. (todo: I have
proved that there might exist a component perpendicular to each subset Sᵢ, but not that
there exists only one; this should be done)
By the argument above, ∇f must be zero - but now we can ignore all components of ∇f
perpendicular to S. (todo: this should be expressed more formally and proved)
We will have local extreme(s) within S where there exists a set λᵢ, i = 1, . . . , m such that
∇f = Σᵢ λᵢ ∇gᵢ
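As a concrete instance of this condition (our own illustration, not part of the entry): maximizing f(x, y) = x + y on the circle g(x, y) = x² + y² − 1 = 0, the condition ∇f = λ∇g forces x = y:

```python
import math

# grad f = (1, 1); grad g = (2x, 2y); grad f = lambda * grad g gives x = y = 1/(2*lambda)
x = y = math.sqrt(2) / 2          # the maximizer on the constraint circle
lam = 1 / (2 * x)
assert abs(1 - lam * 2 * x) < 1e-12 and abs(1 - lam * 2 * y) < 1e-12
assert abs(x * x + y * y - 1) < 1e-12   # the point really lies on the constraint
```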
1.68 proof of orbit-stabilizer theorem
The power rule can be derived by repeated application of the product rule.
The power rule has been shown to hold for n = 0 and n = 1. If the power rule is known to
hold for some k > 0, then we have
D/Dx x^{k+1} = D/Dx (x · x^k)
= x (D/Dx x^k) + x^k
= x · (k x^{k−1}) + x^k
= k x^k + x^k
= (k + 1) x^k
D/Dx (x^{p/q}) = (p/q) x^{p/q − 1}    (1.69.1)
By definition, we have y q = xp . We now take the derivative with respect to x on both sides
of the equality.
D/Dx y^q = D/Dx x^p
(D/Dy y^q) (Dy/Dx) = p x^{p−1}
q y^{q−1} (Dy/Dx) = p x^{p−1}
Dy/Dx = (p/q) x^{p−1} / y^{q−1}
= (p/q) x^{p−1} y^{1−q}
= (p/q) x^{p−1} x^{p/q − p}
= (p/q) x^{p−1+p/q−p}
= (p/q) x^{p/q − 1}
For positive irrationals we claim continuity due to the fact that (1.69.1) holds for all positive
rationals, and there are positive rationals that approach any positive irrational.
Du^{−n}/Dx = −n u^{−n−1}    (1.69.2)
Since u^n u^{−n} = 1, differentiating both sides gives
D/Dx (u^n u^{−n}) = 0
u^n (Du^{−n}/Dx) + u^{−n} (Du^n/Dx) = 0
u^n (Du^{−n}/Dx) + u^{−n} (n u^{n−1}) = 0
u^n (Du^{−n}/Dx) = −n u^{−1}
Du^{−n}/Dx = −n u^{−n−1}
D/Dx [f(x)g(x)] = lim_{h→0} (f(x + h)g(x + h) − f(x)g(x)) / h
= lim_{h→0} (f(x + h)g(x + h) − f(x + h)g(x) + f(x + h)g(x) − f(x)g(x)) / h
= lim_{h→0} [ f(x + h) (g(x + h) − g(x))/h + g(x) (f(x + h) − f(x))/h ]
= f(x) g′(x) + f′(x) g(x)
1.72 proof of sum rule
Let P be the set of positive primes. P is countably infinite, so there is a bijection between
P and N. Since there is a bijection between C and a subset of N, there must in turn be a
one-to-one function f : C → P .
Each S ∈ C is countable, so there exists a bijection between S and some subset of N. Call
this function g, and define a new function hS : S → N such that for all x ∈ S,
hS(x) = f(S)^{g(x)}
Note that hS is one-to-one. Also note that for any distinct pair S, T ∈ C, the range of hS
and the range of hT are disjoint due to the fundamental theorem of arithmetic.
We may now define a one-to-one function h : ∪C → N, where, for each x ∈ ∪C, h(x) =
hS(x) for some S ∈ C where x ∈ S (the choice of S is irrelevant, so long as it contains x).
Since the range of h is a subset of N, h is a bijection into that set and hence ∪C is countable.
1.74 quadrature
Some numerical quadrature methods are Simpson’s rule, the trapezoidal rule, and Riemann sums.
1.75 quotient module
quotient module
1: X is a ring
2: Y a module over X
3: Z is a submodule of Y
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
While variations abound, fundamentally a regular expression consists of the following components.
Parentheses can be used for grouping and nesting, and must contain a fully-formed regular
expression. The | symbol can be used for denoting alternatives. Some specifications do not
provide nesting or alternatives. There are also a number of postfix operators. The ? operator
means that the preceding element can either be present or non-present, and corresponds to a
rule of the form A → B | λ. The * operator means that the preceding element can be present
zero or more times, and corresponds to a rule of the form A → BA | λ. The + operator
means that the preceding element can be present one or more times, and corresponds to a
rule of the form A → BA | B. Note that while these rules are not immediately in regular
form, they can be transformed so that they are.
Here is an example of a regular expression that specifies a grammar that generates the binary
representation of all multiples of 3 (and only multiples of 3).
S ::= AB
A ::= CD
B ::= 0B|λ
C ::= 0C|λ
D ::= 1E1
E ::= F E|λ
F ::= 0G0
G ::= 1G|λ
A little further work is required to transform this grammar into an acceptable form for
regular grammars, but it can be shown that this grammar (and any grammar specified by a
regular expression) is equivalent to some regular grammar.
Regular expressions have many applications. Quite often they are used for powerful string
matching and substitution features in many text editors and programming languages.
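For instance, a regular expression for the binary multiples of 3 mentioned above can be written and tested directly with Python's re module (the particular pattern is a standard DFA-derived one, supplied here by us):

```python
import re

# matches exactly the binary strings whose value is divisible by 3
pattern = re.compile(r"^(0|1(01*0)*1)*$")

for n in range(200):
    assert bool(pattern.match(bin(n)[2:])) == (n % 3 == 0)
```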
A regular grammar is a context-free grammar where all productions must take one of the
following forms (specified here in BNF, λ is the empty string): A ::= a, A ::= aB, or A ::= λ,
where A and B are nonterminals and a is a terminal.
A regular language is the set of strings generated by a regular grammar. Regular grammars
are also known as Type-3 grammars in the Chomsky hierarchy.
1.78 right function notation
We are said to be using right function notation if we write functions to the right of their
arguments. That is, if α : X → Y is a function and x ∈ X, then xα is the image of x under
α.
When working in a context in which all rings have a multiplicative identity, one also requires
that f (1R ) = 1S .
1.80 scalar
schrodinger operator
1: V : R → R
2: y ↦ −d²y/dx² + V(x) y
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
The Problem
The Algorithm
Suppose L = {x1 , x2 , . . . , xn } is the initial list of unsorted elements. The selection sort
algorithm sorts this list in n steps. At each step i, find the largest element L[j] such that
j < n − i + 1, and swap it with the element at L[n − i + 1]. So, for the first step, find the
largest value in the list and swap it with the last element in the list. For the second step,
find the largest value in the list up to (but not including) the last element, and swap it with
the next to last element. This is continued for n − 1 steps. Thus the selection sort algorithm
is a very simple, in-place sorting algorithm.
Pseudocode
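A minimal Python rendering of the steps described above (our own sketch; 0-indexed lists, sorting in place):

```python
def selection_sort(L):
    """At each step, swap the largest remaining element into the last open slot."""
    for i in range(len(L) - 1, 0, -1):
        j = max(range(i + 1), key=lambda k: L[k])   # index of the largest value in L[0..i]
        L[j], L[i] = L[i], L[j]
    return L

assert selection_sort([3, 1, 2]) == [1, 2, 3]
assert selection_sort([5, 4, 3, 2, 1]) == [1, 2, 3, 4, 5]
```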
Analysis
The selection sort algorithm has the same runtime for any set of n elements, no matter
what the values or order of those elements are. Finding the maximum element of a list of
i elements requires i − 1 comparisons. Thus T (n), the number of comparisons required to
sort a list of n elements with the selection sort, can be found:
T(n) = Σᵢ₌₂ⁿ (i − 1) = Σᵢ₌₁ⁿ⁻¹ i = (n² − n)/2 = O(n²)
However, the number of data movements is the number of swaps required, which is n−1. This
algorithm is very similar to the insertion sort algorithm. It requires fewer data movements,
but requires more comparisons.
1.83 semiring
Addition and (left and right) multiplication are monotonic operators with respect to 6, with
0 as the minimal element.
1.84 simple function
A simple function is a finite linear combination of characteristic functions of measurable sets,
f = Σᵢ₌₁ⁿ aᵢ χ_{Aᵢ}, for some n ∈ N.
A simple path in a graph is a path that contains no vertex more than once. By definition,
cycles are particular instances of simple paths.
solutions of an equation
1: {x | f(x) = 0}
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
For any tree there is exactly one spanning tree: the tree itself.
The square root operation is distributive for multiplication and division, but not for addition
and subtraction.
That is, √(x × y) = √x × √y, and √(x/y) = √x / √y.
However, in general, √(x + y) ≠ √x + √y and √(x − y) ≠ √x − √y.
Example: √(x²y²) = xy because (xy)² = xy × xy = x × x × y × y = x² × y² = x²y².
Example: √(9/25) = 3/5 because (3/5)² = 3²/5² = 9/25.
The square root notation is actually an alternative to exponentiation. That is, √x = x^{1/2}. As
such, the square root operation is associative with exponentiation. That is, √(x³) = x^{3/2} = (√x)³.
Negative real numbers do not have real square roots. For example, √−4 is not a real number.
Proof by contradiction: Suppose √−4 = x ∈ R. If x is negative, x² is positive. But if x is
positive, x² is also positive. But x cannot be zero either, because 0² = 0. So √−4 ∉ R.
For additional discussion of the square root and negative numbers, see the discussion of
complex numbers.
A stable sorting algorithm is any sorting algorithm that preserves the relative ordering of
items with equal values. For instance, consider a list of ordered pairs
L := {(A, 3), (B, 5), (C, 2), (D, 5), (E, 4)}.
If a stable sorting algorithm sorts L on the second value in each pair using the ≤ relation,
then the result is guaranteed to be {(C, 2), (A, 3), (E, 4), (B, 5), (D, 5)}. However, if an
algorithm is not stable, then it is possible that (D, 5) may come before (B, 5) in the sorted
output.
Some examples of stable sorting algorithms are bubblesort and mergesort (although the
stability of mergesort is dependent upon how it is implemented). Some examples of unstable
sorting algorithms are heapsort and quicksort (quicksort could be made stable, but then it
wouldn’t be quick any more). Stability is a useful property when the total ordering relation
is dependent upon initial position. Using a stable sorting algorithm means that sorting by
ascending position for equal keys is built-in, and need not be implemented explicitly in the
comparison operator.
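The list from the example can be run through Python's sorted, which is guaranteed stable:

```python
L = [("A", 3), ("B", 5), ("C", 2), ("D", 5), ("E", 4)]
# sorted() is stable: (B, 5) stays ahead of (D, 5) because it appeared first
out = sorted(L, key=lambda pair: pair[1])
assert out == [("C", 2), ("A", 3), ("E", 4), ("B", 5), ("D", 5)]
```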
Given a random variable X, the standard deviation of X is defined as
SD[X] = √(Var[X]).
The standard deviation is a measure of the variation of X around the expected value.
The random variables X1 , X2 , ..., Xn are stochastically independent (or just independent)
if
fX1 ,...,Xn (x1 , ..., xn ) = fX1 (x1 ) · · · fXn (xn ) ∀(x1 , ..., xn ) ∈ Rn
That is, the random variables X1, ..., Xn are independent if their joint distribution function can
be expressed as the product of the marginal distributions of the variables, evaluated at the
corresponding points.
1. F_{X1,...,Xn}(x1, ..., xn) = F_{X1}(x1) · · · F_{Xn}(xn) ∀(x1, ..., xn) ∈ Rⁿ (joint cumulative distribution)
2. M_{X1+···+Xn}(t) = M_{X1}(t) · · · M_{Xn}(t) ∀t (moment generating function)
3. E[∏ᵢ₌₁ⁿ Xᵢ] = ∏ᵢ₌₁ⁿ E[Xᵢ] (expectation)
However, only the first two above imply independence. See also correlation.
1.92 substring
1.93 successor
Given a set S, the successor of S is the set S ∪ {S}. One often denotes the successor of S
by S′.
1.94 sum rule
D/Dx [f(x) + g(x)] = f′(x) + g′(x)
Proof
Examples
D/Dx (x + 1) = D/Dx x + D/Dx 1 = 1
D/Dx (x² − 3x + 2) = D/Dx x² + D/Dx (−3x) + D/Dx 2 = 2x − 3
D/Dx (sin x + cos x) = D/Dx sin x + D/Dx cos x = cos x − sin x
1.95 superset
Similar rules that hold for ⊆ also hold for ⊇. If X ⊇ Y and Y ⊇ X, then X = Y . Every set
is a superset of itself, and every set is a superset of the empty set.
1.96 symmetric polynomial
Every symmetric polynomial can be written as a polynomial expression in the elementary symmetric polynomials.
1: f meromorphic in Ω
2: ∀0 6 i 6 n : f (ai ) = 0
3: ∀0 6 i 6 m : f (bi ) = ∞
4: γ cycle
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
torsion-free module
1: R integral domain
3: Xt torsion submodule
4: Xt = 0
fact: a finitely generated torsion-free submodule is a free module
2: X torsion-free
3: X free
(to be fixed)
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
The relation 6 is a total order if it satisfies the above three properties and the following
additional property:
A tree traversal is an algorithm for visiting all the nodes in a rooted tree exactly once.
The constraint is on rooted trees, because the root is taken to be the starting point of the
traversal. A traversal is also defined on a forest in the sense that each tree in the forest can
be iteratively traversed (provided one knows the roots of every tree beforehand). This entry
presents a few common and simple tree traversals.
In the description of a tree, the notion of rooted-subtrees was presented. Full understanding
of this notion is necessary to understand the traversals presented here, as each of these
traversals depends heavily upon this notion.
In a traversal, there is the notion of visiting a node. Visiting a node often consists of doing
some computation with that node. The traversals are defined here without any notion of
what is being done to visit a node, and simply indicate where the visit occurs (and most
importantly, in what order).
[figure: an example tree]
Vertices will be numbered in the order they are visited, and edges will be drawn with arrows
indicating the path of the traversal.
Preorder Traversal
Given a rooted tree, a preorder traversal consists of first visiting the root, and then
executing a preorder traversal on each of the root’s children (if any).
For example
[figure: the example tree, each vertex numbered in the order it is visited by the preorder traversal]
The term preorder refers to the fact that a node is visited before any of its descendents.
A preorder traversal is defined for any rooted tree. As pseudocode, the preorder traversal is
PreorderTraversal(x, Visit)
    Visit(x)
    PreorderTraversal(left(x), Visit)
    PreorderTraversal(right(x), Visit)
end
Postorder Traversal
Given a rooted tree, a postorder traversal consists of first executing a postorder traversal
on each of the root’s children (if any), and then visiting the root.
For example
[figure: the example tree, each vertex numbered in the order it is visited by the postorder traversal]
As with the preorder traversal, the term postorder here refers to the fact that a node is
visited after all of its descendents. A postorder traversal is defined for any rooted tree. As
pseudocode, the postorder traversal is
In-order Traversal
Given a rooted binary tree, an in-order traversal consists of executing an in-order traversal
on the root’s left subtree, then visiting the root, and finally executing an in-order traversal
on the root’s right subtree.
For example
[figure: the example tree, each vertex numbered in the order it is visited by the in-order traversal]
As can be seen, the in-order traversal has the wonderful property of traversing a tree from
left to right (if the tree is visualized as it has been drawn here). The term in-order comes
from the fact that an in-order traversal of a binary search tree visits the data associated with
the nodes in sorted order. As pseudocode, the in-order traversal is
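A compact Python sketch of the preorder and in-order traversals (our own rendering; a node is a (left, value, right) tuple, None is the empty tree):

```python
def preorder(t, visit):
    if t is None:
        return
    left, value, right = t
    visit(value)              # root before its descendants
    preorder(left, visit)
    preorder(right, visit)

def inorder(t, visit):
    if t is None:
        return
    left, value, right = t
    inorder(left, visit)      # left subtree, then root, then right subtree
    visit(value)
    inorder(right, visit)

tree = ((None, 1, None), 2, (None, 3, None))
out = []
inorder(tree, out.append)
assert out == [1, 2, 3]       # in-order visits a search tree in sorted order
```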
1.101 trie
A trie is a digital tree for storing a set of strings in which there is one node for every prefix
of every string in the set. The name comes from the word retrieval, and thus is pronounced
the same as tree (which leads to much confusion when spoken aloud). The word retrieval is
stressed, because a trie has a lookup time that is equivalent to the length of the string being
looked up.
If a trie is to store some set of strings S ⊆ Σ∗ (where Σ is an alphabet), then it takes the
following form. Each edge leading to non-leaf nodes in the trie is labelled by an element
of Σ. Any edge leading to a leaf node is labelled by $ (some symbol not in Σ). For every
string s ∈ S, there is a path from the root of the trie to a leaf, the labels of which when
concatenated form s ++ $ (where ++ is the string concatenation operator). For every path
from the root of the trie to a leaf, the labels of the edges concatenated form some string in
S.
Example
Suppose we wish to store the set of strings S := {alpha, beta, bear, beast, beat}. The trie that
stores S would be
[figure: the trie storing S; the root branches on a and b, shared prefixes such as be- are merged, and $-labelled edges mark the ends of alpha, beta, bear, beast, and beat]
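The structure just described can be sketched in Python with nested dictionaries (our own rendering; $ is the end-of-string label from the entry):

```python
END = "$"  # terminator symbol, assumed not to occur in the alphabet

def insert(root, s):
    """Add s to the trie: one node per prefix, a $-edge marking the end."""
    node = root
    for ch in s + END:
        node = node.setdefault(ch, {})

def contains(root, s):
    """Lookup walks one edge per character, so it costs O(len(s))."""
    node = root
    for ch in s + END:
        if ch not in node:
            return False
        node = node[ch]
    return True

root = {}
for w in ["alpha", "beta", "bear", "beast", "beat"]:
    insert(root, w)
assert contains(root, "bear") and not contains(root, "be")
```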
A unit vector is a vector with a length, or vector norm, of one. In Rⁿ, one can obtain such a
vector by dividing a vector by its magnitude |v|. For example, take the vector < 1, 2, 3 >.
A unit vector pointing in this direction would be
(1 / | < 1, 2, 3 > |) < 1, 2, 3 > = (1/√14) < 1, 2, 3 > = < 1/√14, 2/√14, 3/√14 >.
The magnitude of this vector is 1.
A fixed point is considered unstable if it is neither attracting nor Liapunov stable. A saddle
point is an example of such a fixed point.
1: (x′ₙ) ⊂ X′
2: X a Banach space
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
Every nonempty set S of nonnegative integers contains a least element; that is, there is some
integer a in S such that a 6 b for all b belonging to S.
For example, the positive integers are a well-ordered set under the standard order.
Chapter 2
2.1 dimension
The word dimension in mathematics has many definitions, but all of them are trying to
quantify our intuition that, for example, a sheet of paper has somehow one less dimension
than a stack of papers.
One common way to define dimension is through some notion of a number of independent
quantities needed to describe an element of an object. For example, it is natural to say
that the sheet of paper is two-dimensional because one needs two real numbers to specify
a position on the sheet, whereas the stack of papers is three-dimensional because a position
in a stack is specified by a sheet and a position on the sheet. Following this notion, in
linear algebra the dimension of a vector space is defined as the minimal number of vectors
such that every other vector in the vector space is representable as a linear combination of these. Similarly,
the word rank denotes various dimension-like invariants that appear throughout the algebra.
However, if we try to generalize this notion to the mathematical objects that do not possess
an algebraic structure, then we run into a difficulty. From the point of view of set theory
there are as many real numbers as pairs of real numbers since there is a bijection from real
numbers to pairs of real numbers. To distinguish a plane from a cube one needs to impose
restrictions on the kind of mapping. Surprisingly, it turns out that the continuity is not
enough as was pointed out by Peano. There are continuous functions that map a square
onto a cube. So, in topology one uses another intuitive notion that in a high-dimensional
space there are more directions than in a low-dimensional. Hence, the (Lebesgue covering)
dimension of a topological space is defined as the smallest number d such that every covering
of the space by open sets can be refined so that no point is contained in more than d + 1 sets.
For example, no matter how one covers a sheet of paper by sufficiently small other sheets
of paper, where the sheets must genuinely overlap rather than merely touch, one will
always find a point that is covered by 2 + 1 = 3 sheets.
Another definition of dimension rests on the idea that higher-dimensional objects are in some
sense larger than the lower-dimensional ones. For example, to cover a cube with a side length
2 one needs at least 2³ = 8 cubes with a side length 1, but a square with a side length 2 can
be covered by only 2² = 4 unit squares. Let N(ε) be the minimal number of open balls in
any covering of a bounded set S by balls of radius ε. The Besicovitch-Hausdorff dimension
of S is defined as −lim_{ε→0} log N(ε)/log ε. The Besicovitch-Hausdorff dimension is not always
defined, and when defined it might be non-integral.
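This covering-count definition can be sanity-checked numerically for the unit square, where N(ε) grows like (1/ε)² (an illustrative sketch, using square boxes in place of balls):

```python
import math

# cover the unit square with n*n boxes of side eps = 1/n; then
# -log N(eps) / log eps should approach the dimension, which is 2
for n in (10, 100, 1000):
    eps, N = 1.0 / n, n * n
    assert abs(-math.log(N) / math.log(eps) - 2) < 1e-9
```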
A toy theorem is a simplified version of a more general theorem. For instance, by intro-
ducing some simplifying assumptions in a theorem, one obtains a toy theorem.
Usually, a toy theorem is used to illustrate the claim of a theorem. It can also be illustrative
and insightful to study proofs of a toy theorem derived from a non-trivial theorem. Toy
theorems also have a great educational value. After presenting a theorem (with, say, a highly
non-trivial proof), one can sometimes give some assurance that the theorem really holds, by
proving a toy version of the theorem.
For instance, a toy theorem of the Brouwer fixed point theorem is obtained by restricting the
dimension to one. In this case, the Brouwer fixed point theorem follows almost immediately
from the intermediate value theorem.
Chapter 3
00-XX – General
For example, filling up the interior of a circle by inscribing polygons with more and more
sides.
Chapter 4
Conway’s chained arrow notation is a way of writing numbers even larger than those
provided by the up arrow notation. We define m → n → p = m^{(p+2)} n = m ↑ · · · ↑ n (with p
up arrows) and m → n = m → n → 1 = mⁿ. Longer chains are evaluated by
m → · · · → n → p → 1 = m → · · · → n → p
m → · · · → n → 1 → q = m → · · · → n
and
m → · · · → n → (p + 1) → (q + 1) = m → · · · → n → (m → · · · → n → p → (q + 1)) → q
For example:
3 → 3 → 2 =
3 → (3 → 2 → 2) → 1 =
3 → (3 → 2 → 2) =
3 → (3 → (3 → 1 → 2) → 1) =
3 → (3 → 3 → 1) =
3^(3^3) = 3^27 = 7625597484987
A much larger example is:
3→2→4→4=
3 → 2 → (3 → 2 → 3 → 4) → 3 =
3 → 2 → (3 → 2 → (3 → 2 → 2 → 4) → 3) → 3 =
3 → 2 → (3 → 2 → (3 → 2 → (3 → 2 → 1 → 4) → 3) → 3) → 3 =
3 → 2 → (3 → 2 → (3 → 2 → (3 → 2) → 3) → 3) → 3 =
3 → 2 → (3 → 2 → (3 → 2 → 9 → 3) → 3) → 3
Clearly this is going to be a very large number. Note that, as large as it is, it is proceeding
towards an eventual final evaluation, as evidenced by the fact that the final number in the
chain is getting smaller.
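The reduction rules can be transcribed directly; the sketch below (ours) evaluates only tiny chains, since anything larger is astronomically big:

```python
def conway(chain):
    """Evaluate a Conway chain by the rules above; feasible for tiny values only."""
    c = list(chain)
    if len(c) == 1:
        return c[0]
    if len(c) == 2:
        return c[0] ** c[1]
    if c[-1] == 1:                      # X -> p -> 1  =  X -> p
        return conway(c[:-1])
    if c[-2] == 1:                      # X -> 1 -> q  =  X
        return conway(c[:-2])
    p, q = c[-2], c[-1]
    return conway(c[:-2] + [conway(c[:-2] + [p - 1, q]), q - 1])

assert conway([3, 3, 2]) == 3**27 == 7625597484987
```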
Clearly, this process can be extended: m ↑↑↑ 0 = 1 and m ↑↑↑ n = m ↑↑ (m ↑↑↑ [n − 1]).
An alternate notation is to write m^{(i)} n for m ↑ · · · ↑ n with i − 2 arrows (i − 2, because then
m^{(2)} n = m · n and m^{(1)} n = m + n). Then in general we can define m^{(i)} n = m^{(i−1)} (m^{(i)} (n − 1)).
To get a sense of how quickly these numbers grow, 3 ↑↑↑ 2 = 3 ↑↑ 3 is more than seven and
a half trillion, and the numbers continue to grow much more than exponentially.
An arithmetic progression of length n, with initial term a1 and common difference d, is
the sequence
    a1 , a1 + d, a1 + 2d, . . . , a1 + (n − 1)d.
The sum S of the terms of an arithmetic progression can be computed using Gauss’s trick:
add the sum to itself written backwards,

    S = a1              + (a1 + d)          + · · · + (a1 + (n − 1)d)
    S = (a1 + (n − 1)d) + (a1 + (n − 2)d)   + · · · + a1 .

Each of the n columns sums to 2a1 + (n − 1)d, so 2S = (2a1 + (n − 1)d)n, and the sum is
then

    S = (2a1 + (n − 1)d)n / 2.
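A quick numerical check of the closed formula against a direct term-by-term sum (the
values a1 = 5, d = 3, n = 100 are illustrative):

```python
def ap_sum(a1, d, n):
    """Sum of an arithmetic progression via the closed formula above."""
    return (2 * a1 + (n - 1) * d) * n // 2

# Compare against summing the terms directly.
direct = sum(5 + 3 * k for k in range(100))
assert ap_sum(5, 3, 100) == direct
print(ap_sum(5, 3, 100))  # 15350
```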
4.4 arity
The arity of something is the number of arguments it takes. This is usually applied to
functions: an n-ary function is one that takes n arguments. Unary is a synonym for 1-ary,
and binary for 2-ary.
Let a be a number. Then for all n ∈ N, a^n is the product of n copies of a. For the
integers (and their extensions) we have a multiplicative identity, 1, i.e. a · 1 = a for
all a. So we can write

    a^n = a^(n+0) = a^n · 1.

From the definition of powers of a the usual laws can be derived, so it is plausible to
set a^0 = 1: adding 0 does not change a sum, just as multiplying by 1 does not change a
product.
4.6 lemma
Bezout’s lemma, Gauss’ lemma, Fatou’s lemma, etc., so one clearly can’t get too much sim-
ply by reading into a proposition’s name.
According to [1], the plural ’Lemmas’ is commonly used. The correct plural of lemma,
however, is lemmata.
REFERENCES
1. N. Higham, Handbook of Writing for the Mathematical Sciences, Society for Industrial
and Applied Mathematics, 1998, p. 16.
4.7 property
A property on a set X is something that is either true or false of each element of X.
Formally, a property is a map P : X → {true, false}. Any property gives rise in a natural
way to the set {x : x has the property P } and to the corresponding characteristic
function.
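As a sketch (the property `is_even` on a finite X is an illustrative choice, not part of
the original entry), a property, its induced set, and the characteristic function look
like:

```python
def is_even(x):
    """A property P : X -> {true, false} on the integers."""
    return x % 2 == 0

X = set(range(10))

# The set {x : x has the property P} ...
evens = {x for x in X if is_even(x)}

# ... and the corresponding characteristic function.
def characteristic(x):
    return 1 if x in evens else 0

print(sorted(evens))  # [0, 2, 4, 6, 8]
```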
The saddle point approximation (SPA), also known as the stationary phase approximation,
is a widely used method in quantum field theory (QFT) and related fields. Suppose we want
to evaluate the following integral in the limit ζ → ∞:

    I = lim_{ζ→∞} ∫_{−∞}^{∞} dx e^{−ζ f(x)}.                                   (4.8.1)
The saddle point approximation can be applied if the function f(x) satisfies certain
conditions. Assume that f(x) has a global minimum f(x0) = y_min at x = x0, which is
sufficiently well separated from the other local minima and whose value is sufficiently
smaller than theirs. Consider the Taylor expansion of f(x) about the point x0:

    f(x) = f(x0) + ∂x f(x)|_{x=x0} (x − x0) + (1/2) ∂x² f(x)|_{x=x0} (x − x0)² + O(x³).   (4.8.2)
Since f(x0) is a (global) minimum, it is clear that f′(x0) = 0. Therefore f(x) may be
approximated to quadratic order as

    f(x) ≈ f(x0) + (1/2) f″(x0)(x − x0)².                                      (4.8.3)
The above assumptions on the minima of f(x) ensure that the dominant contribution to
(4.8.1) in the limit ζ → ∞ comes from the region of integration around x0:

    I ≈ lim_{ζ→∞} e^{−ζ f(x0)} ∫_{−∞}^{∞} dx e^{−(ζ/2) f″(x0)(x − x0)²}        (4.8.4)

      ≈ lim_{ζ→∞} e^{−ζ f(x0)} (2π / (ζ f″(x0)))^{1/2}.
In the last step we have performed the Gaussian integral. The next nonvanishing
higher-order correction to (4.8.4) stems from the quartic term of the expansion (4.8.2).
This correction may be incorporated into (4.8.4) to yield (after expanding part of the
exponential):

    I ≈ lim_{ζ→∞} e^{−ζ f(x0)} ∫_{−∞}^{∞} dx e^{−(ζ/2) f″(x0)(x − x0)²} [1 − (ζ/4!) ∂x⁴ f(x)|_{x=x0} (x − x0)⁴].   (4.8.5)
...to be continued with applications to physics...
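The leading-order formula (4.8.4) is easy to check numerically. The sketch below is
illustrative and not part of the original entry: it takes f(x) = cosh x − 1, which has its
global minimum at x0 = 0 with f(x0) = 0 and f″(x0) = 1, fixes a large ζ, and compares the
approximation against a midpoint-rule estimate of the integral:

```python
import math

def spa(f_x0, f2_x0, zeta):
    """Leading-order saddle point approximation, eq. (4.8.4)."""
    return math.exp(-zeta * f_x0) * math.sqrt(2 * math.pi / (zeta * f2_x0))

def integral(f, zeta, a=-5.0, b=5.0, n=200_000):
    """Midpoint-rule approximation of the integral of e^{-zeta f(x)}."""
    h = (b - a) / n
    return h * sum(math.exp(-zeta * f(a + (k + 0.5) * h)) for k in range(n))

f = lambda x: math.cosh(x) - 1.0   # global minimum at x0 = 0, f''(0) = 1
zeta = 50.0
approx, exact = spa(0.0, 1.0, zeta), integral(f, zeta)
print(approx, exact)  # agree to well under 1% at zeta = 50
```

The residual discrepancy shrinks like 1/ζ, consistent with the quartic correction in
(4.8.5).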
4.9 singleton
4.10 subsequence
The surreal numbers are a generalization of the reals. Each surreal number consists of two
parts (called the left and right), each of which is a set of surreal numbers. For any surreal
number N, these parts can be called NL and NR . (This could be viewed as an ordered pair of
sets, however the surreal numbers were intended to be a basis for mathematics, not something
to be embedded in set theory.) A surreal number is written N = ⟨NL | NR⟩.
Not every number of this form is a surreal number. The surreal numbers satisfy two
additional properties. First, if x ∈ NR and y ∈ NL then x ≰ y. Secondly, they must be well
founded. These properties are both satisfied by the following construction of the surreal
numbers and the ≤ relation by mutual induction:
Given two (possibly empty) sets of surreal numbers R and L such that for any x ∈ R and
y ∈ L, x ≰ y, the pair ⟨L | R⟩ is a surreal number.
This process can be continued transfinitely, to define infinite and infinitesimal numbers.
For instance if Z is the set of integers then ω = ⟨Z |⟩. Note that this does not make
equality the same as identity: ⟨−1 | 1⟩ = ⟨ | ⟩, for instance.
Addition of surreal numbers is defined by

    N + M = ⟨{N + x | x ∈ ML} ∪ {M + y | y ∈ NL} | {N + x | x ∈ MR} ∪ {M + y | y ∈ NR}⟩.

Then multiplication is defined by

    N · M = ⟨M · NL + N · ML − NL · ML , M · NR + N · MR − NR · MR |
             M · NL + N · MR − NL · MR , M · NR + N · ML − NR · ML⟩.
The surreal numbers satisfy the axioms for a field under addition and multiplication (whether
they really are a field is complicated by the fact that they are too large to be a set).
The integers of surreal mathematics are called the omnific integers. In general a positive
integer n can always be written ⟨n − 1 |⟩, and so −n = ⟨| 1 − n⟩ = ⟨| (−n) + 1⟩. So for
instance 1 = ⟨0 |⟩.
In general, ⟨a | b⟩ is the simplest number between a and b. This can easily be used to
define the dyadic fractions: for any integer a, a + 1/2 = ⟨a | a + 1⟩. Then 1/2 = ⟨0 | 1⟩,
1/4 = ⟨0 | 1/2⟩, and so on. This can then be used to locate non-dyadic fractions by
pinning them between a left part which gets infinitely close from below and a right part
which gets infinitely close from above.
Ordinal arithmetic can be defined starting with ω as defined above and adding numbers such
as ⟨ω |⟩ = ω + 1 and so on. Similarly, a starting infinitesimal can be found as
⟨0 | 1, 1/2, 1/4, . . .⟩ = 1/ω, and again more can be developed from there.
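The mutual induction defining ≤ can be sketched directly for surreals with finite left and
right sets. This is an illustrative encoding, not part of the original entry: a surreal is
a pair of tuples, and x ≤ y holds iff no member of x's left set is ≥ y and no member of
y's right set is ≤ x:

```python
def leq(x, y):
    """x <= y iff y <= l for no l in x's left set,
    and r <= x for no r in y's right set (mutual induction)."""
    xl, _ = x
    _, yr = y
    return all(not leq(y, l) for l in xl) and all(not leq(r, x) for r in yr)

def eq(x, y):
    """Surreal equality: x <= y and y <= x."""
    return leq(x, y) and leq(y, x)

zero    = ((), ())           # < | >
one     = ((zero,), ())      # <0 | >
neg_one = ((), (zero,))      # < | 0>
half    = ((zero,), (one,))  # <0 | 1>

print(eq(((neg_one,), (one,)), zero))           # True: <-1 | 1> = < | >
print(leq(zero, half) and not leq(half, zero))  # True: 0 < 1/2
```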
Chapter 5
Now on the left we have the arithmetic mean and on the right the harmonic mean, so this
inequality is true.
Chapter 6
Let us consider the expression x2 + xy, where x and y are real (or complex) numbers. Using
the formula
(x + y)2 = x2 + 2xy + y 2
we can write
x2 + xy = x2 + xy + 0
y2 y2
= x2 + xy + −
4 4
y 2 y2
= (x + ) − .
2 4
This manipulation is called completing the square [3] in x2 + xy, or completing the square
x2 .
Similarly,

    x² − xy = (x − y/2)² − y²/4.
• Completing the square can also be used to find the extremal value of a quadratic
  polynomial [2] without calculus. Let us illustrate this for the polynomial
  p(x) = 4x² + 8x + 9. Completing the square yields

      p(x) = 4x² + 8x + 9 = (2x + 2)² + 5 ≥ 5,

  since (2x + 2)² ≥ 0. Here, equality holds if and only if x = −1. Thus p(x) ≥ 5 for all
  x ∈ R, and p(x) = 5 if and only if x = −1. It follows that p(x) has a global minimum
  at x = −1, where p(−1) = 5.
• Completing the square can also be used as an integration technique to integrate, say,
  1/(4x² + 8x + 9) [3].
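The extremal-value computation generalizes: for p(x) = ax² + bx + c with a > 0, completing
the square gives p(x) = a(x + b/(2a))² + (c − b²/(4a)). A small sketch (the helper name is
illustrative) checks the example above:

```python
def complete_square(a, b, c):
    """Write a*x^2 + b*x + c as a*(x + h)^2 + k; return (h, k)."""
    h = b / (2 * a)
    k = c - b * b / (4 * a)
    return h, k

# The example p(x) = 4x^2 + 8x + 9 from above:
h, k = complete_square(4, 8, 9)
print(h, k)  # 1.0 5.0, i.e. p(x) = 4(x + 1)^2 + 5, minimum 5 at x = -1
```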
REFERENCES
1. R. Adams, Calculus: A Complete Course, Addison-Wesley Publishers Ltd, 3rd ed.
2. J. Thompson, T. Martinsson, Matematik Lexikon (in Swedish), Wahlström & Widstrand,
1991.
Chapter 7
7.1 QED
The term “QED” is actually an abbreviation and stands for the Latin quod erat demon-
strandum, meaning “which was to be demonstrated.”
QED typically is used to signify the end of a mathematical proof. The box symbol □
is often used in place of “QED,” and is called the “Halmos symbol” after mathematician
Paul Halmos (it can vary in width, and is sometimes fully or partially shaded).
Halmos borrowed this symbol from magazines, where it was used to denote “end of article.”
7.2 TFAE
The abbreviation “TFAE” is shorthand for “the following are equivalent”. It is used before
a set of equivalent conditions (each implies all the others).
In a definition, when one of the conditions is somehow “better” (simpler, shorter, ...), it
makes sense to phrase the definition with that condition, and mention that the others are
equivalent. “TFAE” is typically used when none of the conditions can take priority over the
others. Actually proving the claimed equivalence must, of course, be done separately.
7.3 WLOG
“WLOG” (or “WOLOG”) is an acronym which stands for “without loss of generality.”
For example, we might be discussing properties of a segment (open or closed) of the real number
line. Due to the nature of the reals, we can select endpoints a and b without loss of gen-
erality. Nothing about our discussion of this segment depends on the choice of a or b. Of
course, any segment does actually have specific endpoints, so it may help to actually select
some (say 0 and 1) for clarity.
WLOG can also be invoked to shorten proofs where there are a number of choices of config-
uration, but the proof is “the same” for each of them. We need only walk through the proof
for one of these configurations, and “WLOG” serves as a note that we haven’t lost anything
in the choosing.
The order of operations is a convention that tells us how to evaluate mathematical expres-
sions (these could be purely numerical). The problem arises because expressions consist of
operators applied to variables or values (or other expressions) that each demand individual
evaluation, yet different orders for these individual evaluations can lead to different
outcomes.
A conventional order of operations solves this. One could technically do without memorizing
this convention, but the only alternative is to use parentheses to group every single term of
an expression and evaluate the innermost operations first.
Consider, for instance, an expression of the form a · b + c. One could evaluate it in
either of two ways:
1. Add b and c,
2. Multiply the sum from (1) by a.
or
1. Multiply a and b,
2. Add to the product in (1) the value of c.
One can see the different outcomes for the two cases by selecting some different values for a,
b, and c. The issue is resolved by convention in order of operations: the correct evaluation
would be the second one.
The nearly universal mathematical convention dictates the following order of operations
(listed from highest to lowest priority):
1. Factorial.
2. Exponentiation.
3. Multiplication.
4. Division.
5. Addition.
6. Subtraction.
Any parenthesized expressions are automatically higher “priority” than anything on the
above list.
There is also the problem of what order to evaluate repeated operators of the same type, as
in:
a/b/c/d
The solution in this problem is typically to assume the left-to-right interpretation. For the
above, this would lead to the following evaluation:
(((a/b)/c)/d)
In other words,
1. Evaluate a/b.
2. Evaluate (1)/c.
3. Evaluate (2)/d.
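The left-to-right rule for a repeated operator is exactly a left fold over the operands.
A small sketch (the numeric values are illustrative; Python's own parser applies the same
left-to-right rule for division):

```python
from functools import reduce

a, b, c, d = 240.0, 2.0, 3.0, 4.0

# Left-to-right evaluation (((a / b) / c) / d), expressed as a left fold.
folded = reduce(lambda acc, x: acc / x, [b, c, d], a)

assert folded == a / b / c / d == (((a / b) / c) / d)
print(folded)  # 10.0
```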
Note that this isn’t a problem for associative operators such as multiplication or addition in
the reals. One must still proceed with caution, however, as associativity is a notion bound
up with the concept of groups rather than just operators. Hence, context is extremely
important.
For more obscure operations than the ones listed above, parentheses should be used to remove
ambiguity. Completely new operations are typically assumed to have the highest priority,
but the definition of the operation should be accompanied by some sort of explanation of how
it is evaluated in relation to itself. For example, Conway’s chained arrow notation explicitly
defines what order repeated applications of itself should be evaluated in (it is right-to-left
rather than left-to-right)!
Chapter 8
Roman numerals are a method of writing numbers employed primarily by the ancient
Romans. In place of digits, the Romans used letters to represent the numbers central to
the system:
I 1
V 5
X 10
L 50
C 100
D 500
M 1000
Larger numbers can be made by writing a bar over the letter, which means one thousand
times as much. For instance V is 5000.
Other numbers were written by putting letters together. For instance II means 2. Larger
letters go on the left, so LII is 52, but IIL is not a valid Roman numeral.
One additional rule allows a letter to the left of a larger letter to signify subtracting the
smaller from the larger. For instance IV is 4. This can only be done once; 3 is written III,
not IIV . Also, it is generally required that the smaller letter be the one immediately smaller
than the larger, so 1999 is usually written MCMXCIX, not MIM.
It is worth noting that today it is usually considered incorrect to repeat a letter four
times, so IV is preferred to IIII. However, many older monuments do not use the
subtraction rule at all, so 44 was written XXXXIIII instead of the now preferable XLIV.
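The rules above translate into the usual greedy conversion. This is a sketch of the modern
convention (the function name is illustrative), treating the six subtractive pairs IV, IX,
XL, XC, CD, CM as extra "letters":

```python
def to_roman(n):
    """Convert a positive integer (< 4000) to a Roman numeral,
    using the modern subtraction convention."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, letters in pairs:
        while n >= value:        # greedily take the largest value that fits
            out.append(letters)
            n -= value
    return "".join(out)

print(to_roman(52), to_roman(44), to_roman(1999))  # LII XLIV MCMXCIX
```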
Chapter 9
Jules Henri Poincaré was born on April 29th 1854 in Cité Ducale[BA] a neighborhood in
Nancy, a city in France. He was the son of Dr Léon Poincaré (1828-1892) who was a
professor at the University of Nancy in the faculty of medicine.[14] His mother, Eugénie
Launois (1830-1897) was described as a “gifted mother”[6] who gave special instruction to
her son. She was 24 and his father 26 years of age when Henri was born[9]. Two years
after the birth of Henri, his sister Aline was born.[6]
In 1862 Henri entered the Lycée of Nancy, which today is called, in his honor, the Lycée
Henri Poincaré. In fact the University of Nancy is also named in his honor. He graduated
from the Lycée in 1871 with a bachelor’s degree in letters and sciences. Henri was at the
top of his class in almost all subjects; he did not have much success in music, however,
and was described as “average at best” in any physical activities.[9] This could be
blamed on his poor eyesight
and absentmindedness.[4] Later in 1873, Poincaré entered l’Ecole Polytechnique where he
performed better in mathematics than all the other students. He published his first pa-
per at 20 years of age, titled Démonstration nouvelle des propriétés de l’indicatrice
d’une surface.[3] He graduated from the institution in 1876. The same year he decided
to attend l’Ecole des Mines and graduated in 1879 with a degree in mining engineering.[14]
After his graduation he was appointed as an ordinary engineer in charge of the mining
services in Vesoul. At the same time he was preparing for his doctorate in sciences, in
mathematics (not surprisingly), under the supervision of Charles Hermite. Some of Charles
Hermite’s most famous contributions to mathematics are: Hermite’s polynomials, Hermite’s
differential equation, Hermite’s formula of interpolation and Hermitian matrices.[9]
Poincaré, as expected, graduated from the University of Paris in 1879, with a thesis
relating to differential equations. He then became a teacher at the University of Caen,
where he taught
analysis. He remained there until 1881. He then was appointed as the “maı̂tre de conférences
d’analyse”[14] (professor in charge of analysis conferences) at the University of Paris. Also in
that same year he married Miss Poulain d’Andecy. Together they had four children: Jeanne
born in 1887, Yvonne born in 1889, Henriette born in 1891, and finally Léon born in 1893.
He had now returned to work at the Ministry of Public Services as an engineer. He was
responsible for the development of the northern railway. He held that position from 1881 to
1885. This was the last job he held in administration for the government of France. In 1893
he was awarded the title of head engineer in charge of the mines. After that his career awards
and position continuously escalated in greatness and quantity. He died two years before the
war on July 17th 1912 of an embolism at the age of 58. Interestingly, at the beginning of
World War I, his cousin Raymond Poincaré was the president of the French Republic.
Poincaré’s work habits have been compared to a bee flying from flower to flower. Poincaré
was interested in the way his mind worked; he studied his habits and gave a talk about
his observations in 1908 at the Institute of General Psychology in Paris. He linked his
way of thinking to how he made several discoveries. His mental organization was not only
interesting
to him but also to Toulouse, a psychologist of the Psychology Laboratory of the School of
Higher Studies in Paris. Toulouse wrote a book called Henri Poincaré which was published
in 1910. In it he discussed Poincaré’s regular schedule: he worked during the same times
each day, in short periods. He never spent a long time on a problem, since he believed
that the subconscious would continue working on it while he consciously worked on
another. Toulouse also noted that Poincaré had an exceptional memory. In addition he
stated that most mathematicians worked from principles already established, while
Poincaré was the type that started from basic principles each time.[9] His method of
thinking is well summarized as:
Habitué à négliger les détails et à ne regarder que les cimes, il passait de l’une à
l’autre avec une promptitude surprenante et les faits qu’il découvrait se groupant
d’eux-mêmes autour de leur centre étaient instantanémant et automatiquement
classé dans sa mémoire. (Accustomed to neglecting details and looking only at the
summits, he passed from one to the other with surprising rapidity, and the facts he
discovered, grouping themselves around their center, were instantly and automatically
classified in his memory.) [BA]
The mathematician Darboux claimed he was “un intuitif” (an intuitive)[BA], arguing that
this is demonstrated by the fact that he worked so often by visual representation. He did
not care about being rigorous and disliked logic. He believed that logic was not a way to
invent but a way to structure ideas, and that logic limits ideas.
Poincaré held philosophical views opposite to those of Bertrand Russell and Gottlob
Frege, who believed that mathematics was a branch of logic. Poincaré strongly disagreed,
claiming that intuition was the life of mathematics. He gives an interesting point of
view in his book Science and Hypothesis:
For a superficial observer, scientific truth is beyond the possibility of doubt; the
logic of science is infallible, and if the scientists are sometimes mistaken, this is
only from their mistaking its rule. [12]
Poincaré believed that arithmetic is a synthetic science. He argued that Peano’s axioms
cannot be proven non-circularly with the principle of induction[7], and therefore
concluded that arithmetic is a priori synthetic and not analytic. Poincaré then went on
to say that mathematics cannot be deduced from logic, since it is not analytic. It is
important to note that even today Poincaré has not been proven wrong in this
argumentation. His views were the same as those of Kant[8]. However, Poincaré did not
share Kantian views in all branches of philosophy and mathematics. For example, in
geometry Poincaré believed that the structure of non-Euclidean space can be known
analytically. He wrote three books that made his philosophies known: Science and
Hypothesis, The Value of Science and Science and Method.
Poincaré’s first area of interest in mathematics was the fuchsian functions, which he
named after the mathematician Lazarus Fuchs because Fuchs was known for being a good
teacher and had done a great deal of research in differential equations and in the theory
of functions. The functions did not keep the name fuchsian and are today called
automorphic. Poincaré actually developed the concept of these functions as part of his
doctoral thesis.[9] An automorphic function is a function f(z), z ∈ C, which is analytic
on its domain and invariant under a denumerably infinite group of linear fractional
transformations; automorphic functions are generalizations of trigonometric functions
and elliptic functions.[15] Below Poincaré explains how he discovered Fuchsian functions:
For fifteen days I strove to prove that there could not be any functions like those
I have since called Fuchsian functions. I was then very ignorant; every day I
seated myself at my work table, stayed an hour or two, tried a great number
of combinations and reached no results. One evening, contrary to my custom, I
drank black coffee and could not sleep. Ideas rose in crowds; I felt them collide
until pairs interlocked, so to speak, making a stable combination. By the next
morning I had established the existence of a class of Fuchsian functions, those
which come from the hypergeometric series; I had only to write out the results,
which took but a few hours. [11]
This is a clear indication of Henri Poincaré’s brilliance. Poincaré corresponded
extensively with Klein, another mathematician working on fuchsian functions. They were
able to discuss and further the theory of automorphic (fuchsian) functions. Apparently
Klein became jealous of Poincaré’s high opinion of Fuchs’s work and ended their
relationship on bad terms.
Poincaré contributed to the field of algebraic topology and in 1895 published Analysis
situs, the first real systematic treatment of topology. He acquired most of his knowledge
from his work on differential equations. He also formulated the Poincaré conjecture, one
of the great unsolved problems in mathematics. It is currently one of the “Millennium
Prize Problems”. The problem is stated as:
Consider a compact 3-dimensional manifold V without boundary. Is it possible
that the fundamental group of V could be trivial, even though V is not homeomorphic
to the 3-dimensional sphere? [5]
The problem has been attacked by many mathematicians, such as Henry Whitehead in 1934,
but without success. Later, in the 50’s and 60’s, progress was made, and it was
discovered that for higher-dimensional manifolds the problem was easier. (Theorems have
been proved for those higher dimensions by Stephen Smale, John Stallings, Andrew Wallace,
and many more.)[5] Poincaré also studied homotopy theory, which is the study of topology
reduced to various groups that are algebraically invariant.[9] He introduced the
fundamental group in a paper in 1894, and later stated his famous conjecture. He also did
work in analytic functions, algebraic geometry, and Diophantine problems, making
important contributions there as in most of the areas he studied.
In 1887, Oscar II, King of Sweden and Norway, held a competition to celebrate his
sixtieth birthday and to promote higher learning.[1] The King wanted a contest that would
be of wide interest, so he decided to hold a mathematics competition. Poincaré entered
the competition, submitting a memoir on the three body problem.
Poincaré did in fact win the competition. In his memoir he described new mathematical
ideas such as homoclinic points. The memoir was about to be published in Acta Mathematica
when an error was found by the editor. This error in fact led to the discovery of chaos
theory. The memoir was published later, in 1890.[9] In addition, Poincaré proved that
determinism and predictability were distinct problems. He also found that the solution of
the three body problem would change drastically with small changes in the initial
conditions. This area of research was neglected until 1963, when Edward Lorenz discovered
a famously chaotic deterministic system using a simple model of the atmosphere.[7]
He made many contributions to different fields of applied mathematics as well such as: celes-
tial mechanics, fluid mechanics, optics, electricity, telegraphy, capillarity, elasticity, thermo-
dynamics, potential theory, quantum theory, theory of relativity and cosmology. In the field
of differential equations Poincaré has given many results that are critical for the qualitative
theory of differential equations, for example the Poincaré sphere and the Poincaré map.
It is that intuition that led him to discover and study so many areas of science. Poincaré
is considered to be the next universalist after Gauss. After Gauss’s death in 1855 people
generally believed that there would be no one else that could master all branches of math-
ematics. However they were wrong because Poincaré took all areas of mathematics as “his
province”[4].
REFERENCES
1. The 1911 Edition Encyclopedia: Oscar II of Sweden and Norway, [online],
http://63.1911encyclopedia.org/O/OS/OSCAR II OF SWEDEN AND NORWAY.htm
2. Belliver, André: Henri Poincaré ou la vocation souveraine, Gallimard, 1956.
3. Bour P-E., Rebuschi M.: Serveur W3 des Archives H. Poincaré [online] http://www.univ-
nancy2.fr/ACERHP/
4. Boyer B. Carl: A History of Mathematics: Henri Poincaré, John Wiley & Sons, inc., Toronto,
1968.
5. Clay Mathematics Institute: Millennium Prize Problems, 2000, [online]
http://www.claymath.org/prizeproblems/ .
6. Encyclopaedia Britannica: Biography of Jules Henri Poincaré.
7. Murz, Mauro: Jules Henri Poincaré [Internet Encyclopedia of Philosophy], [online]
http://www.utm.edu/research/iep/p/poincare.htm, 2001.
8. Kolak, Daniel: Lovers of Wisdom (second edition), Wadsworth, Belmont, 2001.
9. O’Connor, J. John & Robertson, F. Edmund: The MacTutor History of Mathematics Archive,
[online] http://www-gap.dcs.st-and.ac.uk/ history/, 2002.
10. Oeuvres de Henri Poincaré: Tome XI, Gauthier-Villard, Paris, 1956.
11. Poincaré, Henri: Science and Method; The Foundations of Science, The Science Press, Lan-
caster, 1946.
12. Poincaré, Henri: Science and Hypothesis; The Foundations of Science, The Science Press,
Lancaster, 1946.
13. Poincaré, Henri: Les méthodes nouvelles de la mécanique celeste, Dover Publications, Inc.
New York, 1957.
14. Sageret, Jules: Henri Poincaré, Mercure de France, Paris, 1911.
15. Weisstein, W. Eric: World of Mathematics: Automorphic Function, CRC Press LLC, 2002.
Chapter 10
by Émilie Richer
The Problem
The Beginnings
After graduation from the École Normale Supérieure de Paris, a group of about ten young
mathematicians maintained very close ties.[WA] They had all begun their careers and
were scattered across France teaching in universities. Among them were Henri Cartan and
André Weil, who were both in charge of teaching a course on differential and integral
calculus at the University of Strasbourg. The standard textbook for this course at the
time was “Traité d’Analyse” by E. Goursat, which the young professors found to be
inadequate in
many ways.[BA] According to Weil, his friend Cartan was constantly asking him questions
about the best way to present a given topic to his class, so much so that Weil eventually
nicknamed him “the grand inquisitor”.[WA] After months of persistent questioning, in the
winter of 1934, Weil finally got the idea to gather friends (and former classmates) to settle
their problem by rewriting the treatise for their course. It is at this moment that Bourbaki
was conceived.
The suggestion of writing this treatise spread, and very soon a loose circle of friends,
including Henri Cartan, André Weil, Jean Delsarte, Jean Dieudonné and Claude Chevalley,
began meeting regularly at the Capoulade, a café in the Latin quarter of Paris, to plan
it. They
called themselves the “Committee on the Analysis Treatise”[BL]. According to Chevalley
the project was extremely naive. The idea was to simply write another textbook to replace
Goursat’s.[GD] After many discussions over what to include in their treatise, they
finally came to the conclusion that they needed to start from scratch and present all of
essential mathematics from beginning to end, with the idea that “the work had to be primarily a
tool, not usable in some small part of mathematics but in the greatest possible number of
places”.[DJ] Gradually the young men realized that their meetings were not sufficient, and
they decided they would dedicate a few weeks in the summer to their new project. The
collaborators on this project were not aware of its enormity, but were soon to find out.
In July of 1935 the young men gathered for their first congress (as they would later call them)
in Besse-en-Chandesse. The men believed that they would be able to draft the essentials of
mathematics in about three years. They did not set out wanting to write something new,
but to perfect everything already known. Little did they know that their first chapter would
not be completed until four years later. It was at one of their first meetings that the
young men chose their name: Nicolas Bourbaki. The organization and its membership would go
on to become one of the greatest enigmas of 20th century mathematics.
The first Bourbaki congress, July 1935. From left to right, back row: Henri Cartan, René
de Possel, Jean Dieudonné, André Weil, university lab technician, seated: Mirlès, Claude
Chevalley, Szolem Mandelbrojt.
André Weil recounts many years later how they decided on this name. He and a few other
Bourbaki collaborators had been attending the école Normale in Paris, when a notification
was sent out to all first year science students : a guest speaker would be giving a lecture and
attendance was highly recommended. As the story goes, the young students gathered to
hear (unbeknownst to them) an older student, Raoul Husson, who had disguised himself with
a fake beard and an unrecognizable accent. He gave what is said to be an incomprehensible,
nonsensical lecture, with the young students trying desperately to follow him. All his results
were wrong in a non-trivial way, and he ended with his most extravagant result:
Bourbaki’s Theorem. One student even claimed to have followed the lecture from beginning to end.
Raoul had taken the name for his theorem from a general in the Franco-Prussian war. The
committee was so amused by the story that they unanimously chose Bourbaki as their name.
Weil’s wife was present at the discussion about choosing a name and she became Bourbaki’s
godmother baptizing him Nicolas.[WA] Thus was born Nicolas Bourbaki.
André Weil, Claude Chevalley, Jean Dieudonné, Henri Cartan and Jean Delsarte were among
the few present at these first meetings; they were all active members of Bourbaki until
their retirements. Today they are considered by most to be the founding fathers of the
Bourbaki group. According to a later member, they were “those who shaped Bourbaki and
gave it much of their time and thought until they retired”; he also claims that some
other early contributors were Szolem Mandelbrojt and René de Possel.[BA]
Bourbaki members all believed that they had to completely rethink mathematics. They felt
that older mathematicians were holding on to old practices and ignoring the new. That is
why very early on Bourbaki established one of its first and only rules: obligatory
retirement at age 50. As explained by Dieudonné, “if the mathematics set forth by Bourbaki no longer
correspond to the trends of the period, the work is useless and has to be redone, this is why
we decided that all Bourbaki collaborators would retire at age 50.”[DJ] Bourbaki wanted to
create a work that would be an essential tool for all mathematicians. Their aim was to create
something logically ordered, starting with a strong foundation and building continuously on
it. The foundation that they chose was set theory which would be the first book in a series
of 6 that they named “éléments de mathématique”(with the ’s’ dropped from mathématique
to represent their underlying belief in the unity of mathematics). Bourbaki felt that the old
mathematical divisions were no longer valid comparing them to ancient zoological divisions.
The ancient zoologist would classify animals based on some basic superficial similarities such
as “all these animals live in the ocean”. Eventually they realized that more complexity
was required to classify these animals. Past mathematicians had apparently made similar
mistakes: “the order in which we (Bourbaki) arranged our subjects was decided according to
a logical and rational scheme. If that does not agree with what was done previously, well, it
means that what was done previously has to be thrown overboard.”[DJ] After many heated
discussions, Bourbaki eventually settled on the topics for “éléments de mathématique”;
they would be, in order:
I    Set theory
II   Algebra
III  Topology
IV   Functions of one real variable
V    Topological vector spaces
VI   Integration
They now felt that they had eliminated all secondary mathematics, which according to them
“did not lead to anything of proved importance.”[DJ] The following table summarizes Bour-
baki’s choices.
What remains after cutting the loose threads | What is excluded (the loose threads)
---------------------------------------------+-------------------------------------
Linear and multilinear algebra               | Theory of ordinals and cardinals
A little general topology (the least         | Lattices
  possible)                                  |
Topological vector spaces                    | Most general topology
Homological algebra                          | Most of group theory (finite groups)
Commutative algebra                          | Most of number theory
Non-commutative algebra                      | Trigonometrical series
Lie groups                                   | Interpolation
Integration                                  | Series of polynomials
Differentiable manifolds                     | Applied mathematics
Riemannian geometry                          |
It didn’t take long for Bourbaki to become aware of the size of their project. They were
now meeting three times a year (twice for one week and once for two weeks) for Bourbaki
“congresses” to work on their books. Their main rule was unanimity on every point. Any
member had the right to veto anything he felt was inadequate or imperfect. Once Bourbaki
had agreed on a topic for a chapter the job of writing up the first draft was given to any
member who wanted it. He would write his version and when it was complete it would be
presented at the next Bourbaki congress. It would be read aloud line by line. According
to Dieudonné, “each proof was examined point by point and criticized pitilessly.” He goes
on: “one has to see a Bourbaki congress to realize the virulence of this criticism and how it
surpasses by far any outside attack.”[DJ] Weil recalls a first draft written by Cartan (who was
unable to attend the congress where it would be presented). Bourbaki sent him a telegram
summarizing the congress; it read: “union intersection partie produit tu es démembré foutu
Bourbaki” (union intersection subset product you are dismembered screwed Bourbaki).[WA]
During a congress any member was allowed to interrupt to criticize, comment or ask questions
at any time. Apparently Bourbaki believed it could get better results from confrontation
than from orderly discussion.[BA] Armand Borel summarized his first congress as “two or
three monologues shouted at top voice, seemingly independent of one another”.[BA]
After a first draft had been completely reduced to pieces it was the job of a new collaborator
to write up a second draft. This second collaborator would use all the suggestions and
changes that the group had put forward during the congress. Any member had to be able to
take on this task because one of Bourbaki’s mottoes was “the control of the specialists by the
non-specialists”[BA] i.e. a member had to be able to write a chapter in a field that was not
his specialty. This second writer would set out on his assignment knowing that, by the time
he was ready to present his draft, the views of the congress would have changed and his draft
would also be torn apart despite its adherence to the congress’s earlier suggestions. The
same chapter might appear up to ten times before it would finally be unanimously approved
for publication. On average, 8 to 12 years passed from the time a chapter was approved
to the time it appeared on a bookshelf.[DJ] Bourbaki proceeded this way for over twenty
years, publishing (surprisingly) a great number of volumes.
During these years, most Bourbaki members held permanent positions at universities across
France. There they could recruit for Bourbaki students showing great promise in mathematics.
Members were never replaced formally, nor was there ever a fixed number of
members. However, when it felt the need, Bourbaki would invite a student or colleague to a
congress as a “cobaye” (guinea pig). To be accepted, the guinea pig would not only have
to understand everything, but would also have to participate actively. He also had to show
broad interests and an ability to adapt to the Bourbaki style. If he was silent he would
not be invited again (a challenging requirement, considering he would be in the presence of
some of the strongest mathematical minds of the time). Bourbaki described the reaction of certain
guinea pigs invited to a congress : “they would come out with the impression that it was a
gathering of madmen. They could not imagine how these people, shouting -sometimes three
or four at a time- about mathematics, could ever come up with something intelligent.”[DJ]
If a new recruit was showing promise, he would continue to be invited and would gradually
become a member of Bourbaki without any formal announcement. Although complete
anonymity was impossible, Bourbaki was never discussed with the outside world. It was many
years before Bourbaki members agreed to speak publicly about their story. The following
table gives the names of some of Bourbaki’s collaborators.
1st generation (founding fathers)   2nd generation (invited after WWII)   3rd generation
H. Cartan                           J. Dixmier                            A. Borel
C. Chevalley                        R. Godement                           F. Bruhat
J. Delsarte                         S. Eilenberg                          P. Cartier
J. Dieudonné                        J.-L. Koszul                          A. Grothendieck
A. Weil                             P. Samuel                             S. Lang
                                    J.-P. Serre                          J. Tate
                                    L. Schwartz
Bourbaki congress 1938, from left to right: S. Weil, C. Pisot, A. Weil, J. Dieudonné, C.
Chabauty, C. Ehresmann, J. Delsarte.
The Books
The Bourbaki books were the first to have such a tight organization, the first to use an
axiomatic presentation. They tried as often as possible to start from the general and work
towards the particular, with the belief that mathematics is fundamentally simple and that
for each mathematical question there is an optimal way of answering it. This required an
extremely rigid structure and notation. In fact, the first six books of “Éléments de
mathématique” use a completely linearly-ordered reference system. That is, any reference
at a given spot can only be to something earlier in the text or in an earlier book. This
did not please all of its readers, as Borel elaborates: “I was rather put off by the very dry
style, without any concession to the reader, the apparent striving for the utmost generality,
the inflexible system of internal references and the total absence of outside ones”. However,
Bourbaki’s style was in fact so efficient that a lot of its notation and vocabulary is still
in current usage. Weil recalls that his granddaughter was impressed when she learned that
he had been personally responsible for the symbol ∅ for the empty set,[WA] and Chevalley
explains that to “bourbakise” now means to take a text that is considered screwed up and
to arrange it and improve it, concluding that “it is the notion of structure which is truly
bourbakique”.[GD]
As well as ∅, Bourbaki is responsible for introducing ⇒ (the implication arrow); N, R, C,
Q and Z (respectively the natural, real, complex and rational numbers and the integers);
CA (the complement of a set A); as well as the words bijective, surjective and injective.[DR]
The Decline
Once Bourbaki had finally finished its first six books, the obvious question was “what next?”.
The founding members, who (not intentionally) had often carried most of the weight, were now
approaching mandatory retirement age. The group had to start looking at more specialized
topics, having covered the basics in their first books. But was the highly structured Bourbaki
style the best way to approach these topics? The motto “everyone must be interested in
everything” was becoming much more difficult to enforce (it was easy for the first six
books, whose contents are considered essential knowledge for most mathematicians). Pierre
Cartier was working with Bourbaki at this point. He says “in the forties you can say that
Bourbaki know where to go: his goal was to provide the foundation for mathematics”.[SM] It
seemed now that they did not know where to go. Nevertheless, Bourbaki kept publishing.
Its second series (falling short of Dieudonné’s plan of 27 books encompassing most of modern
mathematics [BA]) consisted of two very successful books.
However, Cartier claims that by the end of the seventies Bourbaki’s method was understood,
and many textbooks were being written in its style: “Bourbaki was left without a task. (...)
With their rigid format they were finding it extremely difficult to incorporate new mathematical
developments.”[SM] To add to its difficulties, Bourbaki was now becoming involved
in a battle with its publishing company over royalties and translation rights. The matter was
settled in 1980, after a “long and unpleasant” legal process where, as one Bourbaki member
put it, “both parties lost and the lawyer got rich”.[SM] In 1983 Bourbaki published its last
volume: IX Spectral Theory.
By that time, Cartier says, Bourbaki was a dinosaur, the head too far away from the tail.
Explaining: “when Dieudonné was the ‘scribe of Bourbaki’ every printed word came from
his pen. With his fantastic memory he knew every single word. You could say ‘Dieudonné,
what is the result about so and so?’ and he would go to the shelf and take down the book and
open it to the right page. After Dieudonné retired no one was able to do this. So Bourbaki
lost awareness of his own body, the 40 published volumes.”[SM] Now, after almost twenty
years without a significant publication, is it safe to say the dinosaur has become extinct?[1]
But since Nicolas Bourbaki never in fact existed, and was nothing but a clever teaching and
research ploy, could he ever be said to be extinct?
REFERENCES
[BL] L. BEAULIEU: A Parisian Café and Ten Proto-Bourbaki Meetings (1934-1935), The Mathematical
Intelligencer, Vol. 15, No. 1, 1993, pp. 27-35.
[BCCC] A. BOREL, P. CARTIER, K. CHANDRASEKHARAN, S. CHERN, S. IYANAGA: André
Weil (1906-1998), Notices of the AMS, Vol. 46, No. 4, 1999, pp. 440-447.
[BA] A. BOREL: Twenty-Five Years with Nicolas Bourbaki, 1949-1973, Notices of the AMS, Vol. 45,
No. 3, 1998, pp. 373-380.
[BN] N. BOURBAKI: Théorie des ensembles, de la collection Éléments de mathématique, Hermann,
Paris, 1970.
[BW] Bourbaki website: [online] at www.bourbaki.ens.fr.
[CH] H. CARTAN: André Weil: Memories of a Long Friendship, Notices of the AMS, Vol. 46, No. 6,
1999, pp. 633-636.
[DR] R. DÉCAMPS: Qui est Nicolas Bourbaki?, [online] at http://faq.maths.free.fr.
[DJ] J. DIEUDONNÉ: The Work of Nicholas Bourbaki, American Math. Monthly, 77, 1970, pp.
134-145.
[EY] Encyclopédie Yahoo: Nicolas Bourbaki, [online] at http://fr.encylopedia.yahoo.com.
[GD] D. GUEDJ: Nicholas Bourbaki, Collective Mathematician: An Interview with Claude Chevalley,
The Mathematical Intelligencer, Vol. 7, No. 2, 1985, pp. 18-22.
[JA] A. JACKSON: Interview with Henri Cartan, Notices of the AMS, Vol. 46, No. 7, 1999, pp. 782-788.
[SM] M. SENECHAL: The Continuing Silence of Bourbaki - An Interview with Pierre Cartier, The
Mathematical Intelligencer, No. 1, 1998, pp. 22-28.
[WA] A. WEIL: The Apprenticeship of a Mathematician, Birkhäuser Verlag, 1992, pp. 93-122.
[1] Today what remains is “L’Association des Collaborateurs de Nicolas Bourbaki”, which organizes
Bourbaki seminars three times a year. These are international conferences, hosting over 200 mathematicians
who come to listen to presentations on topics chosen by Bourbaki (or the A.C.N.B.). Their last publication
was in 1998: chapter 10 of book VI, commutative algebra.
Version: 6 Owner: Daume Author(s): Daume
A low Erdős number is a status symbol among 20th-century mathematicians and is similar
to the 6-degrees-of-separation concept. Your Erdős number is
• min{e(x) | x ∈ X} + 1, where X is the set of all persons you have authored a paper
with, and e(x) is the Erdős number of x.
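The recursive definition above is just shortest-path distance in the collaboration graph, so it can be computed with a breadth-first search. A minimal Python sketch, using a tiny made-up coauthor graph (every name except Erdos is hypothetical):

```python
from collections import deque

def erdos_numbers(coauthors, root="Erdos"):
    """BFS from the root: e(root) = 0, and e(you) = min{e(x) : x in X} + 1,
    where X is the set of your coauthors (toy graph, not real data)."""
    e = {root: 0}
    queue = deque([root])
    while queue:
        person = queue.popleft()
        for x in coauthors.get(person, []):
            if x not in e:            # first visit already achieves the minimum
                e[x] = e[person] + 1
                queue.append(x)
    return e

# Hypothetical collaboration graph.
graph = {
    "Erdos": ["A", "B"],
    "A": ["Erdos", "C"],
    "B": ["Erdos"],
    "C": ["A"],
}
```

Breadth-first search visits people in order of increasing distance, so the first time someone is reached, the minimum in the definition has already been achieved.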
Chapter 11
The Burali-Forti paradox demonstrates that the class of all ordinals is not a set. If there
were a set of all ordinals, Ord, then it would follow that Ord was itself an ordinal, and
therefore that Ord ∈ Ord. Even if sets in general are allowed to contain themselves, ordinals
cannot since they are defined so that ∈ is well founded over them.
This paradox is similar to both Russell’s paradox and Cantor’s paradox, although it predates
both. All of these paradoxes prove that a certain object is “too large” to be a set.
The key part of the argument strongly resembles Russell’s paradox, which is in some sense
a generalization of this paradox.
Besides allowing an unbounded number of cardinalities as ZF set theory does, this paradox
could be avoided by a few other tricks, for instance by not allowing the construction of a
power set or by adopting paraconsistent logic.
Suppose that for any coherent proposition P (x), we can construct a set {x : P (x)}. Let
S = {x : x ∉ x}. Suppose S ∈ S; then, by definition, S ∉ S. Likewise, if S ∉ S, then by
definition S ∈ S. Therefore, we have a contradiction. Bertrand Russell gave this paradox as
an example of how a purely intuitive set theory can be inconsistent. The regularity axiom,
one of the Zermelo-Fraenkel axioms, was devised to avoid this paradox by prohibiting self-swallowing
sets.
An interpretation of Russell’s paradox without any formal language of set theory could be
stated as: “If the barber shaves all those, and only those, who do not shave themselves, does
he shave himself?” If you answer that he shaves himself, that is false, since he shaves only
those who do not shave themselves. If you answer that someone else shaves him, that is also
false, since he shaves all those who do not shave themselves, and in this case he is one of them.
Therefore we have a contradiction.
11.4 biconditional
A biconditional is a truth function that is true only in the case that both parameters are true
or both are false. For example, “a only if b”, “a just in case b”, as well as “b implies a and a
implies b” are all ways of stating a biconditional in English. Symbolically the biconditional
is written as
a ↔ b
or
a ⇔ b
a b a↔b
F F T
F T F
T F F
T T T
In addition, the biconditional function is sometimes written as “iff”, meaning “if and only
if”.
The biconditional gets its name from the fact that it is really two conditionals in conjunction,
(a → b) ∧ (b → a)
This fact is important to recognize when writing a mathematical proof, as both conditionals
must be proven independently.
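The decomposition into two conditionals can be checked mechanically. A minimal Python sketch (the function names are my own, not from the entry):

```python
def biconditional(a, b):
    # a <-> b: true exactly when both are true or both are false
    return a == b

def two_conditionals(a, b):
    # (a -> b) and (b -> a), using the equivalence  x -> y == (not x) or y
    return ((not a) or b) and ((not b) or a)

# The two definitions agree on every row of the truth table.
for a in (False, True):
    for b in (False, True):
        assert biconditional(a, b) == two_conditionals(a, b)
```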
11.5 bijection
Let X and Y be sets. A function f : X → Y that is one-to-one and onto is called a bijection
or bijective function from X to Y .
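For finite sets, both halves of the definition can be tested directly. A small sketch (hypothetical helper, written for this illustration):

```python
def is_bijection(f, X, Y):
    """Check that f : X -> Y is one-to-one and onto (finite X, Y only)."""
    image = [f(x) for x in X]
    injective = len(set(image)) == len(set(X))   # distinct inputs give distinct outputs
    surjective = set(image) == set(Y)            # every element of Y is hit
    return injective and surjective
```

For example, f(x) = x + 1 is a bijection from {0, 1, 2} to {1, 2, 3}, but not onto {1, 2, 3, 4}.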
11.6 cartesian product
For any sets A and B, the cartesian product A × B is the set consisting of all ordered pairs
(a, b) where a ∈ A and b ∈ B.
11.7 chain
11.8 characteristic function
Properties
Remarks
REFERENCES
1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed,
John Wiley & Sons, Inc., 1999.
A collection of circles is said to be concentric if they have the same center. The region
between two concentric circles is an annulus.
11.10 conjunction
A conjunction is true only when both parameters (called conjuncts) are true. In English,
conjunction is denoted by the word “and”. Symbolically, we represent it as ∧, or as
multiplication applied to Boolean parameters. The conjunction of a and b would be written
a ∧ b
a b a∧b
F F F
F T F
T F F
T T T
11.11 disjoint
Two sets X and Y are disjoint if their intersection X ∩ Y is the empty set.
11.12 empty set
The empty set ∅ is the set that contains no elements. The Zermelo-Fraenkel axioms of set
theory postulate that there exists an empty set.
11.13 even number
An even number is an integer divisible by 2; an odd number is an integer that is not even.
This is most easily understood in the binary base: the definition simply states that even
numbers end with a 0, and odd numbers end with a 1.
Properties
1. Every integer is either even or odd. This can be proven using induction, or using the
fundamental theorem of arithmetic.
2. An integer k is even (odd) if and only if k² is even (odd).
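Both the binary-digit characterization and the parity of squares are easy to check computationally. A small sketch (helper names are my own):

```python
def is_even(k):
    # the last binary digit of an even number is 0
    return k & 1 == 0

def is_odd(k):
    return not is_even(k)

# k and k*k always have the same parity.
assert all(is_even(k) == is_even(k * k) for k in range(-50, 50))
```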
11.15 infinite
A set S is infinite if it is not finite; that is, there is no n ∈ N for which there is a bijection
between n and S. Hence an infinite set has a cardinality greater than any natural number:
|S| ≥ ℵ0
Infinite sets can be divided into countable and uncountable. For countably infinite sets S,
there is a bijection between S and N. This is not the case for uncountably infinite sets (like
the reals and any non-trivial real interval).
• The empty set: {}.
• {0, 1}
• {1, 2, 3, 4, 5}
• {1, 1.5, e, π}
• {1, 2, 3, 4, . . .} (countable)
11.17 integer
The set of integers, denoted by the symbol Z, is the set {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}
consisting of the natural numbers and their negatives.
• (a, b) + (c, d) := (a + c, b + d)
Typically, the class of (a, b) is denoted by the symbol n if b ≤ a (resp. −n if a ≤ b), where n is
the unique natural number such that a = b + n (resp. a + n = b). Under this notation, we
recover the familiar representation of the integers as {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}.
The set of integers Z under the addition and multiplication operations defined above forms
an integral domain. The integers admit the following ordering relation, making Z into an
ordered ring: (a, b) ≤ (c, d) in Z if a + d ≤ b + c in N.
The ring of integers is also a Euclidean domain, with valuation given by the absolute value
function.
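The pair construction can be exercised concretely: a pair (a, b) of natural numbers stands for the integer a − b. A minimal Python sketch of the operations above (the multiplication rule (a, b) · (c, d) = (ac + bd, ad + bc) is the standard one, supplied here since the entry only displays addition and the ordering):

```python
def add(p, q):
    """(a, b) + (c, d) := (a + c, b + d)"""
    return (p[0] + q[0], p[1] + q[1])

def mul(p, q):
    """(a, b) * (c, d) := (ac + bd, ad + bc) -- standard definition, an assumption
    here since the entry above shows only addition."""
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)

def leq(p, q):
    """(a, b) <= (c, d) iff a + d <= b + c in N"""
    return p[0] + q[1] <= p[1] + q[0]

def value(p):
    """The familiar integer represented by the class of (a, b)."""
    return p[0] - p[1]
```

For instance (2, 5) represents −3 and (7, 1) represents 6; adding the pairs gives (9, 6), which represents 3, as expected.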
f⁻¹ ◦ f = id_X ,
f ◦ f⁻¹ = id_Y .
Remarks
3. The inverse function and the inverse image of a set coincide in the following sense.
Suppose f⁻¹(A) is the inverse image of a set A ⊂ Y under a function f : X → Y . If f
is a bijection, then f⁻¹({y}) = {f⁻¹(y)}.
11.19 linearly ordered
An ordering ≤ (or <) of A is called linear or total if any two elements of A are comparable.
The pair (A, ≤) is then called a linearly ordered set.
11.20 operator
Synonym of mapping and function. Often used to refer to mappings where the domain and
codomain are, in some sense, spaces of functions.
For any sets a and b, the ordered pair (a, b) is the set {{a}, {a, b}}. The defining property
of ordered pairs is that (a, b) = (c, d) if and only if a = c and b = d, and the above
construction, as weird as it seems, is actually the simplest possible formulation which
achieves this property.
• Either a ≤ b, or b ≤ a,
• If a ≤ b and b ≤ c, then a ≤ c,
• If a ≤ b and b ≤ a, then a = b.
Given an ordering relation ≤, one can define a relation < by: a < b if a ≤ b and a ≠ b. The
opposite ordering is the relation ≥ given by: a ≥ b if b ≤ a, and the relation > is defined
analogously.
11.23 partition
A partition P of a set S is a collection of mutually disjoint non-empty sets such that ⋃P = S.
11.24 pullback
f : Y → Z,
Φ : X → Y.
Φ∗f : X → Z,
x ↦ (f ◦ Φ)(x).
Let us denote by M(X, Y ) the set of all mappings f : X → Y . We then see that Φ∗ is a
mapping M(Y, Z) → M(X, Z). In other words, Φ∗ pulls the set on which f is defined back
from Y to X. This is illustrated by the commutative triangle with Φ : X → Y , f : Y → Z,
and Φ∗f : X → Z.
Properties
2. Suppose we have maps
Φ : X → Y,
Ψ : Y → Z.
Then (Ψ ◦ Φ)∗ = Φ∗ ◦ Ψ∗.
4. Suppose X, Y are sets with X ⊂ Y . Then we have the inclusion map ι : X ↪ Y , and
for any f : Y → Z, we have
ι∗f = f |X ,
where f |X is the restriction of f to X [1].
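Since Φ∗f is just the composition f ◦ Φ, the pullback is one line of code. A minimal sketch (function names are my own):

```python
def pullback(phi):
    """Given Phi : X -> Y, return Phi* : M(Y, Z) -> M(X, Z), f |-> f o Phi."""
    def phi_star(f):
        return lambda x: f(phi(x))
    return phi_star

# Example: Phi doubles, f increments; (Phi* f)(x) = f(Phi(x)) = 2x + 1.
double = lambda x: 2 * x
inc = lambda y: y + 1
```

With these definitions, pullback(double)(inc)(3) evaluates f(Φ(3)) = 7; using the identity for Φ recovers f itself, matching the restriction property above.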
REFERENCES
1. W. Aitken, De Rham Cohomology: Summary of Lectures 1-4, online.
The above definition has no relation with the definition of a closed set in topology. Instead,
one should think of X and L as a closed system.
Examples
1. The set of invertible matrices is closed under matrix inversion. This means that the
inverse of an invertible matrix is again an invertible matrix.
2. Let C(X) be the set of complex valued continuous functions on some topological space
X. Suppose f, g are functions in C(X). Then we define the pointwise product of f
and g as the function f g : x 7→ f (x)g(x). Since f g is continuous, we have that C(X)
is closed under pointwise multiplication.
In the first example, the operation is of the type X → X. In the latter, pointwise multiplication
is a map C(X) × C(X) → C(X).
Let X be a finite set, and let G be the group of permutations of X (see permutation group).
There exists a unique homomorphism χ from G to the multiplicative group {−1, 1} such
that χ(t) = −1 for any transposition (loc. cit.) t ∈ G. The value χ(g), for any g ∈ G,
is called the signature or sign of the permutation g. If χ(g) = 1, g is said to be of even
parity; if χ(g) = −1, g is said to be of odd parity.
Proof: Let k(g) denote the number of inversions of g, i.e. the number of pairs a < b in X
with g(a) > g(b). The claim is clear if g is the identity map X → X. If g is any other
permutation, then for some consecutive a, b ∈ X we have a < b and g(a) > g(b). Let h ∈ G
be the transposition of a and b. We have
k(h ◦ g) = k(g) − 1
χ(h ◦ g) = −χ(g)
and the proposition follows by induction on k(g).
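The inversion count used in the proof gives a direct way to compute the sign. A small Python sketch (the list representation and the function name are my own):

```python
def sign(perm):
    """chi(g) for a permutation of {0, ..., n-1}, given as a list with
    perm[i] = g(i).  chi(g) = (-1)**k(g), where k(g) counts inversions:
    pairs i < j with g(i) > g(j)."""
    n = len(perm)
    k = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if k % 2 else 1
```

A transposition such as [0, 2, 1] has exactly one inversion, hence sign −1, while the identity has none.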
11.27 subset
Some examples: The set A = {d, r, i, t, o} is a subset of the set B = {p, e, d, r, i, t, o} because
every element of A is also in B. That is, A ⊆ B.
If X ⊆ Y and Y ⊆ X, it must be the case that X = Y .
Every set is a subset of itself, and the empty set is a subset of every set. The set A is
called a proper subset of B if A ⊂ B and A ≠ B (in this case we do not write A ⊆ B).
11.28 surjective
A function f : X → Y is surjective (or onto) if every element of Y is the image under f of
some element of X; that is,
Im f = Y.
11.29 transposition
For example, on the set {a, b, c, d, e}, the permutation σ given by
σ(a) = a
σ(b) = e
σ(c) = c
σ(d) = d
σ(e) = b
is a transposition: it exchanges b and e and fixes every other element.
One of the main results on symmetric groups states that any permutation can be expressed
as a composition of transpositions, and that for any two decompositions of a given permutation,
the number of transpositions is either always even or always odd.
11.30 truth table
A truth table is a tabular listing of all possible input value combinations for a truth function
and their corresponding output values. For n input variables, there will always be 2^n rows
in the truth table. A sample truth table for (a ∧ b) → c would be
a b c (a ∧ b) → c
F F F T
F F T T
F T F T
F T T T
T F F T
T F T T
T T F F
T T T T
(Note that ∧ represents logical and, while → represents the conditional truth function.)
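A table like this can be generated mechanically by enumerating all 2^n input rows. A minimal Python sketch, with the conditional encoded via the equivalence x → y ≡ ¬x ∨ y:

```python
from itertools import product

def truth_table(f, n):
    """All 2**n rows: each combination of inputs together with f's output."""
    return [(*row, f(*row)) for row in product((False, True), repeat=n)]

# (a and b) -> c
rows = truth_table(lambda a, b, c: (not (a and b)) or c, 3)
```

The only row with output False is a = b = True, c = False, since an implication fails only when its antecedent is true and its consequent false.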
Chapter 12
Chapter 13
13.1 CNF
You can see that the contrapositive of an implication is true by considering the following:
13.3 contrapositive
An implication and its contrapositive are equivalent statements. When proving a theorem,
it is often more convenient or more intuitive to prove the contrapositive instead.
13.4 disjunction
A disjunction is true if either of its parameters (called disjuncts) is true. Disjunction
does not correspond exactly to “or” in English, which is often used in an exclusive sense (see
exclusive or). Disjunction uses the symbol ∨, or sometimes + when taken in an algebraic
context. Hence, the disjunction of a and b would be written
a∨b
or
a+b
The truth table for disjunction is
a b a∨b
F F F
F T T
T F T
T T T
13.5 equivalent
Two statements A and B are said to be (logically) equivalent if A is true if and only if B is
true (that is, A implies B and B implies A). This is usually written as A ⇔ B. For example,
for any integer z, the statement “z is positive” is equivalent to “z is not negative and z ≠ 0”.
13.6 implication
An implication is a logical construction that essentially tells us that if one condition is true,
then another condition must also be true. Formally it is written
a → b
or
a ⇒ b
which would be read “a implies b”, or “a therefore b”, or “if a, then b” (to name a few).
Implication is often confused with “if and only if”, the biconditional truth function (⇔).
They are not, however, the same. The implication a → b is true whenever b is true, or
whenever a is false. So the statement “pigs have wings, therefore it is raining today” is true
if it is indeed raining, despite the fact that the first item is false.
It may be useful to remember that a → b only tells you that it cannot be the case that
b is false while a is true; b must “follow” from a (and “false” does follow from “false”).
Alternatively, a → b is in fact equivalent to
b ∨ ¬a
a b a→b
F F T
F T T
T F F
T T T
A propositional logic is a logic in which the only objects are propositions, that is,
objects which themselves have truth values. Variables represent propositions, and there are
no relations, functions, or quantifiers except for the constants ⊤ and ⊥ (representing true
and false respectively). The connectives are typically ¬, ∧, ∨, and → (representing negation,
conjunction, disjunction, and implication); however, this set is redundant, and other choices
can be used (⊤ and ⊥ can also be considered 0-ary connectives).
A model for propositional logic is just a truth function ν on a set of variables. Such a truth
function can easily be extended to a truth function ν̄ on all formulas which contain only the
variables ν is defined on, by adding recursive clauses for the usual definitions of the connectives.
For instance ν̄(α ∧ β) = T iff ν̄(α) = ν̄(β) = T .
Then we say ν |= φ if ν̄(φ) = T , and we say |= φ if ν |= φ for every ν such that ν̄(φ) is
defined (and we say that φ is a tautology).
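The recursive extension of ν and the tautology check can be sketched in a few lines. The nested-tuple encoding of formulas below is hypothetical, chosen just for this illustration:

```python
def extend(nu):
    """Extend a truth assignment nu (a dict on variable names) to all formulas.
    Formulas are encoded as nested tuples (an encoding assumed for this sketch):
    ('var', name), ('not', p), ('and', p, q), ('or', p, q), ('imp', p, q)."""
    def ev(phi):
        op = phi[0]
        if op == 'var':
            return nu[phi[1]]
        if op == 'not':
            return not ev(phi[1])
        if op == 'and':
            return ev(phi[1]) and ev(phi[2])
        if op == 'or':
            return ev(phi[1]) or ev(phi[2])
        if op == 'imp':
            return (not ev(phi[1])) or ev(phi[2])
        raise ValueError('unknown connective: %r' % (op,))
    return ev

def tautology(phi, variables):
    """|= phi: true under every assignment to the given variables."""
    from itertools import product
    return all(
        extend(dict(zip(variables, values)))(phi)
        for values in product((False, True), repeat=len(variables))
    )
```

For example, a ∨ ¬a is a tautology, while a single variable on its own is not.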
13.8 theory
If L is a logical language for some logic, and T is a set of formulas of L with no free variables,
then T is a theory of L.
13.9 transitive
(a ⇒ b) ∧ (b ⇒ c) ⇒ (a ⇒ c)
where ⇒ is the conditional truth function. From this we can derive that
(a = b) ∧ (b = c) ⇒ (a = c)
Version: 1 Owner: akrowne Author(s): akrowne
A truth function is a function that returns one of two values, one of which is interpreted
as “true” and the other as “false”. Typically either “T” and “F” are used, or “1” and “0”,
respectively. Using the latter, we can write
f : {0, 1}^n → {0, 1},
which defines a truth function f . That is, f is a mapping from any number (n) of true/false
(0 or 1) values to a single value, which is 0 or 1.
Chapter 14
14.1 ∆1 bootstrapping
This proves that a number of useful relations and functions are ∆1 in first order arithmetic,
providing a bootstrapping of parts of mathematical practice into any system including the ∆1
relations (since the ∆1 relations are exactly the recursive ones, this includes Turing machines).
First, we want to build a tupling relation which will allow a finite set of numbers to be
encoded by a single number. To do this we first show that R(a, b) ↔ a|b is ∆1 . This is true
since a|b ↔ ∃c ≤ b (a · c = b), a formula with only bounded quantifiers.
Next note that P (x) ↔ “x is prime” is ∆1 , since P (x) ↔ ¬∃y < x (¬y = 1 ∧ y|x). Also
AP (x, y) ↔ P (x) ∧ P (y) ∧ x < y ∧ ∀z < y (x < z → ¬P (z)), stating that x and y are
adjacent primes (no prime lies strictly between them).
These two can be used to define (the graph of) a primality function, p(a) = the (a + 1)-th prime.
Let p(a, b) ↔ ∃c ≤ b^(a²) (¬[2|c] ∧ [∀q < b ∀r ≤ b (AP (q, r) → ∀j < c [q^j |c ↔ r^(j+1) |c])] ∧
[b^a |c] ∧ ¬[b^(a+1) |c]).
This rather awkward looking formula is worth examining, since it illustrates a principle which
will be used repeatedly. c is intended to be a number of the form 2^0 · 3^1 · 5^2 · · · and so on. If
c is divisible by b^a but not by b^(a+1) then we know that b must be the (a + 1)-th prime. The
definition is so complicated because we cannot just say, as we’d like to, that p(a + 1) is the
smallest prime greater than p(a) (since we don’t allow recursive definitions). Instead we embed
the series of values this recursion would take into a single number (c) and guarantee that the
recursive relationship holds for at least a terms; then we just check whether the a-th value is b.
Finally, we can define our tupling relation. Technically, since a given relation must have
a fixed arity, we define for each n a function ⟨x_0 , . . . , x_n ⟩ = ∏_{i≤n} p_i^(x_i +1) . Then define (x)_i
to be the i-th element of x when x is interpreted as a tuple, so ⟨(x)_0 , . . . , (x)_n ⟩ = x. Note
that the tupling relation, even taken collectively, is not total. For instance 5 is not a tuple
(although it is sometimes convenient to view it as a tuple with “empty spaces”: ⟨ , , 5⟩). In
situations like this, and also when attempting to extract entries beyond the length, (x)_i = 0
(for instance, (5)_0 = 0). On the other hand there is a 0-ary tupling relation, ⟨⟩ = 1.
For the reverse, (x)_i = y ↔ ([p(i)^(y+1) |x] ∧ ¬[p(i)^(y+2) |x]) ∨ ([y = 0] ∧ ¬[p(i)|x]).
Also, define a length function by len(x) = y ↔ ¬[p(y + 1)|x] ∧ ∀z ≤ y [p(z)|x], and a
membership relation by in(x, n) ↔ ∃i < len(x) [(x)_i = n].
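The prime-power coding is easy to experiment with concretely. A minimal Python sketch of ⟨. . .⟩ and (x)_i (the helper names are my own; the trial-division prime finder is just for illustration):

```python
def nth_prime(i):
    """p(i): the (i+1)-th prime, so nth_prime(0) == 2."""
    count, n = 0, 1
    while count <= i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n

def encode(*xs):
    """<x0, ..., xn> = product of p_i^(x_i + 1)."""
    c = 1
    for i, x in enumerate(xs):
        c *= nth_prime(i) ** (x + 1)
    return c

def extract(x, i):
    """(x)_i: the exponent of p_i in x minus one, or 0 if p_i does not divide x."""
    p, e = nth_prime(i), 0
    while x % p == 0:
        x //= p
        e += 1
    return max(e - 1, 0)
```

For instance ⟨3, 0⟩ = 2^4 · 3^1 = 48, and extracting gives back (48)_0 = 3 and (48)_1 = 0; a number like 5 that no p_0-power divides yields (5)_0 = 0, matching the "empty spaces" convention above.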
Armed with this, we can show that all primitive recursive functions are ∆1 . To see this, note
that x = 0, the zero function, is trivially ∆1 , as are x = Sy and p_{n,m} (x_1 , . . . , x_n ) = x_m .
The ∆1 functions are closed under composition, since if φ(~x) and ψ(~x) both have no unbounded
quantifiers, φ(ψ(~x)) obviously doesn’t either.
Finally, suppose we have functions f (~x) and g(~x, m, n) in ∆1 . Then define the primitive
recursion h(~x, y) by first defining:
h̄(~x, y) = z ↔ len(z) = y ∧ ∀i < y [(z)_{i+1} = g(~x, i, (z)_i )] ∧ [len(z) = 0 ∨ (z)_0 = f (~x)]
∆1 is also closed under minimization: if R(~x, y) is a ∆1 relation then µy.R(~x, y) is the function
giving the least y satisfying R(~x, y). To see this, note that µy.R(~x, y) = z ↔ R(~x, z) ∧ ∀m <
z ¬R(~x, m).
We can also define ∗_u , which concatenates only those elements of t not appearing in s. This just
requires defining the graph of g to be g(s, t, j, i, x) ↔ [in(s, (t)_i ) ∧ x = j] ∨ [¬in(s, (t)_i ) ∧ x =
j ∗_1 (t)_i ].
14.2 Boolean
Boolean refers to that which can take on the values “true” or “false”, or that which concerns
truth and falsity. For example: “Boolean variable”, “Boolean logic”, “Boolean statement”,
etc.
“Boolean” is named for George Boole, the 19th-century mathematician.
A Gödel numbering is any way of assigning numbers to the formulas of a language. This is
often useful in allowing sentences of a language to be self-referential. The number associated
with a formula φ is called its Gödel number and is denoted ⌜φ⌝.
More formally, if L is a language and G is a surjective partial function from the terms of
L to the formulas over L, then G is a Gödel numbering. ⌜φ⌝ may be any term t such that
G(t) = φ. Note that G is not defined within L (there is no formula or object of L representing
G); however, properties of it (such as being in the domain of G, being a subformula, and so
on) are.
Although anything meeting the properties above is a Gödel numbering, depending on the
specific language and usage, any of the following properties may also be desired (and can
often be found if more effort is put into the numbering):
Gödel’s first and second incompleteness theorems are perhaps the most celebrated results
in mathematical logic. The basic idea behind Gödel’s proofs is that by the device of
Gödel numbering, one can formulate properties of theories and sentences as arithmetical
properties of the corresponding Gödel numbers, thus allowing 1st order arithmetic to speak
of its own consistency, provability of some sentence and so forth.
The original result Gödel proved in his classic paper On Formally Undecidable Propositions
in Principia Mathematica and Related Systems can be stated as
Theorem 1. No theory T axiomatisable in the type system of PM (i.e. in Russell’s theory of types)
which contains Peano arithmetic and is ω-consistent proves all true theorems of arithmetic
(and no false ones).
Stated this way, the theorem is an obvious corollary of Tarski’s result on the undefinability of truth.
This can be seen as follows. Consider a Gödel numbering G, which assigns to each formula
φ its Gödel number ⌜φ⌝. The set of Gödel numbers of all true sentences of arithmetic is
{⌜φ⌝ | N |= φ}, and by Tarski’s result it isn’t definable by any arithmetic formula. But
assume there were a theory T , an axiomatisation Ax_T of which is definable in arithmetic, and
which proves all true statements of arithmetic. Then ∃P (P is a proof of x from Ax_T )
would define the set of (Gödel numbers of) true sentences of arithmetic, which contradicts Tarski’s
result.
The proof given above is highly non-constructive. A much stronger version can actually be
extracted from Gödel’s paper, namely that
Theorem 2. There is a primitive recursive function G, s.t. if T is a theory with a p.r.
axiomatisation α, and if all primitive recursive functions are representable in T , then N |=
G(α) but T ⊬ G(α).
This second form of the theorem is the one usually proved, although the theorem is usually
stated in a form for which the nonconstructive proof based on Tarski’s result would suffice.
The proof for this stronger version is based on a similar idea as Tarski’s result.
Consider the formula ∃P (P is a proof of x from α), which defines a predicate Prov_α (x)
representing provability from α. Assume we have enumerated the open formulae with
one variable in a sequence B_i , so that every open formula occurs. Consider now the open
formula ¬Prov_α (B_x (x)), which expresses non-provability from α. Since it is an open formula
with one variable, it must be B_k for some k. Thus we can consider the
closed sentence B_k (k). This sentence is equivalent to ¬Prov_α (subst(⌜¬Prov_α (B_x (x))⌝, k)), but
since subst(⌜¬Prov_α (B_x (x))⌝, k) is just B_k (k), it “asserts its own unprovability”.
Since all the steps we took to get the undecided but true sentence B_k (k) were very simple
mechanical manipulations of Gödel numbers guaranteed to terminate in bounded time, we
have in fact produced the p.r. function G required by the statement of the theorem.
The first version of the proof can be used to show that many non-axiomatisable theories
are also incomplete. For example, consider PA + all true Π₁ sentences. Since Π₁ truth is
definable at the Π₂ level, this theory is definable in arithmetic by a formula α. However, it is
not complete, since otherwise ∃p(p is a proof of x from α) would define the set of true
sentences of arithmetic. This can be extended to show that no arithmetically definable
theory with sufficient expressive power is complete.
The second version of Gödel’s first incompleteness theorem suggests a natural way to extend
theories to stronger theories which are exactly as sound as the original theories. This sort
of process has been studied by Turing, Feferman, Fenstad and others under the names of
ordinal logics and transfinite recursive progressions of arithmetical theories.
Gödel’s second incompleteness theorem concerns what a theory can prove about its own
provability predicate, in particular whether it can prove that no contradiction is provable.
The answer under very general settings is that a theory can’t prove that it is consistent,
without actually being inconsistent.
The second incompleteness theorem is best presented by means of a provability logic. Consider
an arithmetic theory T which is p.r. axiomatised by α. We extend the language this
theory is expressed in with a new sentence-forming operator □, so that any sentence in
parentheses prefixed by □ is a sentence. Thus, for example, □(0 = 1) is a formula. Intuitively,
we want □(φ) to express the provability of φ from α. Thus the semantics of our new
language is exactly the same as that of the original language, with the additional rule that
□(φ) is true if and only if α ⊢ φ. There is a slight difficulty here; φ might itself contain
boxed expressions, and we haven't yet provided any semantics for these. The answer is simple:
whenever a boxed expression □(ψ) occurs within the scope of another box, we replace
it with the arithmetical statement Prov_α(⌜ψ⌝). Thus, for example, the truth of □(□(0 = 1))
is equivalent to α ⊢ Prov_α(⌜0 = 1⌝). Assuming that α is strong enough to prove all true
instances of Prov_α(⌜φ⌝), we can in fact interpret the whole of the new boxed language by
this translation. This is what we shall do, so formally α ⊢ φ (where φ might contain boxed
sentences) is taken to mean α ⊢ φ* where φ* is obtained by replacing the boxed expressions
with arithmetical formulae as above.
There are a number of restrictions we must impose on α (and thus on □, the meaning of
which is determined by α). These are known as the Hilbert-Bernays derivability conditions
and they are as follows:
• if α ⊢ φ then α ⊢ □(φ)
• α ⊢ □(φ) → □□(φ)
• α ⊢ □(φ → ψ) → (□(φ) → □(ψ))
A statement Cons asserts the consistency of α if it is equivalent to ¬□(0 = 1). Gödel's first
incompleteness theorem shows that there is a sentence B_k(k) for which the following is true:
¬□(0 = 1) → ◇(B_k(k)) ∧ ◇(¬B_k(k)), where ◇ is the dual of □, i.e. ◇(φ) ↔ ¬□(¬φ).
A careful analysis reveals that this is provable in any α which satisfies the derivability
conditions, i.e. α ⊢ ¬□(0 = 1) → ◇(B_k(k)) ∧ ◇(¬B_k(k)). Assume now that α can prove
¬□(0 = 1), i.e. that α can prove its own consistency. Then α can prove ◇(B_k(k)) ∧
◇(¬B_k(k)). In particular α proves ¬□(B_k(k)), and since B_k(k) is equivalent to its own
unprovability, α proves B_k(k); by the first derivability condition α then also proves
□(B_k(k)), i.e. ¬B_k(k). Thus α is inconsistent.
Let L be a first order language. We define the equivalence relation ∼ over formulas of L by
ϕ ∼ ψ if and only if ⊢ ϕ ⇔ ψ. Let B = L/∼ be the set of equivalence classes. We define
the operations ⊕ and · and complementation, denoted [ϕ]′, on B by:

[ϕ] ⊕ [ψ] = [ϕ ∨ ψ]
[ϕ] · [ψ] = [ϕ ∧ ψ]
[ϕ]′ = [¬ϕ]

We let 0 = [ϕ ∧ ¬ϕ] and 1 = [ϕ ∨ ¬ϕ]. Then the structure (B, ⊕, ·, ′, 0, 1) is a Boolean
algebra, called the Lindenbaum algebra.
Note that it may be possible to define the Lindenbaum algebra on extensions of first order logic,
as long as there is a notion of formal proof that can allow the definition of the equivalence
relation.
One of the very first results of the study of model theoretic logics is a characterisation
theorem due to Per Lindström. He showed that classical first order logic is the strongest
logic having the following properties:
• compactness
• the Löwenheim-Skolem theorem
He also showed that first order logic can be characterised as the strongest logic for which
the following hold:
• the Löwenheim-Skolem theorem
• effective axiomatisability (the set of valid sentences is recursively enumerable)
The notion of "strength" used here is as follows: a logic L′ is at least as strong as L if the
class of sets definable in L is contained in the class of sets definable in L′.
14.7 Presburger arithmetic
Presburger arithmetic is decidable, but is consequently very limited in what it can express.
14.9 Skolemization
When the existential quantifier is inside a universal quantifier, the bound variable must
be replaced by a Skolem function of the variables bound by universal quantifiers. Thus
∀x[x = 0 ∨ ∃y[x = y + 1]] becomes ∀x[x = 0 ∨ x = f (x) + 1].
This is used in second order logic to move all existential quantifiers outside the scope of first
order universal quantifiers. This can be done since second order quantifiers can quantify over
functions. For instance ∀1 x∀1 y∃1 zφ(x, y, z) is equivalent to ∃2 F ∀1 x∀1 yφ(x, y, F (x, y)).
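The replacement of existential variables by Skolem functions is mechanical; here is a sketch over an assumed tuple representation of formulas (the tag names and the f1, f2, . . . naming scheme are illustrative, not part of the entry):

```python
# Hedged sketch: formulas as nested tuples, e.g.
# ('forall', 'x', ('exists', 'y', body)).

def subst(formula, var, term):
    """Replace every occurrence of the variable `var` by `term`
    (assumes bound variable names are all distinct)."""
    if formula == var:
        return term
    if isinstance(formula, tuple):
        return tuple(subst(part, var, term) for part in formula)
    return formula

def skolemize(formula):
    """Drop each existential quantifier, replacing its variable by a fresh
    Skolem function applied to the universal variables in scope."""
    counter = [0]

    def walk(f, universals):
        op = f[0] if isinstance(f, tuple) else None
        if op == 'forall':
            _, var, body = f
            return ('forall', var, walk(body, universals + (var,)))
        if op == 'exists':
            _, var, body = f
            counter[0] += 1
            term = ('f%d' % counter[0],) + universals  # f_n(x1, ..., xk)
            return walk(subst(body, var, term), universals)
        if op in ('and', 'or', 'not'):
            return (op,) + tuple(walk(arg, universals) for arg in f[1:])
        return f  # atomic formula, substitution already applied

    return walk(formula, ())

# ∀x[x = 0 ∨ ∃y[x = y + 1]] becomes ∀x[x = 0 ∨ x = f1(x) + 1]:
assert skolemize(
    ('forall', 'x', ('or', ('eq', 'x', '0'),
                     ('exists', 'y', ('eq', 'x', ('plus', 'y', '1')))))) \
    == ('forall', 'x', ('or', ('eq', 'x', '0'),
                        ('eq', 'x', ('plus', ('f1', 'x'), '1'))))
```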
The first level consists of formulas with only bounded quantifiers; the corresponding relations
are also called the primitive recursive relations (this definition is equivalent to the definition
from computer science). This level is called any of Δ⁰₀, Σ⁰₀ and Π⁰₀, depending on context.
A formula φ is Σ⁰ₙ if there is some Δ⁰₁ formula ψ such that φ can be written:

φ = ∃x̄₁∀x̄₂ · · · Qx̄ₙ ψ

with n alternating blocks of quantifiers beginning with an existential block (so Q is ∃ if n
is odd and ∀ if n is even); Π⁰ₙ formulas are defined dually, beginning with a universal block.
The Σ⁰₁ relations are the same as the recursively enumerable relations.
A formula is Δ⁰ₙ if it is both Σ⁰ₙ and Π⁰ₙ. Since each Σ⁰ₙ formula is just the negation of a Π⁰ₙ
formula and vice-versa, the Σ⁰ₙ relations are the complements of the Π⁰ₙ relations.
The relations in Δ⁰₁ = Σ⁰₁ ∩ Π⁰₁ are the recursive relations.
Higher levels of the hierarchy correspond to broader and broader classes of relations. A
formula or relation which is Σ⁰ₙ (or, equivalently, Π⁰ₙ) for some integer n is called arithmetical.
The superscript 0 is often omitted when it is not necessary to distinguish from the analytic hierarchy.
Functions can be described as being in one of the levels of the hierarchy if the graph of the
function is in that level.
More significant is the proof that all containments are proper. First, let n ≥ 1 and let U be
universal for 2-ary Σₙ relations. Then D(x) ↔ U(x, x) is obviously Σₙ. But suppose D ∈ Δₙ.
Then D ∈ Πₙ, so ¬D ∈ Σₙ. Since U is universal, there is some e such that ¬D(x) ↔ U(e, x),
and therefore ¬D(e) ↔ U(e, e) ↔ ¬U(e, e). This is clearly a contradiction, so D ∈ Σₙ \ Δₙ
and ¬D ∈ Πₙ \ Δₙ.
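The diagonal step itself is purely combinatorial and can be watched in miniature; the enumeration U below is an arbitrary stand-in for a universal relation, chosen only for illustration:

```python
# Finite analogue of the diagonal argument: given any enumeration
# U(e, x) of predicates, the complement of the diagonal differs from
# every enumerated predicate, so it cannot occur in the enumeration.

def U(e, x):
    # An arbitrary illustrative enumeration of predicates on 0..9.
    return (e * x) % 3 == 0

def not_diag(x):
    # ¬D(x), where D(x) = U(x, x)
    return not U(x, x)

# ¬D disagrees with row e at input e, for every e:
assert all(not_diag(e) != U(e, e) for e in range(10))
```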
In addition, consider the recursive join of D and ¬D, defined by

(D ⊕ ¬D)(x) ↔ (∃y ≤ x[x = 2y ∧ D(y)]) ∨ (∃y ≤ x[x = 2y + 1 ∧ ¬D(y)]).

Clearly both D and ¬D can be recovered from D ⊕ ¬D, so it is contained in neither Σₙ nor
Πₙ. However, the definition above has only bounded quantifiers except for those in D and
¬D, so D ⊕ ¬D ∈ Δₙ₊₁ \ (Σₙ ∪ Πₙ).
Let L be a first order language, and suppose it has signature Σ. A formula ϕ of L is said to
be atomic if and only if:
1. ϕ is t₁ = t₂ for some terms t₁ and t₂, or
2. ϕ is R(t₁, ..., tₙ) for some n-ary relation symbol R of Σ and terms t₁, ..., tₙ.
From the syntactic compactness theorem for first order logic, we get this nice (and useful)
result:
Let T be a theory of first-order logic. If T has finite models of unboundedly large sizes, then
T also has an infinite model.
For each n, let Φₙ be the sentence ∃x₁ · · · ∃xₙ ⋀_{i<j} xᵢ ≠ xⱼ (Φₙ says "there exist (at
least) n different elements in the world"). Note that · · · ⊢ Φₙ ⊢ · · · ⊢ Φ₂ ⊢ Φ₁. Define a
new theory

T∞ = T ∪ {Φ₁, Φ₂, . . .}.

For any finite subset T′ ⊂ T∞, we claim that T′ is consistent: indeed, T′ contains axioms of
T, along with finitely many of {Φₙ}ₙ≥₁. Let Φₘ correspond to the largest index appearing in
T′. If Mₘ ⊨ T is a model of T with at least m elements (and by hypothesis, such a model
exists), then Mₘ ⊨ T ∪ {Φₘ} ⊢ T′.
So every finite subset of T∞ is consistent; by the compactness theorem for first-order logic,
T∞ is consistent, and by Gödel's completeness theorem for first-order logic it has a model
M. Then M ⊨ T∞ ⊇ T, so M is a model of T with infinitely many elements (M ⊨ Φₙ for
every n, so M has at least n elements for all n).
Using the example of Gödel numbering, we can show that Proves(a, x) (the statement that
a is a proof of x, which will be formally defined below) is ∆1 .
First, Term(x) should be true iff x is the Gödel number of a term. Thanks to primitive
recursion, we can define it by:
Then AtForm(x), which is true when x is the Gödel number of an atomic formula, is defined
by:
Next, Form(x), which is true only if x is the Gödel number of a formula, is defined recursively
by:
Form(x) ↔ AtForm(x) ∨
∃i, y < x[x = ⟨2, i, y⟩ ∧ Form(y)] ∨
∃y < x[x = ⟨3, y⟩ ∧ Form(y)] ∨
∃y, z < x[x = ⟨4, y, z⟩ ∧ Form(y) ∧ Form(z)]
The definition of QFForm(x), which is true when x is the Gödel number of a quantifier free
formula, is the same except without the second clause.
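The recursion for Form mirrors directly into code if ⟨. . .⟩ is read as a Python tuple; in the sketch below the atomic case is stubbed out with the tag 1, an assumption made only because the entry's AtForm details are not reproduced here:

```python
# Sketch of the Form(x) recursion, reading ⟨a, b, ...⟩ as Python tuples.
# Assumption: (1, ...) stands in for the code of an atomic formula.

def at_form(x):
    return isinstance(x, tuple) and len(x) > 0 and x[0] == 1

def form(x):
    if at_form(x):
        return True
    if not isinstance(x, tuple):
        return False
    if len(x) == 3 and x[0] == 2 and isinstance(x[1], int):
        return form(x[2])                    # ⟨2, i, y⟩ : quantifier over v_i
    if len(x) == 2 and x[0] == 3:
        return form(x[1])                    # ⟨3, y⟩ : negation
    if len(x) == 3 and x[0] == 4:
        return form(x[1]) and form(x[2])     # ⟨4, y, z⟩ : binary connective
    return False

# ¬∃v0 (atomic) is a formula; a bare number is not.
assert form((3, (2, 0, (1,))))
assert not form(7)
```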
Next we want to show that the set of logical tautologies is ∆₁. This will be done by formalizing
the concept of truth tables, which will require some development. First we show that
AtForms(a), which gives a sequence containing the (distinct) atomic formulas of a, is ∆₁.
Define it by:
We say v is a truth assignment if it is a sequence of pairs, with the first member of each
pair being an atomic formula and the second being either 1 or 0:
Then v is a truth assignment for a if v is a truth assignment, a is quantifier free, and every
atomic formula in a is the first member of one of the pairs in v. That is:
Then a is a tautology if every truth assignment makes it true:

Taut(a) ↔ ∀v < 2^(2^AtForms(a)) [TAf(v, a) → True(v, a)]
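Truth-table checking is a finite search, which the following small propositional sketch makes concrete; it reuses the tuple codes ⟨3, y⟩ for negation and ⟨4, y, z⟩ for implication, and the use of strings for atomic formulas is an assumption for illustration:

```python
from itertools import product

# Propositional formulas as tuples: (3, y) is ¬y, (4, y, z) is y → z,
# and anything else (here: strings) is an atomic formula.

def atoms(a):
    """The set of atomic subformulas (AtForms's role)."""
    if isinstance(a, tuple) and a[0] in (3, 4):
        return set().union(*(atoms(sub) for sub in a[1:]))
    return {a}

def true(v, a):
    """Evaluate a under the truth assignment v (a dict, playing TAf's role)."""
    if isinstance(a, tuple) and a[0] == 3:
        return not true(v, a[1])
    if isinstance(a, tuple) and a[0] == 4:
        return (not true(v, a[1])) or true(v, a[2])
    return v[a]

def taut(a):
    """Taut(a): every assignment to the atoms makes a true."""
    ats = sorted(atoms(a))
    return all(true(dict(zip(ats, bits)), a)
               for bits in product([False, True], repeat=len(ats)))

assert taut((4, 'p', 'p'))                # p → p is a tautology
assert not taut((4, 'p', 'q'))            # p → q is not
assert taut((4, (3, (3, 'p')), 'p'))      # ¬¬p → p is
```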
• there are some j, k < i such that (a)j = ⟨4, (a)k, (a)i⟩ (that is, (a)i is a conclusion
under modus ponens from (a)j and (a)k).

Proves(a, x) ↔ ∀i < len(a)[Ax((a)i) ∨ ∃j, k < i[(a)j = ⟨4, (a)k, (a)i⟩] ∨ Taut((a)i)] ∧
(a)_{len(a)−1} = x
We can define by recursion a function e from formulas of arithmetic to numbers, and the
corresponding Gödel numbering as the inverse.
• e(vᵢ) = ⟨0, i⟩
• e(¬φ) = ⟨3, e(φ)⟩
• e(φ → ψ) = ⟨4, e(φ), e(ψ)⟩
• e(0) = ⟨5⟩
• e(Sφ) = ⟨6, e(φ)⟩
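Reading ⟨. . .⟩ as tuples, the encoding is mechanical; a sketch over an assumed input representation (the tag names 'var', 'not', etc. are illustrative, not from the entry):

```python
# e maps formulas/terms (tagged tuples) to their codes, following the
# clauses above.  Input representation (an assumption for illustration):
#   ('var', i), ('not', f), ('imp', f, g), ('zero',), ('succ', t)

def e(phi):
    tag = phi[0]
    if tag == 'var':
        return (0, phi[1])
    if tag == 'not':
        return (3, e(phi[1]))
    if tag == 'imp':
        return (4, e(phi[1]), e(phi[2]))
    if tag == 'zero':
        return (5,)
    if tag == 'succ':
        return (6, e(phi[1]))
    raise ValueError('unknown construct: %r' % (tag,))

# ¬(v0 → v1) encodes as ⟨3, ⟨4, ⟨0, 0⟩, ⟨0, 1⟩⟩⟩:
assert e(('not', ('imp', ('var', 0), ('var', 1)))) == (3, (4, (0, 0), (0, 1)))
```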
As an example of the use of well-founded induction in the case where the order is not a
linear one, I'll prove the fundamental theorem of arithmetic: every natural number has a
prime factorization.
First note that the division relation is well-founded. This fact is proven in most algebra
books. The |-minimal elements are the prime numbers. We detail the two steps of the proof:
1. If n is prime, then n is its own factorization into primes, so the assertion is true for
the |-minimal elements.
2. If n is not prime, then n has a non-trivial factorization (by definition of not being
prime), i.e. n = mℓ, where m, ℓ ≠ 1. By induction, m and ℓ have prime factorizations,
and concatenating them gives a prime factorization of n.
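The two cases above are literally the recursion in the following sketch; the helper name is illustrative:

```python
def smallest_nontrivial_divisor(n):
    """Least d with 1 < d < n and d | n, or None if n is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None

def prime_factorization(n):
    """Case 1: n prime -> [n].  Case 2: n = m * l -> recurse on m and l."""
    assert n >= 2
    m = smallest_nontrivial_divisor(n)
    if m is None:                 # |-minimal case: n is prime
        return [n]
    return prime_factorization(m) + prime_factorization(n // m)

assert prime_factorization(60) == [2, 2, 3, 5]
assert prime_factorization(13) == [13]
```

The recursion terminates precisely because division is well-founded: both m and n/m are strictly smaller divisors of n.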
4. ordinal numbers;
5. etc.
Terms and formulas of first order logic are constructed with the classical logical symbols ∀, ∃,
∧, ∨, ¬, ⇒, ⇔, and also ( and ), and a set

Σ = (⋃_{n∈ω} Relₙ) ∪ (⋃_{n∈ω} Funₙ) ∪ Const.
We require that all these sets be disjoint. The elements of the set Σ are the only non-logical
symbols that we are allowed to use when we construct terms and formulas. They form the
signature of the language. So far they are only symbols, so they don’t mean anything. For
most structures that we encounter, the set Σ is finite, but we allow it to be infinite, even
uncountable, as this sometimes makes things easier, and just about everything still works
when the signature is uncountable. We also assume that we have an unlimited supply of
variables, with the only constraint that the collection of variables form a set, which should
be disjoint from the other sets of non-logical symbols.
The arity of a function or relation symbol is the number of parameters the symbol takes.
It is usually assumed to be a property of the symbol, and it is bad grammar to use
an n-ary function or relation symbol with m parameters if m ≠ n.
3. If f is an n-ary function symbol, and t₁, ..., tₙ are terms, then f(t₁, ..., tₙ) is a term.
With terms in hand, we build formulas inductively by a finite application of the following
rules:
2. If R is an n-ary relation symbol and t1 , ..., tn are terms, then R(t1 , ..., tn ) is a formula;
ϕ ∧ ψ := ¬(¬ϕ ∨ ¬ψ)        ϕ ⇒ ψ := ¬ϕ ∨ ψ
ϕ ⇔ ψ := (ϕ ⇒ ψ) ∧ (ψ ⇒ ϕ)        ∀x.ϕ := ¬(∃x(¬ϕ))
A logic is first order if it has exactly one type. Usually the term refers specifically to the
logic with connectives ¬, ∨, ∧, →, and ↔ and the quantifiers ∀ and ∃, all given the usual
semantics:
• ∀xφ(x) is true iff φ[t/x] is true for every object t (where φ[t/x] is the result of replacing
every free occurrence of x in φ with t)
• φ ↔ ψ is the same as (φ → ψ) ∧ (ψ → φ)
However languages with slightly different quantifiers and connectives are sometimes still
called first order as long as there is only one type.
Definition. A theory T is said to be consistent if and only if T ⊬ ⊥, where ⊥ stands for
"false". In other words, T is consistent if one cannot derive a contradiction from it. If ϕ is
a sentence of L, then we say ϕ is consistent with T if and only if the theory T ∪ {ϕ} is
consistent.
In the entry first-order languages, I have mentioned the use of variables without men-
tioning what variables really are. A variable is a symbol that is supposed to range over the
universe of discourse. Unlike a constant, it has no fixed value.
There are two ways in which a variable can occur in a formula: free or bound. Informally,
a variable is said to occur free in a formula ϕ if and only if it is not within the "scope" of a
quantifier. For instance, x occurs free in ϕ if and only if it occurs in it as a symbol, and no
subformula of ϕ is of the form ∃x.ψ. Here the x after the ∃ is to be taken literally: it is x
and no other symbol.
The set FV(ϕ) of free variables of ϕ is defined by well-founded induction on the construction
of formulas. First we define Var(t), where t is a term, to be the set of all variables occurring
in t, and then:
FV(t₁ = t₂) = Var(t₁) ∪ Var(t₂)

FV(R(t₁, ..., tₙ)) = ⋃_{k=1}^n Var(tₖ)

FV(¬ϕ) = FV(ϕ)

FV(ϕ ∨ ψ) = FV(ϕ) ∪ FV(ψ)

FV(∃x(ϕ)) = FV(ϕ) \ {x}
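These clauses translate directly into a recursive function; here is a sketch over an assumed tuple representation of formulas, where constants are treated as zero-ary function symbols (the tags are illustrative):

```python
# Formulas: ('eq', t1, t2), ('rel', R, t1, ..., tn), ('not', f),
# ('or', f, g), ('exists', x, f).
# Terms: variable names (strings) or ('fun', f, t1, ..., tn);
# constants such as 0 and 1 are zero-ary function symbols.

def var(t):
    """Var(t): all variables occurring in the term t."""
    if isinstance(t, str):
        return {t}
    return set().union(set(), *(var(s) for s in t[2:]))

def fv(phi):
    tag = phi[0]
    if tag == 'eq':
        return var(phi[1]) | var(phi[2])
    if tag == 'rel':
        return set().union(set(), *(var(t) for t in phi[2:]))
    if tag == 'not':
        return fv(phi[1])
    if tag == 'or':
        return fv(phi[1]) | fv(phi[2])
    if tag == 'exists':
        return fv(phi[2]) - {phi[1]}
    raise ValueError(tag)

# x + 1 = 0 ∧ ∃x(x + y = 1), with ∧ written via ¬ and ∨ as in the
# abbreviations earlier in the chapter, has free variables {x, y}:
phi = ('not', ('or',
       ('not', ('eq', ('fun', '+', 'x', ('fun', '1')), ('fun', '0'))),
       ('not', ('exists', 'x', ('eq', ('fun', '+', 'x', 'y'), ('fun', '1'))))))
assert fv(phi) == {'x', 'y'}
```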
When for some ϕ, the set F V (ϕ) is not empty, then it is customary to write ϕ as ϕ(x1 , ...xn ),
in order to stress the fact that there are some free variables left in ϕ, and that those free
variables are among x1 , ..., xn . When x1 , ..., xn appear free in ϕ, then they are considered as
place-holders, and it is understood that we will have to supply “values” for them, when
we want to determine the truth of ϕ. If F V (ϕ) = ∅, then ϕ is called a sentence.
If a variable never occurs free in ϕ (and occurs as a symbol), then we say the variable is
bound. A variable x is bound if and only if ∃x(ψ) or ∀x(ψ) is a subformula of ϕ for some ψ
The problem with this definition is that a variable can occur both free and bound in the
same formula. For example, consider the following formula of the language {+, ·, 0, 1} of
ring theory:
x + 1 = 0 ∧ ∃x(x + y = 1)
The variable x occurs both free and bound here. However, the following lemma tells us that
we can always avoid this situation :
Lemma 1. It is possible to rename the bound variables without affecting the truth of a
formula. In other words, if ϕ = ∃x(ψ) or ϕ = ∀x(ψ), and z is a variable not occurring in ψ,
then ⊢ ϕ ⇔ ∃z(ψ(z/x)) (respectively ∀z(ψ(z/x))), where ψ(z/x) is the formula obtained
from ψ by replacing every free occurrence of x by z.
The underlying principle is that a formula quantified by a generalized quantifier is true if
the sets of elements satisfying its argument formulas stand in some relation associated with
the quantifier.
Every generalized quantifier has an arity, which is the number of formulas it takes as argu-
ments, and a type, which for an n-ary quantifier is a tuple of length n. The tuple represents
the number of quantified variables for each argument.
The most common quantifiers are those of type h1i, including ∀ and ∃. If Q is a quantifier
of type h1i, M is the universe of a model, and QM is the relation associated with Q in that
model, then Qxφ(x) ↔ {x ∈ M | φ(x)} ∈ QM .
So ∀M = {M}, since the quantified formula is only true when all elements satisfy it. On the
other hand ∃M = P (M) − {∅}.
In general, the monadic quantifiers are those of type ⟨1, . . . , 1⟩, and if Q is an n-ary monadic
quantifier then Q_M ⊆ P(M)ⁿ. Härtig's quantifier, for instance, has type ⟨1, 1⟩, and I_M =
{⟨X, Y⟩ | X, Y ⊆ M ∧ |X| = |Y|}.
More generally, for a quantifier Q of type ⟨n₁, . . . , nₖ⟩,

Q_M ⊆ ∏ᵢ P(M^{nᵢ}).
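Over a finite model these relations can be computed directly; a sketch for type-⟨1⟩ quantifiers and Härtig's I (the helper names are assumptions for illustration):

```python
from itertools import chain, combinations

def powerset(M):
    s = list(M)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

M = {0, 1, 2}

forall_M = {frozenset(M)}                  # ∀_M = {M}
exists_M = {A for A in powerset(M) if A}   # ∃_M = P(M) − {∅}

def holds(Q_M, phi):
    """Q x φ(x) is true iff {x ∈ M | φ(x)} ∈ Q_M."""
    return frozenset(x for x in M if phi(x)) in Q_M

def hartig(phi, psi):
    """I x y φ(x) ψ(y): the two definable sets have equal cardinality."""
    return len([x for x in M if phi(x)]) == len([y for y in M if psi(y)])

assert holds(exists_M, lambda x: x > 1)          # some element exceeds 1
assert not holds(forall_M, lambda x: x > 1)      # but not all do
assert hartig(lambda x: x < 1, lambda y: y > 1)  # |{0}| = |{2}|
```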
14.23 logic
Generally, by logic, people mean first order logic, a formal set of rules for building mathemat-
ical statements out of symbols like ¬ (negation) and → (implication) along with quantifiers
like ∀ (for every) and ∃ (there exists).
More generally, a logic is any set of rules for forming sentences (the logic’s syntax) together
with rules for assigning truth values to them (the logic’s semantics). Normally it includes
a (possibly empty) set of types T (also called sorts), which represent the different kinds of
objects that the theory discusses (typical examples might be sets, numbers, or sets of num-
bers). In addition it specifies particular quantifiers, connectives, and variables. Particular
theories in the logic can then add relations and functions to fully specify a logical language.
14.24 proof of compactness theorem for first order logic
The theorem states that if a set of sentences of a first-order language L is inconsistent, then
some finite subset of it is inconsistent. Suppose ∆ ⊆ L is inconsistent. Then by definition
∆ `⊥, i.e. there is a formal proof of “false” using only assumptions from ∆. Formal proofs
are finite objects, so let Γ collect all the formulas of ∆ that are used in the proof.
To prove the transfinite induction theorem, we note that the class of ordinals is well-ordered
by ∈. So suppose for some Φ, there are ordinals α such that Φ(α) is not true. Suppose
further that Φ satisfies the hypothesis, i.e. ∀α(∀β < α(Φ(β)) ⇒ Φ(α)). We will reach a
contradiction.
The class C = {α : ¬Φ(α)} is not empty. Note that it may be a proper class, but this is not
important. Let γ = min(C) be the ∈-minimal element of C. Then by assumption, for every
λ < γ, Φ(λ) is true. Thus, by hypothesis, Φ(γ) is true, contradiction.
This proof is very similar to the proof of the transfinite induction theorem. Suppose Φ is
defined for a well-founded set (S, R), and suppose Φ fails for some a ∈ S. Assume
further that Φ satisfies requirements 1 and 2 of the statement. Since R is a well-founded
relation, the set {a ∈ S : ¬Φ(a)} has an R-minimal element r. This element is either an R-minimal
element of S itself, in which case condition 1 is violated, or it has R-predecessors. In
this case, we have by minimality Φ(s) for every s such that sRr, and by condition 2, Φ(r) is
true, a contradiction.
14.27 quantifier
A quantifier is a logical symbol which makes an assertion about the set of values which
make one or more formulas true. This an exceedingly general concept; the vast majority of
mathematics is done with the two standard quantifiers, ∀ and ∃.
The universal quantifier ∀ takes a variable x and a formula and asserts that the formula
holds for every value of x. A typical example would be a sentence like:

∀x[0 ≤ x]

The existential quantifier ∃ is its dual; that is, the formula ∀xφ(x) is equivalent to
¬∃x¬φ(x). It states that there is some x satisfying the formula, as in

∃x[x > 0]
The scope of a quantifier is the portion of a formula where it binds its variables. Note
that previous bindings of a variable are overridden within the scope of a quantifier. In the
examples above, the scope of the quantifiers was the entire formula, but that need not be
the case. The following is a more complicated use of quantifiers:
†:The scope of the second existential quantifier. Within this area, all references to x refer to
the variable bound by the existential quantifier. It is impossible to refer directly to the one
bound by the universal quantifier.
As that example illustrates, it can be very confusing when one quantifier overrides another.
Since it does not change the meaning of a sentence to change a bound variable and all bound
occurrences of it, it is better form to replace sentences like that with an equivalent but more
readable one like:
These sentences both assert that every number is either equal to zero, or that there is some
number one less than it, and that the number one less than it is also either zero or has
a number one less than it. [Note: This is not the most useful of sentences. It would be
nice to replace this with a mathematically simple sentence which uses nested quantifiers
meaningfully.]
The quantifiers may not range over all objects. That is, ∀xφ(x) may not specify that x
can be any object, but rather any object belonging to some class of objects. Similarly,
∃xφ(x) may specify that there is some x within that class which satisfies φ. For instance,
second order logic has two universal quantifiers, ∀₁ and ∀₂ (with corresponding existential
quantifiers), and variables bound by them range only over the first and second order objects
respectively. So ∀₁x[0 ≤ x] only states that all numbers are greater than or equal to 0, not
that sets of numbers are as well (which would be meaningless).
The restriction is often incorporated into the quantifier. For instance, the first example
might be written ∀x < c[φ(x)].
A quantifier is called vacuous if the variable it binds does not appear anywhere in its scope,
such as ∀x∃y[0 6 x]. While vacuous quantifiers do not change the meaning of a sentence,
they are occasionally useful in finding an equivalent formula of a specific form.
While these are the most common quantifiers (in particular, they are the only quantifiers
appearing in classical first-order logic), some logics use others. The quantifier ∃!xφ(x), which
means that there is a unique x satisfying φ(x), is equivalent to ∃x[φ(x) ∧ ∀y[φ(y) → x = y]].
Other quantifiers go beyond the usual two. Examples include interpreting Qxφ(x) to mean
there are an infinite (or uncountably infinite) number of x satisfying φ(x). More elaborate
examples include the branching Henkin quantifier, written:
∀x ∃y
φ(x, y, a, b)
∀a ∃b
This quantifier is similar to ∀x∃y∀a∃bφ(x, y, a, b) except that the choice of y cannot
depend on the value of a, and the choice of b cannot depend on the values of x and y. This
concept can be further generalized to the game-semantic, or independence-friendly,
quantifiers. All of these quantifiers are examples of generalized quantifiers.
14.28 quantifier free
Let L be a first order language. A formula ψ is quantifier free iff it contains no quantifiers.
Let T be a complete L-theory. Let S ⊆ L. Then S is an elimination set for T iff for every
ψ(x̄) ∈ L there is some φ(x̄) ∈ S so that T ⊢ ∀x̄(ψ(x̄) ↔ φ(x̄)).
In particular, T has quantifier elimination iff the set of quantifier free formulas is an elimination
set for T. In other words, T has quantifier elimination iff for every ψ(x̄) ∈ L there is
some quantifier free φ(x̄) ∈ L so that T ⊢ ∀x̄(ψ(x̄) ↔ φ(x̄)).
14.29 subformula
Let L be a first order language and suppose ϕ, ψ ∈ L are formulas. Then we say that ϕ is a
subformula of ψ if and only if ψ ≠ ϕ and one of the following holds:
1. ψ is one of ¬α, ∀x(α) or ∃x(α), and either ϕ = α or ϕ is a subformula of α;
2. ψ is α ∨ β or α ∧ β and either ϕ = α, ϕ = β, or ϕ is a subformula of α or β.
Suppose Φ(α) is a property defined for every ordinal α. The principle of transfinite induction
states that if, for every α, the truth of Φ(β) for every β < α implies the truth of Φ(α),
then Φ(α) is true for every ordinal α. Formally:
∀α(∀β(β < α ⇒ Φ(β)) ⇒ Φ(α)) ⇒ ∀α(Φ(α))
The principle of transfinite induction is very similar to the principle of finite induction, ex-
cept that it is stated in terms of the whole class of the ordinals.
If Φ is a class of n-ary relations with ~x as the only free variables, an (n+1)-ary formula ψ is
universal for Φ if for any φ ∈ Φ there is some e such that ψ(e, ~x) ↔ φ(~x). In other words,
ψ can simulate any element of Φ.
Similarly, if Φ is a class of functions of ~x, a formula ψ is universal for Φ if for any φ ∈ Φ
there is some e such that ψ(e, ~x) = φ(~x).
Let L ∈ {Σₙ, ∆ₙ, Πₙ} and take any k ∈ N. Then there is a (k+1)-ary relation U ∈ L such
that U is universal for the k-ary relations in L.
Proof
First we prove the case where L = ∆1 , the recursive relations. We use the example of a
Gödel numbering.
• e = ⌜φ⌝
Since deductions are ∆₁, it follows that T is ∆₁. Then define U′(e, ~x) to be the least a such
that T(e, ~x, a), and U(e, ~x) ↔ (U′(e, ~x))_{len(U′(e,~x))} = e. This is again ∆₁ since the ∆₁
functions are closed under minimization.
If f is any k-ary ∆₁ function then f(~x) = U(⌜f⌝, ~x).
Now take L to be the k-ary relations in either Σₙ or Πₙ. Call the universal relation for
(k+n)-ary ∆₁ relations U_∆. Then any φ ∈ L is equivalent to a relation of the form
Qy₁Q′y₂ · · · Q*yₙ ψ(~x, ~y) where ψ ∈ ∆₁, and so U(~x) = Qy₁Q′y₂ · · · Q*yₙ U_∆(⌜ψ⌝, ~x, ~y).
Then U is universal for L.
Finally, if L is the k-ary ∆ₙ relations and φ ∈ L then φ is equivalent to relations of the form
∃y₁∀y₂ · · · Qyₙ ψ(~x, ~y) and ∀z₁∃z₂ · · · Qzₙ η(~x, ~z). If the k-ary universal relations for Σₙ and
Πₙ are U_Σ and U_Π respectively then φ(~x) ↔ U_Σ(⌜ψ⌝, ~x) ∧ U_Π(⌜η⌝, ~x).
2. for every a, if for every x such that xRa, we have Φ(x), then we have Φ(a)
As an example of an application of this principle, we mention the proof of the fundamental
theorem of arithmetic: every natural number has a unique factorization into prime numbers.
The proof goes by well-founded induction in the set N ordered by division.
14.35 well-founded induction on formulas
Let L be a first-order language. The formulas of L are built by a finite application of the
rules of construction. This says that the relation ≤ defined on formulas by ϕ ≤ ψ if and only
if ϕ is a subformula of ψ is a well-founded relation. Therefore, we can formulate a principle
of induction for formulas as follows: suppose P is a property defined on formulas; then P
is true for every formula of L if and only if
2. for every formula ϕ, if P is true for every subformula of ϕ, then P is true for ϕ.
Chapter 15
Härtig's quantifier is a quantifier which takes two variables and two formulas, written
Ixyφ(x)ψ(y). It asserts that |{x | φ(x)}| = |{y | ψ(y)}|. That is, the cardinality of the values
of x which make φ(x) true is the same as the cardinality of the values of y which make ψ(y)
true. Viewed as a generalized quantifier, I is a ⟨1, 1⟩ quantifier.
Closely related is the Rescher quantifier, which also takes two variables and two formulas,
is written Jxyφ(x)ψ(y), and asserts that |{x | φ(x)}| ≤ |{y | ψ(y)}|. The Rescher quantifier is
sometimes defined instead to be a similar but different quantifier, Jxφ(x) ↔ |{x | φ(x)}| >
|{x | ¬φ(x)}|. The first definition is a ⟨1, 1⟩ quantifier while the second is a ⟨1⟩ quantifier.
After the discovery of the paradoxes of set theory (notably Russell's paradox), it became
apparent that naive set theory must be replaced by something in which the paradoxes can't
arise. Two solutions were proposed: type theory and axiomatic set theory based on a
limitation of size principle (see the entries class and von Neumann-Bernays-Gödel set theory).
Type theory is based on the idea that impredicative definitions are the root of all evil.
Bertrand Russell and various other logicians at the beginning of the 20th century proposed
an analysis of the paradoxes that singled out so-called vicious circles as the culprits. A
vicious circle arises when one attempts to define a class by quantifying over a totality of
classes including the class being defined. For example, Russell’s class R = {x | x 6∈ x}
contains a variable x that ranges over all classes.
Russell’s type theory, which is found in its mature form in the momentous Principia Mathe-
matica avoids the paradoxes by two devices. First, Frege’s fifth axiom is abandoned entirely:
the extensions of predicates do not appear among the objects. Secondly, the predicates
themselves are ordered into a ramified hierarchy so that the predicates at the lowest level
can be defined by speaking of objects only, the predicates at the next level by speaking of
objects and of predicates at the previous level and so forth.
The first of these principles has drastic implications for mathematics. For example, the
predicate "has the same cardinality" seemingly can't be defined at all, for predicates apply
only to objects, and not to other predicates. In Frege's system this is easy to overcome: the
equicardinality predicate is defined for extensions of predicates, which are objects. In order
to overcome this, Russell introduced the notion of types (which are today known as degrees).
Predicates of degree 1 apply only to objects, predicates of degree 2 apply to predicates of
degree 1, and so forth.
Type theoretic universe may seem quite odd to someone familiar with the cumulative hier-
archy of set theory. For example, the empty set appears anew in all degrees, as do various
other familiar structures, such as the natural numbers. Because of this, it is common to
indicate only the relative differences in degrees when writing down a formula of type theory,
instead of the absolute degrees. Thus instead of writing
one writes
to indicate that the formula holds for any i. Another possibility is simply to drop the
subscripts indicating degree and let the degrees be determined implicitly (this can usually
be done since we know that x ∈ y implies that if y is of degree n, then x is of degree n + 1).
A formula for which there is an assignment of types (degrees) to the variables and constants
so that it accords to the restrictions of type theory is said to be stratified.
The second device introduces another dimension in which the predicates are ordered. In any
given degree, there appears a hierarchy of levels. At the first level of degree n + 1 one has
predicates that apply to elements of degree n and which can be defined with reference only
to predicates of degree n. At the second level there appear all the predicates that can be
defined with reference to predicates of degree n and to predicates of degree n + 1 of level 1,
and so forth.
This second principle makes virtually all mathematics break down. For example, when
speaking of the real number system and its completeness, one wishes to quantify over all
predicates of real numbers (this is possible at degree n + 1 if the predicates of real numbers
appear at degree n), not only those of a given level. In order to overcome this, Russell
and Whitehead introduced in PM the so-called axiom of reducibility, which states that if a
predicate Pₙ occurs at some level k (i.e. Pₙ = Pₙᵏ), it occurs already on the first level.
Frank P. Ramsey was the first to notice that the axiom of reducibility in effect collapses the
hierarchy of levels, so that the hierarchy is entirely superfluous in the presence of the axiom.
The original form of type theory is known as ramified type theory, and the simpler alternative
with no second hierarchy of levels is known as unramified type theory, or simply as
simple type theory.
One descendant of type theory is W. V. Quine's system of set theory known as NF (New
Foundations), which differs considerably from the more familiar set theories (ZFC, NBG,
Morse-Kelley). In NF there is a class comprehension axiom saying that to any stratified
formula there corresponds a set of elements satisfying the formula. The Russell class is not
a set, since its defining formula contains x ∈ x, which can't be stratified, but the universal
class is a set: x = x is perfectly legal in type theory, as we can assign to x any degree and
get a well-formed formula of type theory. It is not known whether NF axiomatises any
extensor (see the entry class) based on a limitation of size principle, like the more familiar
set theories do.
In the modern variants of type theory, one usually has a more general supply of types.
Beginning with some set τ of types (presumably a division of the simple objects into some
natural categories), one defines the set of types T by setting
• for all t ∈ τ, t ∈ T
• if a, b ∈ T, then (a → b) ∈ T
One way to proceed to get something familiar is to have τ contain a type t for truth values.
Then sentences are objects of type t, open formulae of one variable are of type Object → t
and so forth. This sort of type system is often found in the study of typed lambda calculus
and also in intensional logics, which are often based on the former.
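The inductive definition of T is easy to mirror in code; in this Python sketch the base set τ and the type names are illustrative choices:

```python
# A sketch of the inductive type set T, assuming two base types:
# 'e' for simple objects and 't' for truth values (names illustrative).

BASE = {"e", "t"}                      # the starting set tau

def arrow(a, b):
    # if a and b are types, (a -> b) is a type
    return ("->", a, b)

def is_type(x):
    # x is in T iff it is a base type or an arrow of two members of T
    if x in BASE:
        return True
    return (isinstance(x, tuple) and x[0] == "->"
            and is_type(x[1]) and is_type(x[2]))

# an open formula of one variable has type Object -> t
print(is_type(arrow("e", "t")))                  # True
print(is_type(arrow("e", arrow("e", "t"))))      # True
print(is_type("q"))                              # False
```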
arithmetical hierarchy, the relations in each level are exactly the relations defined by the
formulas of that level.
The first level can be called ∆10 , Σ10 , or Π10 , and consists of the arithmetical formulas or
relations.
This quantifier, inexpressible in ordinary first order logic, can best be understood by its
Skolemization: the branching formula (∀x∃y / ∀a∃b)φ(x, y, a, b) is equivalent to
∃f ∃g∀x∀aφ(x, f (x), a, g(a)). Critically, the choice of y depends only on x while the
choice of b depends only on a.
Logics with this quantifier are stronger than first order logic, lying between first and second order logic
in strength. For instance, the Henkin quantifier can be used to define the Rescher quantifier,
and by extension Härtig’s quantifier:
(∀x∃y / ∀a∃b)[(x = a ↔ y = b) ∧ (φ(x) → ψ(y))] ↔ Rxyφ(x)ψ(y)
To see that this is true, observe that this essentially requires that the Skolem functions
f (x) = y and g(a) = b be the same, and moreover that they be injective. Then for each x
satisfying φ(x), there is a different f (x) satisfying ψ(f (x)).
This concept can be generalized to the game-theoretical quantifiers. This concept comes
from interpreting a formula as a game between a “Prover” and a “Refuter”. A theorem is
provable whenever the Prover has a winning strategy; at each ∧ the Refuter chooses which
side they will play (so the Prover must be prepared to win on either), while each ∨ is a choice
for the Prover. At a ¬, the players switch roles. Then ∀ represents a choice for the Refuter
and ∃ for the Prover.
Classical first order logic, then, adds the requirement that the games have perfect information.
The game-theoretical quantifiers remove this requirement, so for instance the Henkin quan-
tifier, which would be written ∀x∃y∀a(∃b/∀x)φ(x, y, a, b), states that when the Prover makes a
choice for b, it is made without knowledge of what was chosen for x.
In its most general form, a logical language is a set of rules for constructing formulas for
some logic, which can then be assigned truth values based on the rules of that logic.
• A set F of function symbols
• A set R of relation symbols
• A set of connectives
• A set Q of quantifiers
• A set V of variables
Every function symbol, relation symbol, and connective is associated with an arity (the set
of n-ary function symbols is denoted Fn , and similarly for relation symbols and connectives).
Each quantifier is a generalized quantifier associated with a quantifier type ⟨n1 , . . . , nn ⟩.
The underlying logic has a (possibly empty) set of types T . There is a function Type :
F ∪ V → T which assigns a type to each function symbol and variable. For each arity n there is a
function Inputsn : Fn ∪ Rn → T^n which gives the types of each of the arguments to a
function symbol or relation. In addition, for each quantifier type ⟨n1 , . . . , nn ⟩ there is a
function Inputs⟨n1 ,...,nn ⟩ defined on Q⟨n1 ,...,nn ⟩ (the set of quantifiers of that type) which gives
an n-tuple of ni -tuples of types of the arguments taken by the formulas the quantifier applies to.
1. If v is a variable such that Type(v) = t then v is a term of type t
2. If f is an n-ary function symbol such that Type(f ) = t and t1 , . . . , tn are terms such
that for each i ≤ n Type(ti ) = (Inputsn (f ))i then f t1 , . . . , tn is a term of type t
1. If r is an n-ary relation symbol and t1 , . . . , tn are terms such that Type(ti ) = (Inputsn (r))i
then rt1 , . . . , tn is a formula
2. If c is an n-ary connective and f1 , . . . , fn are formulas then cf1 , . . . , fn is a formula
3. If q is a quantifier of type ⟨n1 , . . . , nn ⟩ and v1,1 , . . . , v1,n1 , . . . , vn,1 , . . . , vn,nn is a sequence
of variables such that Type(vi,j ) = ((Inputs⟨n1 ,...,nn ⟩ (q))j )i and f1 , . . . , fn are formulas
then qv1,1 , . . . , v1,n1 , . . . , vn,1 , . . . , vn,nn f1 , . . . , fn is a formula
Generally the connectives, quantifiers, and variables are specified by the appropriate logic,
while the function and relation symbols are specified for particular languages. Note that
0-ary functions are usually called constants.
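The term-formation rules can be illustrated with a small type checker; in this Python sketch the one-type language, the symbol `plus`, and the tables `TYPE` and `INPUTS` are all illustrative assumptions:

```python
# A sketch of rules 1-2 for terms, assuming a one-type language with a
# single binary function symbol 'plus' (all names are illustrative).

TYPE = {"x": "num", "y": "num", "plus": "num"}    # Type : F u V -> T
INPUTS = {"plus": ("num", "num")}                  # Inputs_2(plus)

def term_type(term):
    """Return the type of a term, raising ValueError if ill-formed."""
    if isinstance(term, str):              # rule 1: a variable
        return TYPE[term]
    f, *args = term                        # rule 2: f t1 ... tn
    expected = INPUTS[f]
    if len(args) != len(expected):
        raise ValueError("wrong arity for " + f)
    for arg, want in zip(args, expected):
        if term_type(arg) != want:
            raise ValueError("argument of wrong type")
    return TYPE[f]

print(term_type(("plus", "x", ("plus", "y", "x"))))  # num
```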
If there is only one type which is equated directly with truth values then this is essentially
a propositional logic. If the standard quantifiers and connectives are used, there is only one
type, and one of the relations is = (with its usual semantics), this produces first order logic.
If the standard quantifiers and connectives are used, there are two types, and the relations
include = and ∈ with appropriate semantics, this is second order logic (a slightly different
formulation replaces ∈ with a 2-ary function which represents function application; this views
second order objects as functions rather than sets).
Note that often connectives are written with infix notation with parentheses used to control
order of operations.
Second order logic refers to logics with two (or three) types where one type consists of the
objects of interest and the second is either sets of those objects or functions on those objects
(or both, in the three type case). For instance, second order arithmetic has two types: the
numbers and the sets of numbers.
• the standard quantifiers (four of them, since each type needs its own universal and
existential quantifiers)
• the standard connectives
• if the second type represents sets, a relation ∈ where the first argument is of the first
type and the second argument is of the second type
• if the second type represents functions, a binary function which takes one argument of
each type and results in an object of the first type, representing function application
Specific second order logics may deviate from this definition slightly. In particular, some
mathematicians have argued that first order logics with additional quantifiers which give
them most or all of the strength of second order logic should be considered second order logics.
Some people, chiefly Quine, have raised philosophical objections to second order logic, cen-
tering on the question of whether models require fixing some set of sets or functions as the
“actual” sets or functions for the purposes of that model.
Chapter 16
For example, in Haskell, a function that converts a Church integer to an ordinary integer might be
unchurch n = n (+1) 0
Thus the (+1) function would be applied to an initial value of 0 n times, yielding the ordinary
integer n.
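The same conversion can be sketched in Python, where a Church integer is a nested closure (the names `zero`, `succ`, and `unchurch` are illustrative):

```python
# A Python analogue of the Haskell unchurch above (a sketch; the
# encoding of Church integers as nested lambdas is standard).

zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

def unchurch(n):
    # apply (+1) to an initial value of 0, n times
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
print(unchurch(three))  # 3
```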
Combinatory logic was invented by Moses Schönfinkel in the early 1920s, and was mostly
developed by Haskell Curry. The idea was to reduce the notation of logic to the simplest
terms possible. As such, combinatory logic consists only of combinators, combination
operations, and no free variables.
A combinator is simply a function with no free variables. A free variable is any variable
referred to in a function that is not a parameter of that function. The operation of com-
bination is then simply the application of a combinator to its parameters. Combination is
specified by simple juxtaposition of two terms, and is left-associative. Parentheses may also
be present to override associativity. For example, f g x denotes (f g) x, while f (g x) denotes
the application of f to the result of g x.
All combinators in combinatory logic can be derived from two basic combinators, S and K.
They are defined as
S f g x = f x (g x)
K x y = x
Reference is sometimes made to a third basic combinator, I, which can be defined in terms
of S and K.
I x = S K K x = K x (K x) = x
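The definitions of S and K, and the derived I, can be written directly as curried functions; this Python sketch mirrors the equations above:

```python
# The basic combinators written as curried Python functions (a sketch).

S = lambda f: lambda g: lambda x: f(x)(g(x))   # S f g x = f x (g x)
K = lambda x: lambda y: x                      # K x y = x

# I can be derived as S K K: I x = S K K x = K x (K x) = x
I = S(K)(K)

print(I(42))          # 42
print(K("a")("b"))    # a
```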
Combinatory logic and lambda calculus are equivalent. However, lambda calculus is more
concise than combinatory logic; an expression of size O(n) in lambda calculus is equivalent
to an expression of size O(n²) in combinatory logic.
Lambda calculus (often referred to as λ-calculus) was invented in the 1930s by Alonzo
Church, as a form of mathematical logic dealing primarily with functions and the application
of functions to their arguments. In pure lambda calculus, there are no constants. Instead,
there are only lambda abstractions (which are simply specifications of functions),
variables, and applications of functions to functions. For instance, Church integers are used
as a substitute for actual constants representing integers.
A lambda abstraction is typically specified using a lambda expression, which might look
like the following.
λx.f x
The above specifies a function of one argument, that can be reduced by applying the
function f to its argument (function application is left-associative by default, and parentheses
can be used to specify associativity).
The λ-calculus is equivalent to combinatory logic (though much more concise). Most functional
programming languages are also equivalent to λ-calculus, to a degree (any imperative fea-
tures in such languages are, of course, not equivalent).
Examples
3 = λf x . f (f (f x))
Suppose we have a function inc which, when given a string representing an integer, returns
a new string representing the number following that integer. Then 3 inc “0” reduces to
inc (inc (inc “0”)), that is, “3”. Addition is
add = λ x y . (λ f z . x f (y f z))
add 2 3 = λ f z . 2 f (3 f z)
= λf z . 2 f (f (f (f z)))
= λf z . f (f (f (f (f z))))
= 5
Multiplication is
mul = λ x y . (λ f z . x (λ w . y f w) z)
mul 2 3 = λ f z . 2 (λ w . 3 f w) z
= λ f z . 2 (λ w . f (f (f w))) z
= λ f z . f (f (f (f (f (f z)))))
= 6
Russell’s Paradox in λ-calculus
The λ-calculus readily admits Russell’s paradox. Let us define a function r that takes a
function x as an argument, and is reduced to the application of the logical function not to
the application of x to itself.
r = λ x . not (x x)
r r = not (r r)
= not (not (r r))
= . . .
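The non-terminating reduction can be observed concretely; translating r into Python (using Python's `not` as the logical function) and applying it to itself never reaches a value:

```python
# A sketch of r = lambda x . not (x x) in Python: evaluating r r loops
# forever, which Python reports as a RecursionError.
import sys

def r(x):
    return not x(x)   # not (x x)

sys.setrecursionlimit(1000)
try:
    r(r)
except RecursionError:
    print("no normal form: the reduction never terminates")
```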
Chapter 17
Let (Ω, B, µ) be a probability space, and let X and Y be random variables on Ω with joint
probability distribution µ(X, Y ) := µ(X ∩ Y ).
In general,
µ(X|Y )µ(Y ) = µ(X, Y ) = µ(Y |X)µ(X), (17.1.2)
and so we have
µ(X|Y ) = µ(Y |X)µ(X)/µ(Y ). (17.1.3)
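Equation (17.1.3) can be checked numerically on a small discrete example (the joint distribution below is illustrative):

```python
# A numeric check of (17.1.3) on two binary random variables.

# joint probabilities mu(X = x, Y = y)
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def marginal_x(x):
    return sum(p for (a, b), p in joint.items() if a == x)

def marginal_y(y):
    return sum(p for (a, b), p in joint.items() if b == y)

def cond_x_given_y(x, y):            # mu(X|Y) = mu(X, Y) / mu(Y)
    return joint[(x, y)] / marginal_y(y)

def cond_y_given_x(y, x):            # mu(Y|X) = mu(X, Y) / mu(X)
    return joint[(x, y)] / marginal_x(x)

# Bayes' theorem: mu(X|Y) = mu(Y|X) mu(X) / mu(Y)
lhs = cond_x_given_y(1, 0)
rhs = cond_y_given_x(0, 1) * marginal_x(1) / marginal_y(0)
print(abs(lhs - rhs) < 1e-12)  # True
```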
Chapter 18
03B99 – Miscellaneous
A logic is said to have the Beth property if whenever a predicate R is implicitly definable by φ
(i.e. every model has at most one extension satisfying φ), then R is explicitly defin-
able relative to φ (i.e. there is a ψ not containing R, such that φ |= ∀x1 , . . . , xn (R(x1 , . . . , xn ) ↔
ψ(x1 , . . . , xn ))).
The alphabet of the system contains three symbols M, I, U. The set of theorems, denoted
by T, is the set of strings constructed from the axiom by the rules:
(axiom) MI ∈ T.
(i) if xI ∈ T then xIU ∈ T.
(ii) if Mx ∈ T then Mxx ∈ T.
(iii) if xIIIy ∈ T then xUy ∈ T.
(iv) if xUUy ∈ T then xy ∈ T.
example:
• Show that MUII ∈ T
MI ∈ T by axiom
→ MII ∈ T by rule (ii) where x = I
→ MIIII ∈ T by rule (ii) where x = II
→ MIIIIIIII ∈ T by rule (ii) where x = IIII
→ MIIIIIIIIU ∈ T by rule (i) where x = MIIIIIII
→ MIIIIIUU ∈ T by rule (iii)
→ MIIIII ∈ T by rule (iv)
→ MUII ∈ T by rule (iii)
• Is MU a theorem?
No. Why? Because the number of I’s of a theorem is never a multiple of 3. We will
show this by structural induction.
base case: The statement is true for the base case, since the axiom MI has exactly one
I, and 1 is not a multiple of 3.
induction hypothesis: Suppose the statement is true for the premise of each rule.
induction step: By the induction hypothesis we assume the premise of each rule to be true
and show that the application of the rule keeps the statement true.
Rule 1: Applying rule 1 does not add any I’s to the formula. Therefore the statement
is true for rule 1 by the induction hypothesis.
Rule 2: Applying rule 2 doubles the number of I’s in the formula. Since the initial
number of I’s was not a multiple of 3 by the induction hypothesis, doubling that number
does not make it a multiple of 3 (i.e. if n ≢ 0 mod 3 then 2n ≢ 0 mod 3). Therefore
the statement is true for rule 2.
Rule 3: Applying rule 3 replaces III by U. Since the initial number of I’s was not a
multiple of 3 by the induction hypothesis, removing three I’s does not make the number
of I’s in the formula a multiple of 3. Therefore the statement is true for rule 3.
Rule 4: Applying rule 4 removes UU and does not change the number of I’s, which by
the induction hypothesis was not a multiple of 3. Therefore the statement is true for rule 4.
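The rules and the derivation above can be mechanised; in this Python sketch each rule returns the rewritten string (or None when it does not apply), and the final check illustrates the invariant used in the proof:

```python
# A sketch of the MIU rules, replaying the derivation of MUII above and
# checking the invariant: the number of I's is never a multiple of 3.

def rule1(s):            # (i)  xI -> xIU
    return s + "U" if s.endswith("I") else None

def rule2(s):            # (ii) Mx -> Mxx
    return "M" + s[1:] * 2 if s.startswith("M") else None

def rule3(s, i):         # (iii) xIIIy -> xUy, at position i
    return s[:i] + "U" + s[i + 3:] if s[i:i + 3] == "III" else None

def rule4(s, i):         # (iv) xUUy -> xy, at position i
    return s[:i] + s[i + 2:] if s[i:i + 2] == "UU" else None

s = "MI"                 # the axiom
s = rule2(s)             # MII
s = rule2(s)             # MIIII
s = rule2(s)             # MIIIIIIII
s = rule1(s)             # MIIIIIIIIU
s = rule3(s, 6)          # MIIIIIUU
s = rule4(s, 6)          # MIIIII
s = rule3(s, 1)          # MUII
print(s)                 # MUII
print(s.count("I") % 3)  # 2 -- not 0, consistent with the invariant
```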
[GVL]
REFERENCES
[HD] Hofstadter, Douglas R.: Gödel, Escher, Bach: an Eternal Golden Braid. Basic Books, Inc.,
New York, 1979.
18.3 IF-logic
Independence Friendly logic (IF-logic) is an interesting conservative extension of classical first order logic
based on very natural ideas from game theoretical semantics developed by Jaakko Hintikka
and Gabriel Sandu among others. Although IF-logic is a conservative extension of first order
logic, it has a number of interesting properties, such as allowing truth-definitions and ad-
mitting a translation of all Σ11 sentences (second order sentences with an initial second order
existential quantifier followed by a first order sentence).
IF-logic can be characterised as the natural extension of first order logic when one allows
informational independence to occur in the game theoretical truth definition. To understand
this idea we need first to introduce the game theoretical definition of truth for classical first
order logic.
To each first order sentence φ we assign a game G(φ) with two players, played on models of
the appropriate language. The two players are called the verifier and the falsifier (or nature). The
idea is that the verifier attempts to show that the sentence is true in the model, while the
falsifier attempts to show that it is false in the model. The game G(φ) is defined as follows.
We will use the convention that if p is a symbol that names a function, a predicate or an
object of the model M, then p^M is that named entity.
• if P is an n-ary predicate and t1 , . . . , tn are names of elements of the model, then G(P (t1 , . . . , tn ))
is a game in which the verifier immediately wins if (t1^M , . . . , tn^M ) ∈ P^M and otherwise
the falsifier immediately wins.
• the game G(φ1 ∨ φ2 ) begins with the choice φi from φ1 and φ2 (i = 1 or i = 2) by the
verifier and then proceeds as the game G(φi )
• the game G(φ1 ∧ φ2 ) is the same as G(φ1 ∨ φ2 ), except that the choice is made by the
falsifier
• the game G(∃xφ(x)) begins with the verifier choosing a member of M, which is
given a name a, and then proceeds as G(φ(a))
• the game G(∀xφ(x)) is the same as G(∃xφ(x)), except that the choice of a is made by
the falsifier
• the game G(¬φ) is the same as G(φ) with the roles of the falsifier and verifier exchanged
Truth of a sentence φ is defined as the existence of a winning strategy for the verifier in the
game G(φ). Similarly, falsity of φ is defined as the existence of a winning strategy for the
falsifier in the game G(φ). (A strategy is a specification which determines, for each move the
opponent makes, what the player should do. A winning strategy is a strategy which guarantees
victory no matter what strategy the opponent follows.)
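The recursive game rules above can be turned into a direct search for a winning strategy over a finite model. The following Python sketch (the formula encoding and the model are illustrative choices) decides whether the verifier wins G(φ); for finite games of perfect information the clause for ¬ may simply swap the outcome, since such games are determined:

```python
# A sketch of the game-theoretical truth definition over a finite model.
# Formulas are nested tuples; the model is a domain plus an
# interpretation of its predicates.

def verifier_wins(phi, model, env):
    domain, pred = model
    op = phi[0]
    if op == "atom":                   # verifier wins iff the tuple is in P^M
        _, p, *terms = phi
        return tuple(env[t] for t in terms) in pred[p]
    if op == "or":                     # verifier chooses a disjunct
        return any(verifier_wins(sub, model, env) for sub in phi[1:])
    if op == "and":                    # falsifier chooses a conjunct
        return all(verifier_wins(sub, model, env) for sub in phi[1:])
    if op == "exists":                 # verifier picks an element
        _, x, sub = phi
        return any(verifier_wins(sub, model, {**env, x: a}) for a in domain)
    if op == "forall":                 # falsifier picks an element
        _, x, sub = phi
        return all(verifier_wins(sub, model, {**env, x: a}) for a in domain)
    if op == "not":                    # the players' roles are swapped
        return not verifier_wins(phi[1], model, env)
    raise ValueError("unknown connective")

# forall x (P(x) or Q(x)) on the model {0, 1} with P = {0}, Q = {1}
model = ({0, 1}, {"P": {(0,)}, "Q": {(1,)}})
phi = ("forall", "x", ("or", ("atom", "P", "x"), ("atom", "Q", "x")))
print(verifier_wins(phi, model, {}))  # True
```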
For classical first order logic, this definition is equivalent to the usual Tarskian definition of
truth (i.e. the one based on satisfaction found in most treatments of the semantics of first order
logic). This also means that, since the law of excluded middle holds for first order logic,
the games G(φ) have a very strong property: either the falsifier or the verifier has a winning
strategy.
Notice that all rules except those for negation and atomic sentences concern choosing a
sentence or finding an element. These can be codified into functions, which tell us which
sentence to pick or which element of the model to choose, based on our previous choices
and those of our opponent. For example, consider the sentence ∀x(P (x) ∨ Q(x)). The
corresponding game begins with the falsifier picking an element a from the model, so a
strategy for the verifier must specify for each element a which of Q(a) and P (a) to pick. The
truth of the sentence is equivalent to the existence of a winning strategy for the verifier, i.e.
just such a function. But this means that ∀x(P (x)∨Q(x)) is equivalent to ∃f ∀x((P (x)∧f (x) =
0) ∨ (Q(x) ∧ f (x) = 1)). Let’s consider a more complicated example: ∀x∃y∀z∃s¬P (x, y, z, s). The
truth of this is equivalent to the existence of functions f and g, s.t. ∀x∀z¬P (x, f (x), z, g(x, z)).
Functions of this sort are known as Skolem functions, and they are in essence just winning
strategies for the verifier. We won’t prove it here, but all first order sentences can be
expressed in the form ∃f1 . . . ∃fn ∀x1 . . . ∀xk φ, where φ is a truth functional combination of atomic
sentences in which all terms are either constants or variables xi or formed by application of
the functions fi to such terms. Such sentences are said to be in Σ11 form.
Let’s consider a Σ11 sentence ∃f ∃g∀x∀zφ(x, f (x), z, g(z)). Up front, it seems to assert the
existence of a winning strategy in a simple semantical game like those described above.
However, the game can’t correspond to any (classical) first order formula! Let’s first see
what the game whose winning strategy this formula asserts to exist looks like.
First, the falsifier chooses elements a and b to serve as x and z. Then the verifier chooses
an element c knowing only a and an element d knowing only b. The verifier’s goal is that
φ(a, c, b, d) comes out as a true atomic sentence. The game could actually be arranged so
that the verifier is a team of two players (who aren’t allowed to communicate with each
other), one of which picks c, the other one picking d.
From a game theoretical point of view, games in which some moves must be made without
depending on some of the earlier moves are called informationally incomplete games, and
they occur very commonly. Bridge is such a game, for example, and usually real examples
of such games have “players” being actually teams made up of several people.
IF-logic comes out of the game theoretical definition in a natural way if we allow informa-
tional independence in our semantical games. In IF-logic, every connective can be augmented
with an independence marker //, so that ∗//∗′ means that the game for the occurrence of
∗′ within the scope of ∗ must be played without knowledge of the choices made for ∗. For
example, (∀x//∃y)∃yφ(x, y) asserts that for any choice of value for x by the falsifier, the
verifier can find a value for y which does not depend on the value of x, s.t. φ(x, y) comes
out true. This is not a very characteristic example, as it can be written as an ordinary first
order formula ∃y∀xφ(x, y). The curious game we described above, corresponding to the second
order Skolem-function formulation Σ11 sentence ∃f ∃g∀x∀zφ(x, f (x), z, g(z)), corresponds
to the IF-sentence (∀x//∃y)(∀z//∃u)(∃y)(∃u)φ(x, y, z, u). IF-logic allows informational in-
dependence also for the usual logical connectives; for example, (∀x//∨)(φ(x) ∨ ψ(x)) is true
if and only if for all x, either φ(x) or ψ(x) is true, but which of these is picked by the verifier
must be decided independently of the choice of x by the falsifier.
One of the striking characteristics of IF-logic is that every Σ11 formula φ has an IF-translation
φ^IF which is true if and only if φ is true (the equivalence does not in general hold if we
replace ’true’ with ’false’). Since for example first order truth (in a model) is Σ11 definable
(it’s just quantification over all possible valuations, which are second order objects), there
are IF-theories which correctly represent the truth predicate for their first order part. What
is even more striking is that sufficiently strong IF-theories can do this for the whole of the
language they are expressed in.
This seems to contradict Tarski’s famous result on the undefinability of truth, but this is
illusory. Tarski’s result depends on the assumption that the logic is closed under contradic-
tory negation. This is not the case for IF-logic. In general for a given sentence φ there is no
sentence φ∗ which is true just in case φ is not true. Thus the law of excluded middle does
not hold in general in IF-logic (although it does for the classical first order portion). This is
quite unsurprising since games of imperfect information are very seldom determined in the
sense that either the verifier or the falsifier has a winning strategy. For example, a game in
which I choose a 10-letter word and you have one go at guessing it is not determined in this
sense, since there is no 10-letter word you couldn’t guess and on the other hand you have
no way of forcing me to choose any particular 10-letter word (which would guarantee your
victory).
IF-logic is stronger than first order logic in the usual sense that there are classes of structures
which are IF-definable but not first-order definable. Some of these are even finite. Many
interesting concepts are expressible in IF-logic, such as equicardinality, infinity (which can
be expressed by a logical formula, in contradistinction to ordinary first order logic in which
non-logical symbols are needed), and well-order.
By Lindström’s theorem we thus know that either IF-logic is not complete (i.e. its set of
validities is not r.e.) or the Löwenheim-Skolem theorem does not hold. In fact, the (downward)
Löwenheim-Skolem theorem does hold for IF-logic, so it’s not complete. There is a com-
plete disproof procedure for IF-logic, but because IF-logic is not closed under contradictory
negation this does not yield a complete proof procedure.
IF-logic can be extended by allowing contradictory negations of closed sentences and truth
functional combinations thereof. This extended IF-logic is extremely strong. For example,
the second order induction axiom for PA is ∀X((X(0) ∧ ∀y(X(y) → X(y + 1))) → ∀yX(y)).
The negation of this is a Σ11 sentence asserting the existence of a set which invalidates the
induction axiom. Since Σ11 sentences are expressible in IF-logic, we can translate the negation
of the induction axiom into an IF-sentence φ. But now ¬φ is a formula of extended IF-logic,
and is clearly equivalent to the usual induction axiom! As all the rest of the PA axioms are first
order, this shows that PA formulated in extended IF-logic can correctly define the natural number system.
There exists also an interesting “translation” of nth order logic into extended IF-logic. Con-
sider an n-sorted first order language and an nth order theory T translated into this language.
Now, extend the language to second order and add the axiom stating that the sort k + 1
actually comprises the whole of the powerset of the sort k. This is a Π11 sentence (i.e. of
the form “for all predicates P there is a first order element of sort k + 1 which comprises
exactly the extension of P restricted to sort k”). It is easy to see that a formula is valid in this new system if
and only if it was valid in the original nth order logic. The negation of this axiom is again
Σ11 and translatable into IF-logic, and thus the axiom itself is expressible in extended IF-
logic. Moreover, since most interesting second order theories are finitely axiomatisable, we
can consider sentences of the form T ∗ → φ (where T ∗ is the multisorted translation of T ), which
express logical implication of φ by T (correctly). This is equivalent to ¬(T ∗) ∨ φ (where ¬
is contradictory), but since T ∗ is a conjunction of a Π11 sentence asserting comprehension
translated into extended IF-logic and first order translations of the axioms of T , this is a Σ11
formula translatable to non-extended IF-logic, and so is φ. Thus sentences of the form T ∗ → φ
of nth order logic are translatable into IF-sentences which are true just in case the originals
were.
Assume L is a logic which is closed under contradictory negation and has the usual truth-
functional connectives. Assume also that L has a notion of open formula with one variable
and of substitution. Assume that T is a theory of L in which we can define surrogates
for formulae of L, and in which all true instances of the substitution relation and the truth-
functional connective relations are provable. We show that either T is inconsistent or T can’t
be augmented with a truth predicate True for which the following T-schema holds
True(⌜φ⌝) ↔ φ
Assume that the open formulae with one variable of L have been indexed by some suitable
set that is representable in T (otherwise the predicate True would be next to useless, since if
there’s no way to speak of sentences of a logic, there’s little hope to define a truth-predicate
for it). Denote the i:th element in this indexing by Bi . Consider now the following open
formula with one variable:
Liar(x) = ¬True(Bx (x))
Now, since Liar is an open formula with one free variable, it’s indexed by some i. Now
consider the sentence Liar(i). From the T-schema we know that
True(Liar(i)) ↔ Liar(i)
and by the definition of Liar and the fact that i is the index of Liar(x) we have
True(Liar(i)) ↔ ¬True(Liar(i))
which clearly is absurd. Thus there can’t be an extension of T with a predicate True for
which the T-schema holds.
We have made several assumptions on the logic L which are crucial in order for this proof
to go through. The most important is that L is closed under contradictory negation. There
are logics which allow truth-predicates, but these are not usually closed under contradictory
negation (so that it’s possible that True(Liar(i)) is neither true nor false). These logics
usually have stronger notions of negation, so that a sentence ¬P says more than just that
P is not true, and the proposition that P is simply not true is not expressible.
An example of a logic for which Tarski’s undefinability result does not hold is the so-called
Independence Friendly logic, the semantics of which is based on game theory and which
allows various generalised quantifiers (the Henkin branching quantifier, &c.) to be used.
18.5 axiom
The logico-deductive method was developed by the ancient Greeks, and has become the core
principle of modern mathematics. However, the interpretation of mathematical knowledge
has changed from ancient times to the modern, and consequently the terms axiom and
postulate hold a slightly different meaning for the present day mathematician than they
did for Aristotle and Euclid.
The ancient Greeks considered geometry as just one of several sciences, and held the theorems
of geometry on par with scientific facts. As such, they developed and used the logico-
deductive method as a means of avoiding error, and for structuring and communicating
knowledge. Aristotle’s Posterior Analytics is a definitive exposition of the classical view.
“Axiom”, in classical terminology, referred to a self-evident assumption common to many
branches of science. A good example would be the assertion that when an equal amount is
taken from equals, an equal amount results.
At the foundation of the various sciences lay certain basic hypotheses that had to be accepted
without proof. Such a hypothesis was termed a postulate. The postulates of each science
were different. Their validity had to be established by means of real-world experience.
Indeed, Aristotle warns that the content of a science cannot be successfully communicated,
if the learner is in doubt about the truth of the postulates.
The classical approach is well illustrated by Euclid’s Elements, where we see a list of axioms
(very basic, self-evident assertions) and postulates (common-sensical geometric facts drawn
from our experience).
A1 Things which are equal to the same thing are also equal to one another.
A4 Things which coincide with one another are equal to one another.
P1 It is possible to draw a straight line from any point to any other point.
P5 It is true that, if a straight line falling on two straight lines make the interior angles on
the same side less than two right angles, the two straight lines, if produced indefinitely,
meet on that side on which are the angles less than the two right angles.
A great lesson learned by mathematics in the last 150 years is that it is useful to strip the
meaning away from the mathematical assertions (axioms, postulates, propositions, theorems)
and definitions. This abstraction, one might even say formalization, makes mathematical
knowledge more general, capable of multiple different meanings, and therefore useful in
multiple contexts.
In structuralist mathematics we go even further, and develop theories and axioms (like
field theory, group theory, topology, vector spaces) without any particular application in
mind. The distinction between an “axiom” and a “postulate” disappears. The postulates
of Euclid are profitably motivated by saying that they lead to a great wealth of geometric
facts. The truth of these complicated facts rests on the acceptance of the basic hypotheses.
However, by throwing out postulate 5, we get theories that have meaning in wider contexts,
hyperbolic geometry for example. We must simply be prepared to use labels like “line”
and “parallel” with greater flexibility. The development of hyperbolic geometry taught
mathematicians that postulates should be regarded as purely formal statements, and not as
facts based on experience.
When mathematicians employ the axioms of a field, the intentions are even more abstract.
The propositions of field theory do not concern any one particular application; the mathe-
matician now works in complete abstraction. There are many examples of fields; field theory
gives correct knowledge in all contexts.
It is not correct to say that the axioms of field theory are “propositions that are regarded as
true without proof.” Rather, the Field Axioms are a set of constraints. If any given system of
addition and multiplication tolerates these constraints, then one is in a position to instantly
know a great deal of extra information about this system. There is a lot of bang for the
formalist buck.
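The idea of a system “tolerating the constraints” can be made concrete by testing a candidate system against a few of the field axioms; this Python sketch checks arithmetic modulo the prime 5 (the choice of axioms sampled and of p is illustrative):

```python
# A sketch: verify a few field axioms for arithmetic mod a prime p.

p = 5
elems = range(p)
add = lambda a, b: (a + b) % p
mul = lambda a, b: (a * b) % p

# commutativity of multiplication and distributivity, for every tuple
comm = all(mul(a, b) == mul(b, a) for a in elems for b in elems)
dist = all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
           for a in elems for b in elems for c in elems)
# every nonzero element has a multiplicative inverse
inv = all(any(mul(a, b) == 1 for b in elems) for a in elems if a != 0)

print(comm and dist and inv)  # True
```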
Modern mathematics formalizes its foundations to such an extent that mathematical theories
can be regarded as mathematical objects, and logic itself can be regarded as a branch of
mathematics. Frege, Russell, Poincaré, Hilbert, and Gödel are some of the key figures in
this development.
In the modern understanding, a set of axioms is any collection of formally stated assertions
from which other formally stated assertions follow by the application of certain well-defined
rules. In this view, logic becomes just another formal system. A set of axioms should be
consistent; it should be impossible to derive a contradiction from the axioms. A set of axioms
should also be non-redundant; an assertion that can be deduced from other axioms need not
be regarded as an axiom.
It was the early hope of modern logicians that various branches of mathematics, perhaps
all of mathematics, could be derived from a consistent collection of basic axioms. An early
success of the formalist program was Hilbert’s formalization of Euclidean geometry, and the
related demonstration of the consistency of those axioms.
In a wider context, there was an attempt to base all of mathematics on Cantor’s set theory.
Here the emergence of Russell’s paradox, and similar antinomies of naive set theory raised
the possibility that any such system could turn out to be inconsistent.
The formalist project suffered a decisive setback when, in 1931, Gödel showed that it is
possible, for any sufficiently large set of axioms (Peano’s axioms, for example), to construct
a statement whose truth is independent of that set of axioms. As a corollary, Gödel proved
that the consistency of a theory like Peano arithmetic is an unprovable assertion within the
scope of that theory.
It is reasonable to believe in the consistency of Peano arithmetic because it is satisfied by
the system of natural numbers, an infinite but intuitively accessible formal system. However,
at this date we have no way of demonstrating the consistency of modern set theory
(Zermelo-Fraenkel axioms). The axiom of choice, a key hypothesis of this theory, remains a
very controversial assumption. Furthermore, using techniques of forcing (Cohen) one can
show that the continuum hypothesis (Cantor) is independent of the Zermelo-Fraenkel axioms.
Thus, even this very general set of axioms cannot be regarded as the definitive foundation
for mathematics.
18.6 compactness
For example, first order logic is (ω, ω)-compact, for if all finite subsets of some class of
sentences are consistent, so is the class itself.
18.7 consistent
A slightly different definition is sometimes used: T is consistent iff T ⊬ ⊥ (that is, as
long as it does not prove a contradiction). As long as the proof calculus used is sound and
complete, these two definitions are equivalent.
A logic is said to have the interpolation property if whenever φ(R, S) → ψ(R, T) holds, there
is a sentence θ(R) such that φ(R, S) → θ(R) and θ(R) → ψ(R, T), where S and T are the sets
of symbols occurring only in φ and only in ψ respectively, and R is the set of symbols common to
both φ and ψ.
The interpolation property holds for first order logic. It is related to the Beth definability
property and Robinson's consistency property. A natural generalisation is the concept of a
∆-closed logic.
18.9 sentence
∀x∃y[x < y]
or
∃z[z + 7 − 43 = 0]
x + 2 = 3
Chapter 19
The 3-dimensional ball can be split into a finite number of pieces which can be pasted together
to give two balls of the same volume as the first!
We say that two sets A, B ⊂ R^n are equi-decomposable if A can be partitioned into finitely
many pieces A₁, . . . , A_N which can be rearranged by rigid motions to form a partition of B.
19.1.1 Comments
The actual number of pieces needed for this decomposition is not large: ten pieces are
enough.
Nor is it important that the set considered is a ball: any two bounded sets with non-empty
interior are equidecomposable in R³. The ambient space can also be chosen larger: the
theorem is true in every R^n with n ≥ 3, but it is not true in R² or in R.
Where is the paradox? We are saying that a piece of (say) gold can be cut and pasted to
obtain two pieces equal to the original one. And we may divide these two pieces in the same
way to obtain four pieces, and so on...
We believe that this is not possible, since the weight of a piece of gold does not change
when we cut it.
A consequence of this theorem is, in fact, that it is not possible to define the volume for all
subsets of the 3-dimensional space. In particular the volume cannot be computed for some
of the pieces in which the unit ball is decomposed (some of them are not measurable).
The existence of non-measurable sets is proved more simply, and in every dimension, by Vitali's theorem.
However, the Banach-Tarski paradox says something more. It says that it is not possible to define
a measure on all the subsets of R³ even if we drop countable additivity and replace it
with finite additivity:

μ(A ∪ B) = μ(A) + μ(B)   for all disjoint A, B.
Another point to be noticed is that the proof needs the axiom of choice, so some of the
pieces into which the ball is divided are not constructible.
Chapter 20
20.1 congruence
2. For every natural number n and n-ary relation symbol R of Σ, if R^A(a₁, . . . , aₙ) then
R^{A/∼}([[a₁]], . . . , [[aₙ]]), so R^{A/∼}(f(a₁), . . . , f(aₙ)).
3. For every natural number n and n-ary function symbol F of Σ,
20.4 kernel
20.6 quotient structure
4. For every natural number n and every n-ary relation symbol R of Σ, R^{A/∼}([[a₁]], . . . , [[aₙ]])
if and only if for some a′ᵢ ∼ aᵢ we have R^A(a′₁, . . . , a′ₙ).
Chapter 21
The definition of a structure and of the satisfaction relation is nice, but it raises the following
question: how do we get models in the first place? The most basic construction for models
of a first-order theory is the construction that uses constants. Throughout this entry, L is a
fixed first-order language.
Proof: Let Σ be the signature for L. If T is a consistent set of sentences of L, then there is
a maximal consistent T 0 ⊇ T . Note that T 0 and T have the same sets of witnesses. As every
model of T 0 is also a model of T , we may assume T is maximal consistent.
We let the universe of M be the set of equivalence classes C/ ∼, where a ∼ b if and only if
“a = b” ∈ T . As T is maximal consistent, this is an equivalence relation. We interpret the
non-logical symbols as follows:
2. Constant symbols are interpreted in the obvious way, i.e. if c ∈ Σ is a constant symbol,
then c^M = [c];
3. If R ∈ Σ is an n-ary relation symbol, then ([a₁], ..., [aₙ]) ∈ R^M if and only if R(a₁, ..., aₙ) ∈
T;
4. If F ∈ Σ is an n-ary function symbol, then F^M([a₁], ..., [aₙ]) = [b] if and only if
"F(a₁, ..., aₙ) = b" ∈ T.
From the fact that T is maximal consistent and ∼ is an equivalence relation, we get that
the operations are well-defined (the verification is not entirely trivial, but we omit the
details). The proof that M |= T is a straightforward induction on the complexity of the
formulas of T. ♦
Proof: First add a set C of new constants to L, and expand T to T 0 in such a way that C
is a set of witnesses for T 0 . Then expand T 0 to a maximal consistent set T 00 . This set has a
model M consisting of the constants in C, and M is also a model of T. ♦
Proof: Replace “has a model” by “is consistent”, and apply the syntactic compactness
theorem. ♦
If T has a model, then it is consistent. The model constructed from constants has power
at most |L| (because we must add at most |L| many new constants). ♦
Most of the treatment found in this entry can be read in more detail in Chang and Keisler's
book Model Theory.
Let Sn (B) be the set of (complete) n-types over B (see type). Then we put a topology on
Sn (B) in the following manner.
For every formula ψ ∈ L(B) we let S(ψ) := {p ∈ Sn (B) : ψ ∈ p}. Then the topology is the
one with a basis of open sets given by {S(ψ) : ψ ∈ L(B)}. Then we call Sn (B) endowed
with this topology the Stone space of complete n-types over B.
Some logical theorems and conditions are equivalent to topological conditions on this topol-
ogy.
• The compactness theorem for first order logic is so named because it is equivalent to
this topology being compact.
• We define p to be an isolated type iff p is an isolated point in the Stone space. This is
equivalent to there being some formula ψ so that for every φ ∈ p we have T ∪ {ψ} ⊢ φ,
i.e. all the formulas in p are implied by some single formula.
• The Morley rank of a type p ∈ S1 (M) is equal to the Cantor-Bendixson rank of p in
this space.
The idea of considering the Stone space of types dates back to [1].
We can see that the set of formulas in a language is a Boolean lattice. A type is an ultrafilter
on this lattice. The definition of a Stone space can be made in an analogous way on the set
of ultrafilters on any Boolean lattice.
REFERENCES
1. M. Morley, Categoricity in power, Trans. Amer. Math. Soc. 114 (1965), 514–538.
21.3 alphabet
An alphabet Σ is a nonempty finite set of symbols. The main restriction is that we must
make sure that every string formed from Σ can be broken back down into symbols in only
one way.
For example, {b, lo, g, bl, og} is not a valid alphabet because the string blog can be broken
up in two ways: b lo g and bl og. {Ca, n̈a, d, a} is a valid alphabet, because there is only
one way to fully break up any given string formed from it.
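For a given string, the unique-decomposition requirement can be checked by brute force: count the number of ways the string can be segmented into symbols of the alphabet. A minimal sketch (the function name `count_segmentations` is our own, not from any library):

```python
def count_segmentations(s, alphabet):
    """Count the ways s can be broken into symbols from the alphabet."""
    # counts[i] = number of ways to segment the prefix s[:i]
    counts = [0] * (len(s) + 1)
    counts[0] = 1  # the empty prefix has exactly one segmentation
    for i in range(1, len(s) + 1):
        for sym in alphabet:
            # does the prefix s[:i] end with this symbol?
            if s.endswith(sym, 0, i):
                counts[i] += counts[i - len(sym)]
    return counts[len(s)]

# {b, lo, g, bl, og} is not a valid alphabet: "blog" splits as b.lo.g and bl.og
print(count_segmentations("blog", {"b", "lo", "g", "bl", "og"}))  # 2
```

A valid alphabet is one for which every string it can form has exactly one segmentation.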
• Σ^0 = {λ}, where λ stands for the empty string.
• Σ^n = {xy | x ∈ Σ, y ∈ Σ^(n−1)} (xy is the juxtaposition of x and y)
Let T be a first order theory. A subset ∆ ⊆ T is a set of axioms for T if and only if T is
the set of all consequences of the formulas in ∆. In other words, ϕ ∈ T if and only if ϕ is
provable using only assumptions from ∆.
For example, group theory is finitely axiomatizable (it has only three axioms), and Peano arithmetic
is recursively axiomatizable: there is clearly an algorithm that can decide whether a formula of
the language of the natural numbers is an axiom.
As an example of the use of this theorem, consider the theory of algebraically closed fields
of characteristic p, for p prime or 0. It is complete, its set of axioms is clearly recursive,
and so it is decidable.
21.5 definable
Then we write φ(M^n, ~b) to denote {~a ∈ M^n : M |= φ(~a, ~b)}. We say that φ(M^n, ~b) is ~b-
definable. More generally, if S is some set, B ⊆ M, and there is some ~b from B so that
S is ~b-definable, then we say that S is B-definable.
In particular we say that a set S is ∅-definable or zero definable iff it is the solution set of
some formula without parameters.
Let f be a function, then we say f is B-definable iff the graph of f (i.e. {(x, y) : f (x) = y})
is a B-definable set.
Some authors use the term definable to mean what we have called ∅-definable here. If this
is the convention of a paper, then the term parameter definable will refer to sets that are
definable over some parameters.
Sometimes in model theory it is not actually very important what language one is using, but
merely what the definable sets are, or what the definability relation is.
(i) There is a formula in the language L s.t. f is definable over the model M, as in the above
definition; i.e., its graph is definable in the language L over the model M, by some formula
φ(~x, y).
(ii) The theory T proves that f is indeed a function, that is T ` ∀~x∃!y.φ(~x, y).
Let M be a first order structure. Let A and B be sets of parameters from M. Let p be
a complete n-type over B. Then we say that p is an A-definable type iff for every formula
ψ(x̄, ȳ) with ln(x̄) = n, there is some formula dψ(ȳ, z̄) and some parameters ā from A so
that for any b̄ from B we have ψ(x̄, b̄) ∈ p iff M |= dψ(b̄, ā).
Note that if p is a type over the model M then this condition is equivalent to showing that
{b̄ ∈ M : ψ(x̄, b̄) ∈ p} is an A-definable set.
If p is definable, we call dψ the defining formula for ψ, and the function ψ 7→ dψ a defining
scheme for p.
Let L be a first order language, let A be an L-structure and let K ⊆ dom(A). Then there is
an L-structure B such that K ⊆ B, |B| ≤ max(|K|, |L|), and B is elementarily embedded
in A.
Consider (Q, <) as a structure in a language with one binary relation, which we interpret as
the order. This is a universal, ℵ0 -categorical structure (see example of universal structure).
The theory of (Q, <) has quantifier elimination, and so is o-minimal. Thus a type over the
set Q is determined by the quantifier free formulas over Q, which in turn are determined by
the atomic formulas over Q. An atomic formula in one variable over B is of the form x < b
or x > b or x = b for some b ∈ B. Thus each 1-type over Q determines a Dedekind cut over
Q, and conversely a Dedekind cut determines a complete type over Q. Let D(p) := {a ∈ Q :
(x > a) ∈ p}.
1. Ones where D(p) is of the form (−∞, a) or (−∞, a] for some a ∈ Q. It is clear that
these are definable from the above discussion.
2. Ones where D(p) has no supremum in Q. These are clearly not definable by o-minimality
of Q.
21.9 example of strongly minimal
Let LR be the language of rings. In other words LR has two constant symbols 0, 1 and three
binary function symbols +, ., −. Let T be the LR -theory that includes the field axioms and
for each n the formula
∀x₀, x₁, . . . , xₙ ∃y (¬(⋀_{1≤i≤n} xᵢ = 0) → Σ_{0≤i≤n} xᵢy^i = 0),
which expresses that every non-constant polynomial of degree at most n has a root. Then any
model of T is an algebraically closed field. One can show that this is a complete theory and
has quantifier elimination (Tarski). Thus every B-definable subset of any K |= T is definable
by a quantifier free formula in L_R(B) with one free variable y. A quantifier free formula is a
Boolean combination of atomic formulas. Each of these is of the form Σ_{i≤n} bᵢy^i = 0, which
defines a finite set. Thus every definable subset of K is a finite or cofinite set. Thus K and
T are strongly minimal.
then A/ker(f) ≅ i(f).
Since the homomorphic image of a Σ-structure is also a Σ-structure, we may assume that
i(f) = B.
For every natural number n and n-ary function symbol F of Σ,
Thus φ is a bimorphism.
Now suppose f has the additional property mentioned in the statement of the theorem.
Then
Thus φ is an isomorphism.
21.11 language
Let Σ be an alphabet. We then define the following using the powers of an alphabet and
infinite union:

Σ^+ = ⋃_{n=1}^∞ Σ^n

Σ^* = ⋃_{n=0}^∞ Σ^n = Σ^+ ∪ {λ}
A language over Σ is a subset of Σ∗ , meaning that it is a set of strings made from the
symbols in the alphabet Σ.
Take for example an alphabet Σ = {♣, ℘, 63, a, A}. We can construct languages over Σ, such
as: L = {aaa, λ, A℘63, 63♣, AaAaA}, or {℘a, ℘aa, ℘aaa, ℘aaaa, · · · }, or even the empty set
∅. In the context of languages, ∅ is called the empty language.
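Since Σ^* is infinite, code can only enumerate it up to a bounded length, but the recursive definition of Σ^n translates directly into a program. A small sketch (the function names are our own):

```python
def power(sigma, n):
    """Sigma^n: all strings of exactly n symbols, built by juxtaposition."""
    if n == 0:
        return {""}  # Sigma^0 contains only the empty string (lambda)
    return {x + y for x in sigma for y in power(sigma, n - 1)}

def kleene_star_up_to(sigma, k):
    """A finite approximation of Sigma^*: the union of Sigma^0 .. Sigma^k."""
    result = set()
    for n in range(k + 1):
        result |= power(sigma, n)
    return result

sigma = {"a", "b"}
print(sorted(power(sigma, 2)))           # ['aa', 'ab', 'ba', 'bb']
print(len(kleene_star_up_to(sigma, 3)))  # 1 + 2 + 4 + 8 = 15
```

Any subset of the strings produced this way is a language over Σ.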
21.12 length of a string
For example, if our alphabet is Σ = {a, b, ca} then the length of the string w = bcaab is
‖w‖ = 4, since the string breaks down as follows: x₁ = b, x₂ = ca, x₃ = a, x₄ = b. So our
xₙ is x₄ and therefore n = 4. Although you may think that ca is two separate symbols, our
chosen alphabet in fact classifies it as a single symbol.
A "special case" occurs when ‖w‖ = 0, i.e. the string does not have any symbols in it. This
string is called the empty string. Instead of writing w = with nothing after the equals sign,
we use λ to represent the empty string: w = λ. This is similar to the practice of using β to
represent a space, even though a space is really blank.
If your alphabet contains λ as a symbol, then you must use something else to denote the
empty string.
Suppose you also have a string v on the same alphabet as w. We break w into x₁ · · · xₙ just
as before, and similarly v = y₁ · · · yₘ. We say v is equal to w if and only if m = n
and, for every i, xᵢ = yᵢ.
For example, suppose w = bba and v = bab, both strings on alphabet Σ = {a, b}. These
strings are not equal because the second symbols do not match.
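The length ‖w‖ can be computed by actually carrying out the segmentation. A sketch assuming the alphabet admits unique decomposition (the function name is our own):

```python
def symbol_length(w, alphabet):
    """Break w into symbols of the alphabet and return how many there are.
    Raises if the decomposition is ambiguous or impossible."""
    segmentations = []

    def extend(rest, count):
        # try every symbol that could be the next token
        if rest == "":
            segmentations.append(count)
            return
        for sym in alphabet:
            if rest.startswith(sym):
                extend(rest[len(sym):], count + 1)

    extend(w, 0)
    if len(segmentations) != 1:
        raise ValueError("no unique decomposition")
    return segmentations[0]

# ||bcaab|| = 4 over {a, b, ca}: b . ca . a . b
print(symbol_length("bcaab", {"a", "b", "ca"}))  # 4
print(symbol_length("", {"a", "b", "ca"}))       # 0 (the empty string)
```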
We need to show that i(f) is closed under functions. For every constant symbol c of Σ,
c^B = f(c^A). Hence c^B ∈ i(f). Also, if b₁, . . . , bₙ ∈ i(f) and F is an n-ary function symbol of
Σ, then for some a₁, . . . , aₙ ∈ A we have
21.14 satisfaction relation
Alfred Tarski was the first mathematician to give a definition of what it means for a formula
to be “true” in a structure. To do this, we need to provide a meaning to terms, and truth-values
to the formulas. In doing this, free variables cause a problem: what value are they
going to have? One possible answer is to supply temporary values for the free variables,
and define our notions in terms of these temporary values.
Now we are set to define satisfaction. Again we have to take care of free variables by assigning
temporary values to them via a function σ. We define the relation A, σ |= ϕ by induction
on the construction of formulas:
Here σ[x/a](y) = a if y = x, and σ[x/a](y) = σ(y) otherwise.
21.15 signature
A signature is the collection of a set of constant symbols, and for every natural number n,
a set of n-ary relation symbols and a set of n-ary function symbols.
Let L be a first order language and let M be an L-structure. Let S, a subset of the domain
of M, be a definable infinite set. Then S is strongly minimal iff for every definable C ⊆ S,
either C is finite or S \ C is finite. We say that M is strongly minimal iff the domain
of M is a strongly minimal set.
Note that M is strongly minimal iff every definable subset of M is quantifier free definable
in a language with just equality. Compare this to the notion of o-minimal structures.
Let Σ be a fixed signature, and A and B be two structures for Σ. The interesting functions
from A to B are the ones that preserve the structure.
• An injective homomorphism is called a monomorphism.
21.18 structures
2. For each n ∈ N, and each n-ary function symbol f, J(f) : A^n → A is a function from
A^n to A.
3. For each n ∈ N, and each n-ary relation symbol R, J(R) is a subset of (i.e. an n-ary
relation on) A^n.
Another commonly used notation is J(c) = c^A, J(R) = R^A, J(f) = f^A. For notational
convenience, when the context makes it clear in which structure we are working, we use the
elements of Σ to stand for both the symbols and their interpretations. When Σ is understood,
we call A a structure, instead of a Σ-structure. In some texts, model may be used for
structure. Also, we shall write a ∈ A, where A is the structure, to mean that a belongs to
its universe A. Of course, there are many different possibilities for the interpretation J. If A
is a structure, then the power of A, which we denote |A|, is the cardinality of its universe A.
It is easy to see that the number of possibilities for the interpretation J is at most 2^|A|
when A is infinite.
21.19 substructure
21.20 type
Note that a complete n-type p over B is consistent so is in particular a partial type over
B. Also p is maximal in the sense that for every formula ψ(x, b̄) over B we have either
ψ(x, b̄) ∈ p or ¬ψ(x, b̄) ∈ p. In fact, for every collection of formulas p in n variables
the following are equivalent:
Some authors define a collection of formulas p to be an n-type iff p is a partial n-type. Others
define p to be a type iff p is a complete n-type.
A type (resp. partial type/complete type) is any n-type (resp. partial type/complete type)
for some n ∈ ω.
in B.
Chapter 22
Suppose we have some method M of generating sequences of letters from {p, q} so that at
each generation the probability of obtaining p is x, a real number strictly between 0 and 1.
Let {aᵢ : i < ω} be a set of vertices. For each i < ω, i ≥ 1, we construct a graph Gᵢ on the
vertices a₁, . . . , aᵢ recursively.
• For i > 1 we must describe, for any j < k ≤ i, when aⱼ and aₖ are joined.
Now let Γ be the graph on {aᵢ : i < ω} such that for any n, m < ω, aₙ is joined to aₘ in Γ iff
aₙ is joined to aₘ in some Gᵢ.
Then we call Γ a random graph. Consider the following property, which we shall call f-saturation:
Given any finite disjoint U and V, subsets of {aᵢ : i < ω}, there is some aₙ ∈ {aᵢ : i <
ω} \ (U ∪ V) so that aₙ is joined to every point of U and to no point of V.
Proposition 1. A random graph has f-saturation with probability 1.
Proof: Let b₁, b₂, . . . , bₙ, . . . be an enumeration of {aᵢ : i < ω} \ (U ∪ V). We say that bᵢ is
correctly joined to (U, V) iff it is joined to all the members of U and to none of the members of
V. Then the probability that bᵢ is not correctly joined is (1 − x^|U|(1 − x)^|V|), which is some
real number y strictly between 0 and 1. The probability that none of the first m are correctly
joined is y^m, and the probability that none of the bᵢ are correctly joined is lim_{m→∞} y^m = 0.
Thus, with probability 1, one of the bᵢ is correctly joined.
Proof: This is via a back and forth argument. The property of f-saturation is exactly what
is needed.
Thus although the system of generation of a random graph looked as though it could deliver
many potentially different graphs, this is not the case. Thus we talk about the random
graph.
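The generation procedure and the f-saturation check are easy to simulate for a finite initial segment of the vertices. A sketch with helper names of our own choosing; with 500 vertices and x = 1/2, a witness for small U and V exists with overwhelming probability:

```python
import random

def random_graph(n, x, seed=0):
    """Adjacency sets of a random graph on vertices 0..n-1, joining each
    pair independently with probability x."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for j in range(n):
        for k in range(j + 1, n):
            if rng.random() < x:
                adj[j].add(k)
                adj[k].add(j)
    return adj

def witness(adj, U, V):
    """Find a vertex outside U and V joined to every point of U and to no
    point of V (the f-saturation property)."""
    for v in adj:
        if v in U or v in V:
            continue
        if U <= adj[v] and not (V & adj[v]):
            return v
    return None

adj = random_graph(500, 0.5)
w = witness(adj, {0, 1}, {2, 3})
# Each candidate is correctly joined with probability (1/2)^4, so among
# 496 candidates a witness exists with overwhelming probability.
print(w is not None)
```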
The random graph can also be constructed as the Fraïssé limit of the class of all finite graphs,
and in many other ways. It is homogeneous and universal for the class of all countable graphs.
The theorem that almost surely any two infinite random graphs are isomorphic was first proved
in [1].
REFERENCES
1. Paul Erdős and Alfréd Rényi, Asymmetric graphs, Acta Math. Acad. Sci. Hungar., 14:295–315,
1963.
Chapter 23
23.1 κ-categorical
Let L be a first order language and let S be a set of L-sentences. If κ is a cardinal, then S
is said to be κ-categorical if S has a model of cardinality κ and any two such models are
isomorphic.
In other words, S is κ-categorical iff it has a unique model of cardinality κ, up to isomorphism.
Let L be a first order language, and let S be a set of L-sentences with no finite models which
is κ-categorical for some κ > |L|. Then S is complete.
Similarly, if A ⊨ ¬ϕ then S ⊨ ¬ϕ. So S is complete.
Chapter 24
Let L be the first order language with the binary relation ≤. Consider the following sentences:
• ∀x, y, z(x ≤ y ∧ y ≤ z → x ≤ z)
Any L-structure satisfying these is called a linear order. We define the relation < so that
x < y iff x ≤ y ∧ x ≠ y. Now consider these sentences:
A linear order that satisfies 1. is called dense. We say that a linear order that satisfies 2. is
without endpoints. Let T be the theory of dense linear orders without endpoints. This is a
complete theory.
Theorem 3. Let (S, ≤) be any finite linear order. Then S embeds in (Q, ≤).
Suppose that the statement holds for all linear orders with cardinality less than or equal to
n. Let |S| = n + 1; pick some a ∈ S and let S′ be the structure induced by S on S \ {a}.
Then there is some embedding e of S′ into Q.
• Now suppose a is less than every member of S′; then as Q is without endpoints, there
is some element b less than every element in the image of e. Thus we can extend e to
map a to b, giving an embedding of S into Q.
• We work similarly if a is greater than every element of S′.
• If neither of the above holds then we can pick the maximum c₁ ∈ S′ with c₁ < a, and
similarly the minimum c₂ ∈ S′ with a < c₂. Now there is some b ∈ Q
with e(c₁) < b < e(c₂). Then extending e by mapping a to b is the required embedding.
It is easy to extend the above result to countable structures. One views a countable structure
as the union of an increasing chain of finite substructures. The necessary embedding is
the union of the embeddings of the substructures. Thus (Q, ≤) is a universal countable linear
order.
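The inductive proof of Theorem 3 is effectively an algorithm: insert the elements one at a time, placing each image below, above, or strictly between the images chosen so far. A sketch using exact rational arithmetic (the function name is our own):

```python
from fractions import Fraction

def embed_in_Q(elements, less):
    """Embed a finite linear order into the rationals, one element at a
    time, following the inductive proof: place each new element below,
    above, or strictly between the images chosen so far."""
    e = {}
    for a in elements:
        below = [e[c] for c in e if less(c, a)]
        above = [e[c] for c in e if less(a, c)]
        if not e:
            e[a] = Fraction(0)
        elif not below:            # a is below everything placed so far
            e[a] = min(above) - 1
        elif not above:            # a is above everything placed so far
            e[a] = max(below) + 1
        else:                      # a sits between its nearest neighbours
            e[a] = (max(below) + min(above)) / 2
    return e

# A five-element linear order presented in scrambled insertion order.
order = [3, 1, 4, 0, 2]
e = embed_in_Q(order, lambda x, y: x < y)
images = [e[i] for i in sorted(order)]
print(images == sorted(images))  # the embedding preserves the order
```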
Proof: The following type of proof is known as a back and forth argument. Let S1 and S2 be
two finite substructures of (Q, 6). Let e : S1 → S2 be an isomorphism. It is easier to think
of two disjoint copies B and C of Q with S1 a substructure of B and S2 a substructure of C.
The ith forth step: If bᵢ is already in the domain of e then do nothing. If bᵢ is not in the
domain of e, then as in proposition 3, either bᵢ is less than every element in the domain of
e, or greater than every such element, or it has an immediate successor and predecessor in
the domain of e. Either way there is an element c in C \ range(e) in the corresponding
position relative to the range of e. Thus we can extend the isomorphism to map bᵢ to c.
The ith back step: If cᵢ is already in the range of e then do nothing. If cᵢ is not in the
range of e, then exactly as above we can find some b ∈ B \ dom(e) and extend e so that
e(b) = cᵢ.
After ω stages, we have an isomorphism whose range includes every bi and whose domain
includes every ci . Thus we have an isomorphism from B to C extending e.
A similar back and forth argument shows that any countable dense linear order without
endpoints is isomorphic to (Q, ≤), so T is ℵ₀-categorical.
24.2 homogeneous
Let L be a first order language, and let R be an elementary class of L-structures. Let κ be
a cardinal, and let Rκ be the set of structures from R with cardinality less than or equal to κ.
Chapter 25
A class of L-structures S has the amalgamation property iff whenever A, B₁, B₂ ∈ S and
fᵢ : A → Bᵢ are elementary embeddings for i ∈ {1, 2}, then there is some C ∈ S and some
elementary embeddings gᵢ : Bᵢ → C for i ∈ {1, 2} so that g₁(f₁(x)) = g₂(f₂(x)) for all x ∈ A.
Compare this with the free product with amalgamated subgroup for groups and the defini-
tion of pushout contained there.
Chapter 26
26.1 infinitesimal
Let R be a real closed field, for example the reals thought of as a structure in L, the language
of ordered rings. Let B be some set of parameters from R. Consider the following set of
formulas in L(B):
{0 < x} ∪ {x < b : b ∈ B ∧ b > 0}
Then this set of formulas is finitely satisfied, so by compactness is consistent. In fact this
set of formulas extends to a unique type p over B, as it defines a Dedekind cut. Thus there
is some model M containing B and some a ∈ M so that tp(a/B) = p.
Any such element will be called B-infinitesimal. In particular, suppose B = ∅. Then the
definable closure of B is the intersection of the reals with the algebraic numbers. Then a
∅-infinitesimal (or simply infinitesimal ) is any element of any real closed field that is positive
but smaller than every real algebraic (positive) number.
As noted above, such models exist by compactness. One can construct them using ultraproducts;
see the entry on hyperreal. This approach is due to Abraham Robinson, who used such
fields to formulate nonstandard analysis.
Let K be any ordered ring; then K contains N. We say K is archimedean iff for every a ∈ K
there is some n ∈ N so that a < n. Otherwise K is non-archimedean.
Real closed fields with infinitesimal elements are non-archimedean: for an infinitesimal a we
have a < 1/n and thus 1/a > n for each n ∈ N.
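A toy illustration of a non-archimedean ordering: the "dual numbers" a + bε compared lexicographically. This is only an ordered ring, not a real closed field, and the class name and representation are our own, but it exhibits an element ε that is positive yet below every positive rational:

```python
from fractions import Fraction

class Dual:
    """Elements a + b*eps, ordered lexicographically, so that eps is
    positive yet smaller than every positive rational: the ordering is
    non-archimedean.  (An ordered ring only -- a toy illustration.)"""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)
    def __lt__(self, other):
        # lexicographic comparison: the "standard part" a dominates
        return (self.a, self.b) < (other.a, other.b)
    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

eps = Dual(0, 1)
# 0 < eps < 1/n for every positive integer n tested
checks = all(Dual(0) < eps and eps < Dual(Fraction(1, n))
             for n in range(1, 1000))
print(checks)
```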
Reference: A. Robinson, Selected papers of Abraham Robinson. Vol. II. Nonstandard analysis
and philosophy (New Haven, Conn., 1979)
26.2 o-minimality
Then we define M to be o-minimal iff every definable subset of M is a finite union of intervals
and points. This is a property of the theory of M i.e. if M ⇔ N and M is o-minimal, then
N is o-minimal. Note that M being o-minimal is equivalent to every definable subset of M
being quantifier free definable in the language with just the ordering. Compare this with
strong minimality.
The model theory of o-minimal structures is well understood, for an excellent account see
Lou van den Dries, Tame topology and o-minimal structures, CUP 1998. In particular,
although this condition is merely on definable subsets of M it gives very good information
about definable subsets of M n for n ∈ ω.
It is clear that the axioms for a structure to be an ordered field can be written in L, the
first order language of ordered rings. It is also true that the following condition can be
written as a schema of first order sentences in this language: each odd degree polynomial
p ∈ K[x] has a root.
Let A be all these sentences together with one that states that all positive elements have a
square root. Then one can show that the consequences of A are a complete theory T. It is
clear that this theory is the theory of the real numbers. We call any L-structure satisfying A
a real closed field.
The semi algebraic sets on a real closed field are Boolean combinations of solution sets of
polynomial equalities and inequalities. Tarski showed that T has quantifier elimination,
which is equivalent to the class of semi algebraic sets being closed under projection.
Let K be a real closed field. Consider the definable subsets of K. By quantifier elimination,
each is definable by a quantifier free formula, i.e. a boolean combination of atomic formulas.
An atomic formula in one variable has one of the following forms:
The first defines a finite union of intervals, the second defines a finite union of points. Every
definable subset of K is a finite union of these kinds of sets, so is a finite union of intervals
and points. Thus any real closed field is o-minimal.
Chapter 27
27.1 imaginaries
For an example take some infinite group (G, .). Consider the centre of G, Z := {x ∈ G :
∀y ∈ G(xy = yx)}. Then Z is a first order definable subset of G, which forms a group with
the restriction of the multiplication, so (Z, .) is a first order definable structure in (G, .).
As another example consider the structure (R, +, ·, 0, 1) as a field. Then the structure (R, <)
is first order definable in the structure (R, +, ·, 0, 1), since for all x, y ∈ R we have x ≤ y iff
∃z(z² = y − x). Thus we know that (R, +, ·, 0, 1) is unstable, as it has a definable order on
an infinite subset.
Returning to the first example, Z is normal in G, so the set of (left) cosets of Z forms a
factor group. The domain of the factor group is the quotient of G by the equivalence relation
x ∼ y iff ∃z ∈ Z(xz = y). The factor group G/Z will not (in general) be a definable
structure, but it would seem to be a "natural" structure. We therefore weaken our
formalisation of "natural" from definable to interpretable. Here we require that a structure
is isomorphic to some definable structure on equivalence classes of definable equivalence
relations. The equivalence classes of a ∅-definable equivalence relation are called imaginaries.
In [2] Poizat defined the property of Elimination of Imaginaries. This is equivalent to the
following definition:
Definition 1. A structure A with at least two distinct ∅-definable elements admits elimina-
tion of imaginaries iff for every n ∈ N and ∅-definable equivalence relation ∼ on An there is
a ∅-definable function f : A^n → A^p (for some p) such that for all x and y from A^n we have
x ∼ y iff f(x) = f(y).
Given this property, we think of the function f as coding the equivalence classes of ∼,
and we call f (x) a code for x/ ∼. If a structure has elimination of imaginaries then every
interpretable structure is definable.
In [3] Shelah defined, for any structure A a multi-sorted structure Aeq . This is done by adding
a sort for every ∅-definable equivalence relation, so that the equivalence classes are elements
(and code themselves). This construction is a closure operator, i.e. Aeq itself has elimination of imaginaries.
See [1] chapter 4 for a good presentation of imaginaries and Aeq . The idea of passing to
Aeq is very useful for many purposes. Unfortunately Aeq has an unwieldy language and
theory. Also this approach does not answer the question above. We would like to show that
our structure has elimination of imaginaries with just a small selection of sorts added, and
perhaps in a simple language. This would allow us to describe the definable structures more
easily, and as we have elimination of imaginaries this would also describe the interpretable
structures.
REFERENCES
1. Wilfrid Hodges, A shorter model theory, Cambridge University Press, 1997.
2. Bruno Poizat, Une théorie de Galois imaginaire, Journal of Symbolic Logic, 48 (1983), pp.
1151–1170.
3. Saharon Shelah, Classification Theory and the Number of Non-isomorphic Models, North-Holland,
Amsterdam, 1978.
Chapter 28
A traditional model of a language makes every formula of that language either true or
false. A Boolean valued model is a generalization in which formulas take on any value in a
Boolean algebra.
Specifically, a Boolean valued model of a signature Σ over the language L is a set A together
with a Boolean algebra B. The objects of the model are the functions in A^B, that is, the
functions from B to A.
For any formula φ, we can assign a value ‖φ‖ from the Boolean algebra. For example, if L is
the language of first order logic, a typical recursive definition of ‖φ‖ might look something
like this:
• ‖f = g‖ = ⋁_{f(b)=g(b)} b
• ‖¬φ‖ = ‖φ‖′ (the Boolean complement)
• ‖φ ∨ ψ‖ = ‖φ‖ ∨ ‖ψ‖
• ‖∃x φ(x)‖ = ⋁_{f ∈ A^B} ‖φ(f)‖
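These clauses can be evaluated mechanically when the Boolean algebra is small. A sketch with B the powerset of {0, 1}, where join is union and complement is taken relative to the top element; all function names here are our own:

```python
from itertools import product

# The Boolean algebra B: the powerset of {0, 1}.
TOP = frozenset({0, 1})
B = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

def join(values):
    """The join in B is just set union."""
    out = frozenset()
    for v in values:
        out = out | v
    return out

# Objects of the model: functions from B to A = {0, 1}, stored as dicts.
def value_eq(f, g):
    """||f = g|| = join of all b in B with f(b) = g(b)."""
    return join(b for b in B if f[b] == g[b])

def value_not(v):
    """||~phi|| is the Boolean complement of ||phi||."""
    return TOP - v

def value_exists_eq(g):
    """||exists x (x = g)|| = join over all objects h of ||h = g||."""
    return join(value_eq(dict(zip(B, vals)), g)
                for vals in product([0, 1], repeat=len(B)))

f = {b: 0 for b in B}
g = {B[0]: 0, B[1]: 1, B[2]: 0, B[3]: 1}
print(value_eq(f, g))           # f and g agree exactly on {} and {1}
print(value_not(value_eq(f, g)))
print(value_exists_eq(g))       # the join includes ||g = g||, the top element
```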
Chapter 29
03C99 – Miscellaneous
For any set X, there is no function f from ω to the transitive closure of X such that
f (n + 1) ∈ f (n).
For any formula φ, if there is any set x such that φ(x) then there is some X such that φ(X)
but there is no y ∈ X such that φ(y).
If M and N are models of L then they are elementarily equivalent, denoted M ⇔ N, iff
for every sentence φ:
M ⊨ φ iff N ⊨ φ
29.3 elementary embedding
29.4 model
Let L be a logical language with function symbols F, relations R, and types T. Then
M = ⟨{M_t | t ∈ T}, {f^M | f ∈ F}, {r^M | r ∈ R}⟩
is a model of L (also called an L-structure, or, if the underlying logic is clear, a Σ-structure,
where Σ is a signature specifying just F and R) if:
• Whenever r is an n-ary relation symbol such that Inputs_n(r) = ⟨t₁, . . . , tₙ⟩ then r^M is
a relation on ∏_{i=1}^n M_{tᵢ}
• truth of a non-atomic formula is defined using the semantics of the underlying logic.
For any term s of L whose only free variables are included in x₁, . . . , xₙ with types t₁, . . . , tₙ,
then for any a₁, . . . , aₙ such that aᵢ ∈ M_{tᵢ}, define s^M(a₁, . . . , aₙ) by:
• If s = xᵢ then s^M(a₁, . . . , aₙ) = aᵢ
We show that each of the three formulations of the axiom of foundation given are equivalent.
1⇒2
2⇒3
Let φ be some formula such that φ(x) is true and, for every X such that φ(X), there is some
y ∈ X such that φ(y). Then define f(0) = x and let f(n + 1) be some y ∈ f(n) such that φ(y).
This would construct a function violating the assumption, so there is no such φ.
3⇒1
Let X be a nonempty set and define φ(x) ⇔ x ∈ X. Then φ holds for some x, and by
assumption there is some y such that φ(y) but there is no z ∈ y such that φ(z). Hence
y ∈ X but y ∩ X = ∅.
Chapter 30
The "physical description" of a Turing machine is a box with a tape and a tape head. The
tape consists of an infinite number of cells stretching in both directions, with the tape head
always located over exactly one of these cells. Each cell has one of a finite number of symbols
written on it.
The machine has a finite set of states, and with every move the machine can change states,
change the symbol written on the current cell, and move one space left or right. The machine
has a program which specifies each move based on the current state and the symbol under
the current cell. The machine stops when it reaches a combination of state and symbol
for which no move is defined. One state is the start state, which the machine is in at the
beginning of a computation.
A Turing machine may be viewed as computing either a partial function or a relation. When
viewed as a function, the tape begins with a set of symbols which are the input, and when
the machine halts, whatever is on the tape is the output. For instance it is not difficult to
write a program which doubles a binary number, so input of 10 (with 0 on the first cell, 1
on the second, and all the rest blank) would give output 100. If the machine does not halt
on a particular input then the function is undefined on that input.
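The doubling program mentioned above can be sketched with a tiny simulator. This is a hypothetical encoding (the function names and transition format are mine), and the number is written most-significant digit first, unlike the cell order described in the text:

```python
# A minimal Turing-machine sketch. A program maps (state, symbol) to
# (new state, new symbol, move); the machine halts when no move is defined.

def run_turing_machine(program, tape, state="start", blank=" "):
    """Run until no move is defined for (state, symbol); return tape contents."""
    tape = dict(enumerate(tape))   # sparse tape: cell index -> symbol
    head = 0
    while (state, tape.get(head, blank)) in program:
        state_next, symbol, move = program[(state, tape.get(head, blank))]
        tape[head] = symbol
        head += 1 if move == "R" else -1
        state = state_next
    cells = sorted(k for k, v in tape.items() if v != blank)
    return "".join(tape[k] for k in cells)

# Doubling a binary number amounts to appending a 0:
double = {
    ("start", "0"): ("start", "0", "R"),  # scan right over the input
    ("start", "1"): ("start", "1", "R"),
    ("start", " "): ("done", "0", "R"),   # write 0 on the first blank, then halt
}

print(run_turing_machine(double, "10"))  # -> "100"
```

No move is defined for the state "done", so the machine stops there, exactly as in the description of halting above.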
Alternatively, a Turing machine may be viewed as computing a relation. In that case the
initial symbols on the tape are again an input, and some states are designated “accepting.” If
the machine halts in an accepting state, the symbol is accepted; if it halts in any other state,
the symbol is rejected. A slight variation is when all states are accepting, and a symbol
is rejected if the machine never halts (of course, if the only method of determining if the
machine will halt is watching it then you can never be sure that it won’t stop at some point
in the future).
Another way for a Turing machine to compute a relation is to list (enumerate) its members
one by one. A relation is recursively enumerable if there is some Turing machine which can
list it in this way, or equivalently if there is a machine which halts in an accepting state only
on the members of the relation. A relation is recursive if it is recursively enumerable and its
complement is also. An equivalent definition is that there is a Turing machine which halts
in an accepting state only on members of the relation and always halts.
There are many variations on the definition of a Turing machine. The tape could be infinite
in only one direction, having a first cell but no last cell. Even stricter, a tape could move in
only one direction. It could be two (or more) dimensional. There could be multiple tapes,
and some of them could be read only. The cells could have multiple tracks, so that they hold
multiple symbols simultaneously.
The programs mentioned above define only one move for each possible state and symbol
combination; these are called deterministic. Some programs define multiple moves for some
combinations.
If the machine halts whenever there is any series of legal moves which leads to a situation
without moves, the machine is called non-deterministic. The notion is that the machine
guesses which move to use whenever there are multiple choices, and always guesses right.
Yet other machines are probabilistic; when given the choice between different moves they
select one at random.
No matter which of these variations is used, the recursive and recursively enumerable relations
and functions are unchanged (with two exceptions: one of the tapes has to move in two
directions, although it need not be infinite in both directions, and there can only be a finite
number of symbols, states, and tapes): the simplest imaginable machine, with a single
one-way infinite tape and only two symbols, is equivalent to the most elaborate imaginable
array of multidimensional tapes, lucky guesses, and fancy symbols.
However not all these machines can compute at the same speed; the speed-up theorem states
that the number of moves it takes a machine to halt can be divided by an arbitrary constant
(the basic method involves increasing the number of symbols so that each cell encodes several
cells from the original machine; each move of the new machine emulates several moves from
the old one).
In particular, the question of whether P = NP, which asks whether an important class of deterministic
machines (those which have a polynomial function of the input length bounding the time it
takes them to halt) is the same as the corresponding class of non-deterministic machines, is
one of the major unsolved problems in modern mathematics.
Version: 2 Owner: Henry Author(s): Henry
Chapter 31
The class of primitive recursive functions is the smallest class of functions on the naturals
(from N to N) that
1. Includes
• the zero function: z(x) = 0
• the successor function: s(x) = x + 1
• the projection functions: pn,m (x1 , . . . , xn ) = xm , m ≤ n
2. Is closed under
• composition: h(x1 , . . . , xn ) = f (g1(x1 , . . . , xn ), . . . , gm (x1 , . . . , xn ))
• primitive recursion: h(x, 0) = f (x); h(x, y + 1) = g(x, y, h(x, y))
The primitive recursive functions are Turing-computable, but not all Turing-computable
functions are primitive recursive (see Ackermann’s function).
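The closure operations above can be illustrated directly. The sketch below (names are mine) obtains addition and multiplication from zero and successor by the primitive recursion scheme:

```python
# Sketch: addition and multiplication built by the primitive recursion scheme
# h(x, 0) = f(x); h(x, y+1) = g(x, y, h(x, y)).

def z(x): return 0          # zero function
def s(x): return x + 1      # successor

def primitive_recursion(f, g):
    def h(x, y):
        acc = f(x)              # h(x, 0) = f(x)
        for i in range(y):
            acc = g(x, i, acc)  # h(x, i+1) = g(x, i, h(x, i))
        return acc
    return h

# add(x, 0) = x;  add(x, y+1) = s(add(x, y))
add = primitive_recursion(lambda x: x, lambda x, y, r: s(r))
# mul(x, 0) = 0;  mul(x, y+1) = add(x, mul(x, y))
mul = primitive_recursion(z, lambda x, y, r: add(x, r))

print(add(3, 4), mul(3, 4))  # -> 7 12
```

Here `lambda x: x` plays the role of the projection function p1,1.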
Chapter 32
• There exists a Turing machine f such that ∀x.(x ∈ L) ⇔ the computation f (x) terminates.
A language L fulfilling any (and therefore all) of the above conditions is called recursively
enumerable.
Examples
1. Any recursive language.
2. The set of encodings of Turing machines which halt when given no input.
4. The set of integers n for which the hailstone sequence starting at n reaches 1. (We
don’t know if this set is recursive, or even if it is N; but a trivial program shows it is
recursively enumerable.)
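The “trivial program” alluded to in the last example might look like the following sketch: a semi-decision procedure that halts exactly when the hailstone sequence reaches 1 (the function name is mine):

```python
# Semi-decide membership in the hailstone set: returns True if the sequence
# starting at n reaches 1, and runs forever otherwise (no such n is known).

def hailstone_reaches_one(n):
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return True

print(hailstone_reaches_one(27))  # -> True
```

This is precisely the sense in which the set is recursively enumerable: the machine halts in an accepting state on members, but we cannot bound in advance how long to wait for non-members.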
Chapter 33
A(0, y) = y + 1
A(x + 1, 0) = A(x, 1)
A(x + 1, y + 1) = A(x, A(x + 1, y))

A(0, y) = y + 1
A(1, y) = 2 + (y + 3) − 3
A(2, y) = 2 · (y + 3) − 3
A(3, y) = 2^(y+3) − 3
A(4, y) = 2^2^···^2 − 3 (a tower of y + 3 exponentiations)
... and at this point conventional notation breaks down, and we need to employ something
like Conway notation or Knuth notation for large numbers.
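For small arguments the defining recurrences can be evaluated directly; this sketch is a straightforward transcription (the recursion depth grows violently, by design):

```python
# Direct transcription of the recurrences defining Ackermann's function.

def ackermann(x, y):
    if x == 0:
        return y + 1                              # A(0, y) = y + 1
    if y == 0:
        return ackermann(x - 1, 1)                # A(x+1, 0) = A(x, 1)
    return ackermann(x - 1, ackermann(x, y - 1))  # A(x+1, y+1) = A(x, A(x+1, y))

print(ackermann(2, 3), ackermann(3, 3))  # -> 9 61
```

The values agree with the closed forms above: A(2, 3) = 2 · 6 − 3 = 9 and A(3, 3) = 2^6 − 3 = 61; already A(4, 2) has tens of thousands of digits.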
Ackermann’s function wasn’t actually written in this form by its namesake, Wilhelm Acker-
mann. Instead, Ackermann found that the z-fold exponentiation of x with y was an example
of a recursive function which was not primitive recursive. Later this was simplified by Rózsa
Péter to a function of two variables, similar to the one given above.
The consequences of a solution to the halting problem are far-reaching. Consider some
predicate P (x) regarding natural numbers; suppose we conjecture that P (x) holds for all
x ∈ N. (Goldbach’s conjecture, for example, takes this form.) We can write a program
that will count up through the natural numbers and terminate upon finding some n such
that P (n) is false; if the conjecture holds in general, then our program will never terminate.
Then, without running the program, we could pass it along to a halting program to
prove or disprove the conjecture.
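The counterexample search described above might be sketched as follows, with Goldbach’s conjecture as the stand-in predicate P(n) (the predicate and helper names are mine):

```python
# Search for a counterexample to "P(n) holds for all n", with Goldbach's
# predicate as P: every even n >= 4 is a sum of two primes.

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def goldbach_holds(n):
    """P(n): trivially true unless n is even and >= 4, where we need a prime pair."""
    if n < 4 or n % 2:
        return True
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def search_for_counterexample():
    n = 0
    while True:                    # terminates only if the conjecture is false
        if not goldbach_holds(n):
            return n
        n += 1

print(all(goldbach_holds(n) for n in range(200)))  # -> True
```

Running `search_for_counterexample` settles the conjecture only if it is false; if it is true, the call never returns, which is exactly why a halting oracle would be so powerful.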
In 1936, Alan Turing proved that the halting problem is undecidable; the argument is
presented here informally. Consider a hypothetical program that decides the halting
problem:
Algorithm Halt(P, I)
Input: A computer program P and some input I for P
Output: True if P halts on I and false otherwise
The implementation of the algorithm, as it turns out, is irrelevant. Now consider another
program:
Algorithm Break(x)
Input: An irrelevant parameter x
begin
    if Halt(Break, x) then
        while true do
            nothing
    else
        Break ← true
end
In other words, we can design a program that will break any solution to the halting problem.
If our halting solution determines that Break halts, then it will immediately enter an infinite
loop; otherwise, Break will return immediately. We must conclude that the Halt program
does not decide the halting problem.
Version: 2 Owner: vampyr Author(s): vampyr
Chapter 34
Let κ be a limit ordinal (e.g. a cardinal). The cofinality of κ, written cf(κ), could also be defined as
cf(κ) = min{|U| : U ⊆ κ and sup U = κ}
(sup U is calculated using the natural order of the ordinals). The cofinality of a cardinal is
always a regular cardinal and hence cf(κ) = cf(cf(κ)).
34.2 cofinality
If β is an ordinal, the cofinality cf(β) of β is the least ordinal α such that there is a cofinal
map f : α → β. Note that cf(β) ≤ β, because the identity map on β is cofinal.
It is not hard to show that the cofinality of any ordinal is a cardinal, in fact a regular cardinal:
a cardinal κ is said to be regular if cf(κ) = κ and singular if cf(κ) < κ.
For any infinite cardinal κ it can be shown that κ < κ^cf(κ), and so also κ < cf(2^κ).
Examples
0 and 1 are regular cardinals. All other finite cardinals have cofinality 1 and are therefore
singular.
ℵ0 is regular.
The smallest infinite singular cardinal is ℵω. In fact, the map f : ω → ℵω given by f(n) = ωn
is cofinal, so cf(ℵω) = ℵ0. Note that cf(2^ℵ0) > ℵ0, and consequently 2^ℵ0 ≠ ℵω.
Let ≤ be an ordering on a set S, and let A ⊆ S. Then, with respect to the ordering ≤,
an element a ∈ A is a least element of A if a ≤ x for all x ∈ A, and a minimal element
of A if there is no x ∈ A with x ≤ a and x ≠ a; greatest and maximal elements are
defined dually.
Examples.
• The natural numbers N ordered by divisibility (|) have a least element, 1. The natural
numbers greater than 1 (N \ {1}) have no least element, but infinitely many minimal
elements (the primes). In neither case is there a greatest or maximal element.
• The negative integers ordered by the standard definition of ≤ have a maximal element
which is also the greatest element, −1. They have no minimal or least element.
• The natural numbers N ordered by the standard ≤ have a least element, 1, which is
also a minimal element. They have no greatest or maximal element.
• The rationals greater than zero with the standard ordering ≤ have no least element or
minimal element, and no maximal or greatest element.
34.4 partitions less than cofinality
A well-ordered set is a totally ordered set in which every nonempty subset has a least
member.
An example of a well-ordered set is the set of positive integers with the standard order relation
(Z+ , <), because any nonempty subset of it has a least member. However, R+ (the positive
reals) is not a well-ordered set with the usual order, because (0, 1) = {x : 0 < x < 1} is a
nonempty subset but it doesn’t contain a least number.
For any natural number n, there does not exist a bijection between n and a proper subset
of n.
The name of the theorem is based upon the observation that pigeons will not occupy a
pigeonhole that already contains a pigeon, so there is no way to fit n pigeons in fewer than
n pigeonholes.
It will first be proven that, if a bijection exists between two finite sets, then the two sets
have the same number of elements.
Let S and T be finite sets and f : S → T be a bijection. Since f is injective, then |S| =
| ran f |. Since f is surjective, then |T | = | ran f |. Thus, |S| = |T |.
Since the pigeonhole principle is the contrapositive of the proven statement, it follows that
the pigeonhole principle holds.
In set theory, a tree is defined to be a set T and a relation <T ⊆ T × T such that <T is a partial ordering of T and, for each t ∈ T, the set {s ∈ T | s <T t} of predecessors of t is well-ordered by <T .
The nodes immediately greater than a node are termed its children, the node immediately
less is its parent (if it exists), any node less is an ancestor and any node greater is a
descendant. A node with no ancestors is a root.
The partial ordering represents distance from the root, and the well-ordering requirement
prohibits any loops or splits below a node (that is, each node has at most one parent, and
therefore at most one grand-parent, and so on). Since there is generally no requirement that
the tree be connected, the null ordering makes any set into a tree, although the tree is a
trivial one, since each element of the set forms a single node with no children.
Since the set of ancestors of any node is well-ordered, we can associate it with an ordinal.
We call this the height, and write: ht(t) = o.t.({s ∈ T | s <T t}). This all accords with
normal usage: a root has height 0, something immediately above the root has height 1, and
so on. We can then assign a height to the tree itself, which we define to be the least number
greater than the height of any element of the tree. For finite trees this is just one greater
than the height of its tallest element, but infinite trees may not have a tallest element, so
we define ht(T ) = sup{ht(t) + 1 | t ∈ T }.
For every α < ht(T) we define the α-th level to be the set Tα = {t ∈ T | ht(t) = α}. So
of course T0 is all roots of the tree. If α < ht(T) then T(α) is the subtree of elements with
height less than α: t ∈ T(α) ↔ t ∈ T ∧ ht(t) < α.
We call a tree a κ-tree for any cardinal κ if |T | = κ and ht T = κ. If κ is finite, the only way
to do this is to have a single branch of length κ.
34.9 κ-complete
Similarly, a partial order is κ-complete if any sequence of fewer than κ elements has an
upper bound within the partial order.
One of the starting points in Cantor’s development of set theory was his discovery that there
are different degrees of infinity. The rational numbers, for example, are countably infinite; it
is possible to enumerate all the rational numbers by means of an infinite list. By contrast,
the real numbers are uncountable: it is impossible to enumerate them by means of an
infinite list. These discoveries underlie the idea of cardinality, which is expressed by saying
that two sets have the same cardinality if there exists a bijective correspondence between
them.
In essence, Cantor discovered two theorems: first, that the set of real numbers has the same
cardinality as the power set of the naturals; and second, that a set and its power set have a
different cardinality (see Cantor’s theorem). The proof of the second result is based on the
celebrated diagonalization argument.
Cantor showed that for every given infinite sequence of real numbers x1 , x2 , x3 , . . . it is
possible to construct a real number x that is not on that list. Consequently, it is impossible
to enumerate the real numbers; they are uncountable. No generality is lost if we suppose
that all the numbers on the list are between 0 and 1. Certainly, if this subset of the real
numbers is uncountable, then the full set is uncountable as well. Write the decimal expansion
of each number on the list as
xn = 0.dn1 dn2 dn3 dn4 . . . ,
where the expansion avoids an infinite trailing string of the digit 9.
For each n = 1, 2, . . . we choose a digit cn that is different from dnn and not equal to 9, and
consider the real number x with decimal expansion
0.c1 c2 c3 . . .
By construction, this number x is different from every member of the given sequence. After
all, for every n, the number x differs from the number xn in the nth decimal digit. The claim
is proven.
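The diagonal construction can be illustrated on a finite list of decimal tails (this shows only the mechanism, of course, since any finite list misses uncountably many reals; the digit rule below is one arbitrary choice that differs from dnn and avoids 9):

```python
# Build a decimal that differs from the n-th listed expansion in its n-th digit.

def diagonal_number(expansions):
    """expansions: list of digit strings (the tails after "0.")."""
    digits = []
    for n, tail in enumerate(expansions):
        d = int(tail[n])
        digits.append(str((d + 1) % 9))  # differs from d and is never the digit 9
    return "0." + "".join(digits)

xs = ["500000", "141592", "718281"]
print(diagonal_number(xs))  # -> "0.650", which differs from the n-th tail in digit n
```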
The Schröder-Bernstein theorem is useful for proving many results about cardinality, since
it replaces one hard problem (finding a bijection between S and T ) with two generally easier
problems (finding two injections).
The Veblen function is used to obtain larger ordinal numbers than those provided by
exponentiation. It builds on a hierarchy of closed and unbounded classes:
• Cr(Sn) = Cr(n)′, the set of fixed points of the enumerating function of Cr(n)
• Cr(λ) = ⋂α<λ Cr(α)
The Veblen function ϕα(β) is defined by setting ϕα equal to the enumerating function of Cr(α).
We call a number α strongly critical if α ∈ Cr(α). The class of strongly critical ordinals
is written SC, and the enumerating function is written fSC (α) = Γα .
Γ0 , the first strongly critical ordinal, is also called the Feferman–Schütte ordinal.
Obviously 1 ∈ H, since 0 + 0 < 1. Also ω ∈ H since the sum of two finite numbers is still
finite, and no finite numbers other than 1 are in H.
A cardinal number is an ordinal number S with the property that S ⊆ X for every ordinal
number X which has the same cardinality as S.
The cardinal successor of a cardinal κ is the least cardinal greater than κ. It is denoted κ+ .
34.17 cardinality
Cardinality is a notion of the size of a set which does not rely on numbers. It is a relative
notion because, for instance, two sets may each have an infinite number of elements, but one
may have a greater cardinality. That is, it may have a “more infinite” number of elements.
The formal definition of cardinality rests upon the notion of a one-to-one mapping between
sets.
Definition.
Sets A and B have the same cardinality if there is a one-to-one and onto function f from A
to B (a bijection.) Symbolically, we write |A| = |B|. This is also called equipotence.
Results.
1. A is equipotent to A.
Proof. The identity map on A is a bijection from A to itself, so |A| = |A|.
Example.
The set of even integers E has the same cardinality as the set of integers Z. We define
f : E → Z such that f(x) = x/2. Then f is a bijection, therefore |E| = |Z|.
34.19 cardinality of the rationals
A class of ordinals is just a subclass of the ordinals. For every class of ordinals M there is an
enumerating function fM defined by transfinite recursion:
fM(α) = min{x ∈ M | fM(β) < x for all β < α}
This function simply lists the elements of M in order. Note that it is not necessarily defined
for all ordinals, although it is defined for a segment of the ordinals. Let otype(M) = dom(fM)
be the order type of M, which is either On or some ordinal α. If α < β then fM(α) <
fM(β), so fM is an order isomorphism between otype(M) and M.
We say M is κ-closed if for any N ⊆ M such that |N| < κ, also sup N ∈ M.
We say M is κ-unbounded if for any α < κ there is some β ∈ M such that α < β.
A function is κ-normal if it is order preserving (α < β implies f (α) < f (β)) and continuous.
In particular, the enumerating function of a κ-closed class is always κ-normal.
All these definitions can be easily extended to all ordinals: a class is closed (resp. un-
bounded) if it is κ-closed (unbounded) for all κ. A function is continuous (resp. normal)
if it is κ-continuous (normal) for all κ.
34.21 club
If κ is a cardinal then a set C ⊆ κ is closed iff for any S ⊆ C and α < κ, if sup(S ∩ α) = α
then α ∈ C. (That is, if the limit of some sequence in C is less than κ then the limit is also
in C.)
If κ is a cardinal and C ⊆ κ then C is unbounded if, for any α < κ, there is some β ∈ C
such that α < β.
If a set is both closed and unbounded then it is a club set.
If κ is a regular uncountable cardinal then club(κ), the filter of all sets containing a club
subset of κ, is a κ-complete filter closed under diagonal intersection called the club filter.
To see that this is a filter, note that κ ∈ club(κ) since it is obviously both closed and
unbounded. If x ∈ club(κ) then any subset of κ containing x is also in club(κ), since x, and
therefore anything containing it, contains a club set.
It is a κ-complete filter because the intersection of fewer than κ club sets is a club set. To
see this, suppose ⟨Ci⟩i<α is a sequence of club sets where α < κ. Obviously C = ⋂i<α Ci is
closed, since any sequence which appears in C appears in every Ci, and therefore its limit is
also in every Ci. To show that it is unbounded, take some β < κ. Let ⟨β1,i⟩ be an increasing
sequence with β1,1 > β and β1,i ∈ Ci for every i < α. Such a sequence can be constructed,
since every Ci is unbounded. Since α < κ and κ is regular, the limit of this sequence is less
than κ. We call it β2, and define a new sequence ⟨β2,i⟩ similar to the previous sequence.
We can repeat this process, getting a sequence of sequences ⟨βj,i⟩ where each element of a
sequence is greater than every member of the previous sequences. Then for each i < α, ⟨βj,i⟩
is an increasing sequence contained in Ci, and all these sequences have the same limit (the
limit of ⟨βj,i⟩). This limit is then contained in every Ci, and therefore C, and is greater than
β.
To see that club(κ) is closed under diagonal intersection, let ⟨Ci | i < κ⟩ be a sequence, and
let C = ∆i<κ Ci. Since the diagonal intersection contains the intersection, obviously C is
unbounded. Then suppose S ⊆ C and sup(S ∩ α) = α. Then S ⊆ Cβ for every β > α, and
since each Cβ is closed, α ∈ Cβ, so α ∈ C.
34.23 countable
34.24 countably infinite
As the name implies, any countably infinite set is both countable and infinite.
34.25 finite
A set S is finite if there exists a natural number n and a bijection from S to n. If there
exists such an n, then it is unique, and it is called the cardinality of S.
If f is κ-normal then Fix(f), the class of fixed points of f, is κ-closed and κ-unbounded, and therefore its enumerating function f′ is also κ-normal.
Suppose we have an algebraic number such that the polynomial of smallest degree it is a
root of (with the coefficients relatively prime) is given by
∑_{i=0}^{n} a_i x^i.
The height of this algebraic number is then
h = n + ∑_{i=0}^{n} |a_i|
This is a quantity which is used in the proof of the existence of transcendental numbers.
REFERENCES
1. Shaw, R. Mathematics Society Notes, 1st edition. King’s School Chester, 2003.
2. Stewart, I. Galois Theory, 3rd edition. Chapman and Hall, 2003.
3. Baker, A. Transcendental Number Theory, 1st edition. Cambridge University Press, 1975.
A limit cardinal is a cardinal κ such that λ+ < κ for every cardinal λ < κ. Here λ+ denotes
the cardinal successor of λ. If 2^λ < κ for every cardinal λ < κ, then κ is called a strong limit
cardinal.
Every strong limit cardinal is a limit cardinal, because λ+ ≤ 2^λ holds for every cardinal λ.
Under GCH, every limit cardinal is a strong limit cardinal because in this case λ+ = 2^λ for
every infinite cardinal λ.
The three smallest limit cardinals are 0, ℵ0 and ℵω . Note that some authors do not count 0,
or sometimes even ℵ0 , as a limit cardinal.
34.30 natural number
Given the Zermelo-Fraenkel axioms of set theory, one can prove that there exists an inductive set
X such that ∅ ∈ X. The natural numbers N are then defined to be the intersection of all
subsets of X which are inductive sets and contain the empty set as an element.
• 0 := ∅
• 1 := 0′ = {0} = {∅}
Note that the set 0 has zero elements, the set 1 has one element, the set 2 has two elements,
etc. Informally, the set n is the set consisting of the n elements 0, 1, . . . , n − 1, and n is both
a subset of N and an element of N.
In some contexts (most notably, in number theory), it is more convenient to exclude 0 from
the set of natural numbers, so that N = {1, 2, 3, . . . }. When it is not explicitly specified, one
must determine from context whether 0 is being considered a natural number or not.
• a + 0 := a for all a ∈ N
• a + b′ := (a + b)′ for all a, b ∈ N
• a · 0 := 0 for all a ∈ N
• a · b′ := (a · b) + a for all a, b ∈ N
The natural numbers form a monoid under either addition or multiplication. There is an
ordering relation on the natural numbers, defined by: a ≤ b if a ⊆ b.
34.31 ordinal arithmetic
Ordinal arithmetic is the extension of normal arithmetic to the transfinite ordinal numbers.
The successor operation Sx (sometimes written x + 1, although this notation risks confusion
with the general definition of addition) is part of the definition of the ordinals, and addition
is naturally defined by recursion over this:
• x + 0 = x
• x + Sy = S(x + y)
• x + α = supγ<α (x + γ) for limit α
If x and y are finite then x+y under this definition is just the usual sum, however when x and
y become infinite, there are differences. In particular, ordinal addition is not commutative.
For example,
ω + 1 = ω + S0 = S(ω + 0) = Sω
but
1 + ω = supn<ω 1 + n = ω
• x·0 =0
• x · Sy = x · y + x
• x · α = supγ<α x · γ for limit α
Once again this definition is equivalent to normal multiplication when x and y are finite, but
is not commutative:
ω·2 =ω·1+ω =ω+ω
but
2 · ω = supn<ω 2 · n = ω
Both these functions are strongly increasing in the second argument and weakly increasing
in the first argument. That is, if α < β then
• γ + α < γ + β
• γ · α < γ · β
• α + γ ≤ β + γ
• α · γ ≤ β · γ
34.32 ordinal number
Definition If X is a set, then the power set of X is the set whose elements are the subsets
of X. It is usually denoted as P(X) or 2^X.
1. If X is a finite set, then |2^X| = 2^|X|. This property motivates the notation 2^X.
2. For an arbitrary set X, Cantor’s theorem states two things about the power set: First,
there is no bijection between X and P(X). Second, the cardinality of 2X is greater
than the cardinality of X.
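Property 1 is easy to check computationally; a minimal sketch (the helper name is mine):

```python
# List all subsets of a finite set: there are 2**n of them.
from itertools import combinations

def power_set(xs):
    xs = list(xs)
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

P = power_set({1, 2, 3})
print(len(P))  # -> 8, i.e. 2**3
```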
Then if Fodor’s lemma is false, for every α ∈ S there is some club set Cα such that
Cα ∩ f⁻¹(α) = ∅. Let C = ∆α<κ Cα. The club sets are closed under diagonal intersection,
so C is also club and therefore there is some α ∈ S ∩ C. Then α ∈ Cβ for each β < α, and
so there can be no β < α such that α ∈ f⁻¹(β), so f(α) ≥ α, a contradiction.
We first prove as a lemma that for any B ⊂ A, if there is an injection f : A → B, then there
is also a bijection h : A → B.
Define a sequence {Ck}∞k=0 of subsets of A by C0 = A − B and, for k ≥ 0, Ck+1 = f(Ck).
If the Ck are not pairwise disjoint, then there are minimal integers j and k with j < k and
Cj ∩ Ck nonempty. Then k ≥ 1, and so Ck ⊂ B. Since C0 ∩ B = ∅, we have j > 0. Thus Cj = f(Cj−1)
and Ck = f(Ck−1). By assumption, f is injective, so Cj−1 ∩ Ck−1 is nonempty, contradicting
the minimality of j. Hence the Ck are pairwise disjoint.
Now let C = ⋃∞k=0 Ck, and define h : A → B by
h(z) = f(z) if z ∈ C, and h(z) = z if z ∉ C.
To prove the theorem, suppose f : S → T and g : T → S are injective. Then the composition
gf : S → g(T) is also injective. By the lemma, there is a bijection h′ : S → g(T). The
injectivity of g implies that g⁻¹ : g(T) → T exists and is bijective. Define h : S → T by
h(z) = g⁻¹(h′(z)); this map is a bijection, and so S and T have the same cardinality.
Suppose f is a κ-normal function. Consider any α < κ and define a sequence by α0 = α
and αn+1 = f(αn). Let αω = supn<ω αn. Then, since f is continuous,
f(αω) = supn<ω f(αn) = supn<ω αn+1 = αω
So Fix(f) is unbounded.
Lemma:
Consider a natural number k. Then the number of algebraic numbers of height k is finite.
Proof:
To see this, note that each term in the sum in the definition of height is positive. Therefore
n ≤ k
where n is the degree of the polynomial. For a polynomial of degree n, there are only n + 1
coefficients, and the sum of their moduli must be k − n; there is only a finite number of ways
of choosing such coefficients. For every degree less than n there are likewise only finitely
many choices. So the total number of polynomials of height k is finite, and since each has
only finitely many roots, the number of algebraic numbers with height k is finite (with some
repetitions). The result follows.
You can start writing a list of the algebraic numbers because you can put all the ones with
height 1, then with height 2, etc, and write them in numerical order within those sets because
they are finite sets. This implies that the set of algebraic numbers is countable. However,
by diagonalisation, the set of real numbers is uncountable. So there are more real numbers
than algebraic numbers; the result follows.
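The finiteness of each height class can be made concrete by listing the candidate polynomials of a given height (a sketch; the function name is mine, and no attempt is made to discard non-primitive or reducible polynomials, so some heights are over-counted):

```python
# List all integer polynomials (a_0, ..., a_n), a_n != 0, with
# height h = n + sum of |a_i|; the list is finite for every h.
from itertools import product

def polynomials_of_height(h):
    polys = []
    for n in range(1, h):                      # degree n <= h - 1 (coefficient sum >= 1)
        budget = h - n                         # sum of |a_i| must equal h - n
        for coeffs in product(range(-budget, budget + 1), repeat=n + 1):
            if coeffs[n] != 0 and sum(abs(c) for c in coeffs) == budget:
                polys.append(coeffs)
    return polys

print(len(polynomials_of_height(3)))  # a finite count for every height
```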
H is closed
Let {αi | i < κ} be some increasing sequence of elements of H and let α = sup{αi | i < κ}.
Then for any x, y < α, it must be that x < αi and y < αj for some i, j < κ. But then
x + y < αmax{i,j} < α.
H is unbounded
fH(α) = ω^α
For any α + 1, we have that fH(α + 1) is the least additively indecomposable number greater than
fH(α). Let α0 = S(fH(α)) and αn+1 = αn + αn = αn · 2. Then fH(α + 1) = supn<ω αn =
supn<ω S(fH(α)) · 2^n = fH(α) · ω. The limit case is trivial since H is closed unbounded, so fH
is continuous.
Suppose we have a rational number α = p/q in lowest terms with q > 0. Define the “height”
of this number as h(α) = |p| + q. For example, h(0) = h(0/1) = 1, h(−1) = h(1) = 2,
and h(−2) = h(−1/2) = h(1/2) = h(2) = 3. Note that the set of numbers with a given height
is finite. The rationals can now be partitioned into classes by height, and the numbers in
each class can be ordered by way of increasing numerators. Thus it is possible to assign
a natural number to each of the rationals by starting with 0, −1, 1, −2, −1/2, 1/2, 2, −3, . . . and
progressing through classes of increasing heights. This assignment constitutes a bijection
between N and Q and proves that Q is countable.
A corollary is that the irrational numbers are uncountable, since the union of the irrationals
and the rationals is R, which is uncountable.
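The height enumeration can be carried out mechanically (a sketch; the function name is mine, and fractions are represented as (numerator, denominator) pairs):

```python
# Enumerate rationals p/q in lowest terms, grouped by height |p| + q,
# ordered within each height class by increasing numerator.
from math import gcd

def rationals_up_to_height(h_max):
    out = []
    for h in range(1, h_max + 1):
        row = [(p, q) for q in range(1, h + 1) for p in range(-h, h + 1)
               if abs(p) + q == h and gcd(abs(p), q) == 1]
        row.sort()                 # by increasing numerator
        out.extend(row)
    return out

print(rationals_up_to_height(3))
# height 1: 0/1; height 2: -1/1, 1/1; height 3: -2/1, -1/2, 1/2, 2/1
```

The printed order matches the start of the list above: 0, −1, 1, −2, −1/2, 1/2, 2, . . .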
34.41 successor cardinal
34.42 uncountable
1. All uncountable sets are infinite. However, the converse is not true. For instance, the
natural numbers and the rational numbers, although infinite, are both countable.
2. The real numbers form an uncountable set. The famous proof of this result is based
on Cantor’s diagonal argument.
A von Neumann integer is not an integer, but instead a construction of a natural number
using some basic set notation. The von Neumann integers are defined inductively. The
von Neumann integer zero is defined to be the empty set, ∅, and there are no smaller von
Neumann integers. The von Neumann integer N is then the set of all von Neumann integers
less than N. The set of von Neumann integers is the set of all finite von Neumann ordinals.
This form of construction from very basic notions of sets is applicable to various forms of
set theory (for instance, Zermelo-Fraenkel set theory). While this construction suffices to
define the set of natural numbers, a little more work must be done to define the set of all
integers.
Examples
0 = ∅
1 = {0} = {∅}
2 = {0, 1} = {∅, {∅}}
3 = {0, 1, 2} = {∅, {∅}, {∅, {∅}}}
. . .
N = {0, 1, . . . , N − 1}
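The construction is easy to mimic with nested frozensets (a sketch; the function name is mine):

```python
# Build von Neumann integers as frozensets: each integer is the set of
# all smaller ones.

def von_neumann(n):
    numbers = []
    for _ in range(n + 1):
        numbers.append(frozenset(numbers))  # next integer = set of the previous ones
    return numbers[-1]

print(len(von_neumann(5)))              # -> 5: the set n has exactly n elements
print(von_neumann(1) < von_neumann(3))  # -> True: a < b gives a ⊂ b (and a ∈ b)
```

Python's `<` on frozensets is proper-subset comparison, which matches the ordering property noted below.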
The von Neumann ordinal α is defined to be the well-ordered set containing the von Neu-
mann ordinals which precede α. The set of finite von Neumann ordinals is known as the
von Neumann integers. Every well-ordered set is isomorphic to a von Neumann ordinal.
The von Neumann ordinals have the convenient property that if a < b then a ∈ b and a ⊂ b.
34.45 weakly compact cardinal
Weakly compact cardinals are (large) infinite cardinals which have a property related to the
syntactic compactness theorem for first order logic. Specifically, for any infinite cardinal κ,
consider the language Lκ,κ .
(a) infinite conjunctions and disjunctions of fewer than κ formulas are allowed
The weak compactness theorem for Lκ,κ states that if ∆ is a set of sentences of Lκ,κ such
that |∆| = κ and any θ ⊂ ∆ with |θ| < κ is consistent then ∆ is consistent.
A cardinal is weakly compact if the weak compactness theorem holds for Lκ,κ .
A cardinal is weakly compact if and only if it is inaccessible and has the tree property.
Let κ be a weakly compact cardinal and let (T, <T) be a κ-tree with all levels smaller than
κ. We define a theory in Lκ,κ with, for each x ∈ T, a constant cx, and a single unary relation
B. Then our theory ∆ consists of the sentences:
It should be clear that B represents membership in a cofinal branch, since the first class of
sentences asserts that no incompatible elements are both in B while the second class states
that the branch intersects every level.
Clearly |∆| = κ, since there are κ elements in T , and hence fewer than κ · κ = κ sentences
in the first group, and of course there are κ levels and therefore κ sentences in the second
group.
Now consider any Σ ⊆ ∆ with |Σ| < κ. Fewer than κ sentences of the second group are
included, so the elements x for which the corresponding cx appears in Σ must all lie in T(α) for some
α < κ. But since T has branches of arbitrary height, T(α) ⊨ Σ.
Since κ is weakly compact, it follows that ∆ also has a model, and that model obviously has
a set of cx such that B(cx ) whose corresponding elements of T intersect every level and are
compatible, therefore forming a cofinal branch of T , proving that T is not Aronszajn.
Let X be any set and P(X) its power set. Cantor’s theorem states that there is no bijection
between X and P(X). Moreover, the cardinality of P(X) is strictly greater than that of X,
that is, |X| < |P(X)|.
The proof of this theorem is fairly simple using the following construction which is central
to Cantor’s diagonal argument.
Consider a function F : X → P(X) from X to its powerset. Then we define the set Z ⊆ X
as follows:
Z = {x ∈ X | x ∉ F (x)}.
Suppose that F is, in fact, a bijection. Then there must exist an x ∈ X such that F (x) = Z.
But, by construction, we have the following contradiction:
x ∈ Z ⇔ x ∉ F (x) ⇔ x ∉ Z.
Hence F cannot be a bijection between X and P(X).
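The diagonal set Z can be computed for any finite example, and it is never in the image of F (the particular F below is an arbitrary choice of mine):

```python
# The diagonal set Z = {x in X | x not in F(x)} differs from F(x) for every x.

def diagonal_set(X, F):
    return {x for x in X if x not in F(x)}

X = {0, 1, 2}
F = lambda x: [{1}, {0, 1}, set()][x]   # some function X -> P(X)
print(diagonal_set(X, F))               # -> {0, 2}, which is F(x) for no x
```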
34.49 additive
Let φ be some real function defined on an algebra of sets A. We say that φ is additive if,
whenever A and B are disjoint sets in A, we have
φ(A ∪ B) = φ(A) + φ(B).
Suppose A is a σ-algebra. Then, given any sequence ⟨Ai⟩ of disjoint sets in A, if we have
φ(⋃i Ai) = ∑i φ(Ai)
then φ is said to be countably additive.
1. φ(∅) = 0.
34.50 antisymmetric
Antisymmetric is not the same thing as “not symmetric”, as it is possible for a relation to be
both antisymmetric and symmetric at the same time. A relation R that is both antisymmetric
and symmetric satisfies xRy ⇒ x = y, so it is a subset of the diagonal; there are only 2^n such
possible relations on an n-element set A.
An example of an antisymmetric relation on A = {◦, ×, ⋆} would be R = {(⋆, ⋆), (×, ◦), (◦, ⋆), (⋆, ×)}.
One relation that isn’t antisymmetric is R = {(×, ◦), (⋆, ◦), (◦, ⋆)}, because we have both ⋆R◦
and ◦R⋆, but ◦ ≠ ⋆.
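The definition translates directly into a membership check. The sketch below encodes the two relations from the text as sets of ordered pairs, writing '*' for the star symbol:

```python
def is_antisymmetric(R):
    # A relation (set of ordered pairs) is antisymmetric when
    # (x, y) in R and (y, x) in R together force x == y.
    return all(x == y for (x, y) in R if (y, x) in R)

# The two relations from the text, with '*' standing in for the star
R1 = {('*', '*'), ('x', 'o'), ('o', '*'), ('*', 'x')}
R2 = {('x', 'o'), ('*', 'o'), ('o', '*')}

assert is_antisymmetric(R1)
assert not is_antisymmetric(R2)   # ('*','o') and ('o','*') but '*' != 'o'
```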
Properties
1. The composition of a constant function with any function (for which composition is
defined) is a constant function.
Let f : A −→ B be a function, and let U ⊂ A be a subset. The direct image of U is the set
f (U) ⊂ B consisting of all elements of B which equal f (u) for some u ∈ U.
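As a minimal sketch of this definition (the function and subset here are hypothetical examples), the direct image is simply the set of values f takes on U:

```python
# Direct image of a subset under a function: f(U) = {f(u) | u in U}
f = lambda x: x * x
U = {-2, -1, 0, 1}
image = {f(u) for u in U}
assert image == {0, 1, 4}
```

Note that distinct elements of U may collapse to the same element of the image, as -1 and 1 do here.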
34.53 domain
Let R be a binary relation. Then the set of all x such that xRy is called the domain of R.
That is, the domain of R is the set of all first coordinates of the ordered pairs in R.
Let Ω be a set, and P(Ω) be the power set of Ω. A dynkin system on Ω is a set D ⊆ P(Ω)
such that
1. Ω ∈ D
2. A, B ∈ D and A ⊆ B ⇒ B \ A ∈ D
3. An ∈ D, An ⊆ An+1 , n ≥ 1 ⇒ ⋃_{k=1}^{∞} Ak ∈ D.
Let Γ be the set of all Dynkin systems on Ω containing A. We define the intersection of all
the Dynkin systems containing A as
D(A) := ⋂_{X∈Γ} X.    (34.54.2)
One can easily verify that D(A) is itself a dynkin system and that it contains A. We call
D(A) the dynkin system generated by A. It is the “smallest” dynkin system containing
A.
[x] := {y ∈ S | x ∼ y}.
The set of all equivalence classes of S under ∼ is defined to be the set of all subsets of S
which are equivalence classes of S under ∼.
For any equivalence relation ∼, the set of all equivalence classes of S under ∼ is a partition
of S, and this correspondence is a bijection between the set of equivalence relations on S
and the set of partitions of S (consisting of nonempty sets).
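The correspondence between an equivalence relation and its partition can be sketched computationally. The example below (a hypothetical choice: congruence modulo 3 on {0, …, 8}) groups elements into classes by comparing each element with a representative of each existing class:

```python
def equivalence_classes(S, related):
    # Collect the classes [x] = {y in S | x ~ y} of an equivalence relation.
    classes = []
    for x in S:
        for c in classes:
            if related(x, next(iter(c))):   # compare with a representative
                c.add(x)
                break
        else:
            classes.append({x})
    return classes

# Congruence modulo 3 on {0, ..., 8}
S = set(range(9))
parts = equivalence_classes(S, lambda x, y: (x - y) % 3 == 0)

# The classes form a partition: nonempty, disjoint, and covering S
assert sum(len(c) for c in parts) == len(S)
assert set().union(*parts) == S
```

Comparing with a single representative suffices precisely because the relation is symmetric and transitive.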
34.56 fibre
Example
34.57 filtration
A1 ⊂ A2 ⊂ · · · ⊂ An .
If one considers the sets A1 , . . . , An as elements of a larger set which are partially ordered
by inclusion, then a filtration is simply a finite chain with respect to this partial ordering.
It should be noted that in some contexts the word ”filtration” may also be employed to
describe an infinite chain.
We say that a subset B ⊂ A is fixed by T whenever all elements of B are fixed by T , i.e.
B ⊂ A^T . Now consider a family of mappings
Ti : A → A,  i ∈ I.
In this case we say that a subset B ⊂ A is fixed if it is fixed by all the elements of the
family, i.e. whenever
B ⊂ ⋂_{i∈I} A^{Ti}.
34.60 function
For a ∈ A, one usually denotes by f (a) the unique element b ∈ B such that (a, b) ∈ R. The
set A is called the domain of f , and the set B is called the codomain.
34.61 functional
• T(x+y)=T(x)+T(y)
• T(cx)=cT(x)
for any c ∈ K, x, y ∈ V .
Given any family of sets {Aj}_{j∈J} indexed by an index set J, the generalized cartesian product
∏_{j∈J} Aj
34.63 graph
Definition If X is a set, then the identity map in X is the mapping that maps each
element in X to itself.
Properties
REFERENCES
1. V. Guillemin, A. Pollack, Differential topology, Prentice-Hall Inc., 1974.
Definition Let X be a subset of Y . Then the inclusion map from X to Y is the mapping
ι:X → Y
x 7→ x.
In other words, the inclusion map is simply a fancy way to say that every element in X is
also an element in Y .
An inductive set is a set X with the property that, for every x ∈ X, the successor x′ of x is
also an element of X.
One major example of an inductive set is the set of natural numbers N.
34.67 invariant
T (x) = x.
T (B) ⊂ B.
Ti : A → A, i∈I
In this case we say that a subset is invariant, if it is invariant with respect to all elements of
the family.
1. a ∈ X, f(a) ∈ Y ;
2. Y = f(X);
3. f : X → Y is one-one;
Simplest case
Note
The inverse function theorem is a special case of the implicit function theorem where the
dimension of each variable is the same.
The inverse image commutes with all set operations: For any collection {Ui }i∈I of subsets of
B, we have the following identities for
1. unions:
f⁻¹(⋃_{i∈I} Ui) = ⋃_{i∈I} f⁻¹(Ui)
2. intersections:
f⁻¹(⋂_{i∈I} Ui) = ⋂_{i∈I} f⁻¹(Ui)
3. complements:
f⁻¹(U)^∁ = f⁻¹(U^∁)
4. set differences:
f⁻¹(U \ V ) = f⁻¹(U) \ f⁻¹(V )
5. symmetric differences:
f⁻¹(U △ V ) = f⁻¹(U) △ f⁻¹(V )
In addition, for X ⊂ A and Y ⊂ B, the inverse image satisfies the miscellaneous identities
6. (f|X)⁻¹(Y ) = X ∩ f⁻¹(Y )
7. f(f⁻¹(Y )) = Y ∩ f(A)
8. X ⊂ f⁻¹(f(X)), with equality if f is injective.
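Identities 1–5 can be spot-checked on a small example. The sketch below uses a hypothetical function f(x) = x mod 3 from A = {0, …, 8} into B = {0, 1, 2, 3}; the identities hold for any choice of f, U, and V.

```python
# Checking preimage identities 1-5 on a sample function (hypothetical
# choice: f(x) = x mod 3 from A = {0,...,8} into B = {0, 1, 2, 3}).
A = set(range(9))
B = {0, 1, 2, 3}
f = lambda x: x % 3
pre = lambda U: {a for a in A if f(a) in U}   # the inverse image f^-1(U)

U, V = {0, 1}, {1, 2}
assert pre(U | V) == pre(U) | pre(V)        # 1. unions
assert pre(U & V) == pre(U) & pre(V)        # 2. intersections
assert pre(B - U) == A - pre(U)             # 3. complements
assert pre(U - V) == pre(U) - pre(V)        # 4. set differences
assert pre(U ^ V) == pre(U) ^ pre(V)        # 5. symmetric differences
```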
34.70 mapping
Synonym of function, although typical usage suggests that mapping is the more generic term.
In a geometric context, the term function is often employed to connote a mapping whose
purpose is to assign values to the elements of its domain, i.e. a function defines a field of
values, whereas mapping seems to have a more geometric connotation, as in a mapping of
one space to another.
Proof. If n = 1, the claim is trivial; f is the identity mapping. Suppose n ≥ 2. Then
for any x ∈ X, we have x = f(f^{n−1}(x)), so f is a surjection. To see that f is an injection,
suppose f(x) = f(y) for some x, y ∈ X. Since f^n is the identity, applying f^{n−1} to both
sides gives x = y. □
34.73 partial mapping
In the case where n = 2, the map defined by a ↦ f(a, x) is often denoted f(·, x). Further,
any function f : X1 × X2 → Y determines a mapping from X1 into the set of mappings
of X2 into Y , namely f̃ : x ↦ (y ↦ f(x, y)). The converse holds too, and it is customary
to identify f with f̃. Many of the “canonical isomorphisms” that we come across (e.g. in
multilinear algebra) are illustrations of this kind of identification.
Examples
4. Let us consider the function space spanned by the trigonometric functions sin and cos.
On this space, the derivative is a mapping of period 4.
Properties
34.75 pi-system
Let Ω be a set, and P(Ω) be the power set of Ω. A π-system (or pi-system) on Ω is a set
F ⊆ P(Ω) such that
A, B ∈ F ⇒ A ∩ B ∈ F.    (34.75.1)
Since det Df(a) ≠ 0, the Jacobian matrix Df(a) is invertible: let A = (Df(a))⁻¹ be its
inverse. Choose r > 0 and ρ > 0 such that
B = Bρ(a) ⊂ E,
‖Df(x) − Df(a)‖ ≤ 1/(2n‖A‖)  for all x ∈ B,
r ≤ ρ/(2‖A‖).
For y ∈ Rⁿ define the map
Ty : B → Rⁿ,
Ty(x) = x + A · (y − f(x)).
If x ∈ B we have
‖DTy(x)‖ = ‖1 − A · Df(x)‖ ≤ ‖A‖ · ‖Df(a) − Df(x)‖ ≤ 1/(2n).
Let us verify that Ty is a contraction mapping. Given x₁, x₂ ∈ B, by the mean-value theorem
on Rⁿ we have
|Ty(x₁) − Ty(x₂)| ≤ sup_{x∈[x₁,x₂]} n‖DTy(x)‖ · |x₁ − x₂| ≤ (1/2)|x₁ − x₂|.
Moreover, for y ∈ Br(f(a)), Ty maps B into itself, since for x ∈ B
|Ty(x) − a| ≤ |Ty(x) − Ty(a)| + |Ty(a) − a| ≤ (1/2)|x − a| + |A · (y − f(a))| ≤ ρ/2 + ‖A‖r ≤ ρ.
So Ty : B → B is a contraction mapping and hence by the contraction principle there exists
one and only one solution to the equation
Ty (x) = x,
Hence given any y ∈ Br (f (a)) we can find x ∈ B which solves f (x) = y. Let us call
g : Br (f (a)) → B the mapping which gives this solution, i.e.
f (g(y)) = y.
Let V = Br(f(a)) and U = g(V ). Clearly f : U → V is one to one and the inverse of f
is g. We have to prove that U is a neighbourhood of a. However, since f is continuous at
a, we know that there exists a ball Bδ(a) such that f(Bδ(a)) ⊂ Br(f(a)), and hence we have
Bδ(a) ⊂ U.
We now want to study the differentiability of g. Let y ∈ V be any point, take w ∈ Rⁿ and
ε > 0 so small that y + εw ∈ V . Let x = g(y) and define v(ε) = g(y + εw) − g(y).
So
lim_{ε→0} (g(y + εw) − g(y))/ε = lim_{ε→0} v(ε)/ε = lim_{ε→0} Df(x)⁻¹ · (εw − h(v(ε)))/ε = Df(x)⁻¹ · w,
that is,
Dg(y) = Df(x)⁻¹.
34.77 proper subset
34.78 range
Let R be a binary relation. Then the set of all y such that xRy for some x is called the
range of R. That is, the range of R is the set of all second coordinates in the ordered pairs
of R.
In terms of functions, this means that the range of a function is the full set of values it can
take on (the outputs), given the full set of parameters (the inputs). Note that the range is
a subset of the codomain.
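Both the domain and range of a finite relation can be read off directly from its ordered pairs, as this small sketch (with hypothetical example pairs) shows:

```python
# Domain and range of a finite binary relation, per the definitions above
R = {(1, 'a'), (1, 'b'), (2, 'a'), (3, 'c')}
domain = {x for (x, _) in R}   # first coordinates
range_ = {y for (_, y) in R}   # second coordinates
assert domain == {1, 2, 3}
assert range_ == {'a', 'b', 'c'}
```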
34.79 reflexive
For example, let A = {1, 2, 3}, R is a relation on A. Then R = {(1, 1), (2, 2), (3, 3), (1, 3), (3, 2)}
would be a reflexive relation, because it contains all the (a, a), a ∈ A pairs. However,
R = {(1, 1), (2, 2), (2, 3), (3, 1)} is not reflexive because it would also have to contain (3, 3).
34.80 relation
A relation is any subset of a cartesian product of two sets A and B. That is, any R ⊂ A × B
is a binary relation. One may write aRb to denote a ∈ A, b ∈ B and (a, b) ∈ R. A subset of
A × A is simply called a relation on A.
An example of a relation is the less-than relation on integers, i.e. < ⊂ Z × Z. We have
(1, 2) ∈ <, but (2, 1) ∉ <.
34.81 restriction of a mapping
f |A : A → Y
a 7→ f (a).
Let A and B be sets in some ambient set X. The set difference, or simply difference, between
A and B (in that order) is the set of all elements that are contained in A, but not in B.
This set is denoted by A \ B, and we have
A \ B = {x ∈ X | x ∈ A, x ∉ B} = A ∩ B^∁.
Remark
Sometimes the set difference is also written as A − B. However, if A and B are sets in a
vector space, then A − B is commonly used to denote the set
A − B = {a − b | a ∈ A, b ∈ B},
which, in general, is not the same as the set difference of A and B. Therefore, to avoid
confusion, one should try to avoid the notation A − B for the set difference.
34.83 symmetric
An example of a symmetric relation on A = {a, b, c} would be R = {(a, a), (c, b), (b, c), (a, c), (c, a)}.
One relation that is not symmetric is R = {(b, b), (a, b), (b, a), (c, b)}: since we have (c, b),
we must also have (b, c) in order for R to be symmetric.
The symmetric difference between two sets A and B, written A △ B, is the set of all
x such that either x ∈ A or x ∈ B but not both. It is equal to (A − B) ∪ (B − A) and to
(A ∪ B) − (A ∩ B).
The symmetric difference operator is commutative, since A △ B = (A − B) ∪ (B − A) =
(B − A) ∪ (A − B) = B △ A.
The operation is also associative. To see this, consider three sets A, B, and C. Any given
element x is in zero, one, two, or all three of these sets. If x is not in any of A, B, or C,
then it is not in the symmetric difference of the three sets no matter how it is computed.
If x is in exactly one of the sets, let that set be A; then x ∈ A △ B and x ∈ (A △ B) △ C; also,
x ∉ (B △ C) and therefore x ∈ A △ (B △ C). If x is in exactly two of the sets, let them be A
and B; then x ∉ A △ B and x ∉ (A △ B) △ C; also, x ∈ B △ C, but because x is in A,
x ∉ A △ (B △ C). If x is in all three, then x ∉ A △ B but x ∈ (A △ B) △ C; similarly,
x ∉ B △ C but x ∈ A △ (B △ C). Thus, A △ (B △ C) = (A △ B) △ C.
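The case analysis above can be confirmed exhaustively on a small universe. The sketch below checks commutativity, associativity, and the two equivalent forms of the definition over every pair and triple of subsets of a hypothetical three-element universe (Python's `^` operator is the symmetric difference):

```python
from itertools import chain, combinations

def subsets(U):
    # all subsets of U, as frozensets
    elems = list(U)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(elems, r)
                                         for r in range(len(elems) + 1))]

U = {1, 2, 3}
for A in subsets(U):
    for B in subsets(U):
        # the two equivalent characterisations of the symmetric difference
        assert A ^ B == (A - B) | (B - A) == (A | B) - (A & B)
        assert A ^ B == B ^ A                      # commutativity
        for C in subsets(U):
            assert (A ^ B) ^ C == A ^ (B ^ C)      # associativity
```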
(1) f⁻¹(⋃_{i∈I} Bi) = ⋃_{i∈I} f⁻¹(Bi)
(2) f⁻¹(⋂_{i∈I} Bi) = ⋂_{i∈I} f⁻¹(Bi)
(3) For the set complement,
f⁻¹(A)^∁ = f⁻¹(A^∁).
For the set complement, suppose x ∉ f⁻¹(A). This is equivalent to f(x) ∉ A, or f(x) ∈ A^∁,
which is equivalent to x ∈ f⁻¹(A^∁). Since the set difference A \ B can be written as A ∩ B^∁,
part (4) follows from parts (2) and (3). Similarly, since A △ B = (A \ B) ∪ (B \ A), part (5)
follows from parts (1) and (4). □
34.86 transformation
Synonym of mapping and function. Often used to refer to mappings where the domain and
codomain are the same set, i.e. one can compose a transformation with itself. For example,
when one speaks of a transformation of a space, one refers to some deformation of that space.
34.87 transitive
34.88 transitive
For example, the “is a subset of” relation ⊆ between sets is transitive. The “is not equal
to” relation ≠ between integers is not transitive: if we take x = 5, y = 42, and z = 5 in the
definition, we know that both 5 ≠ 42 (x ≠ y) and 42 ≠ 5 (y ≠ z); however, 5 = 5 (x = z),
so ≠ is not transitive.
The transitive closure of a set X is the smallest transitive set tc(X) such that X ⊆ tc(X).
It can be constructed by setting f(0) = X and f(n + 1) = ⋃ f(n); then
tc(X) = ⋃_{n<ω} f(n).
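For hereditarily finite sets the iteration terminates, so the closure can be computed directly. The sketch below models sets as Python frozensets and checks the construction on the von Neumann ordinals 0, 1, 2 (chosen here purely as an illustration):

```python
def tc(x):
    # Transitive closure of a hereditarily finite set (modelled with
    # frozensets): keep adding members of members until nothing new appears.
    result, frontier = set(), set(x)
    while frontier:
        result |= frontier
        frontier = {m for s in frontier for m in s} - result
    return result

# von Neumann ordinals 0, 1, 2
zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

assert tc(frozenset({two})) == {zero, one, two}
```

The result contains two itself, its member one, and one's member zero, and is transitive: every member of a member is again a member.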
Theorem Let X be a partially ordered set. Then there exists a maximal totally ordered
subset of X.
Hausdorff’s maximum principle is one of the many theorems equivalent to the axiom of choice.
The proof below uses Zorn’s lemma, which is also equivalent to the axiom of choice.
Proof. Let S be the set of all totally ordered subsets of X. S is not empty, since the
empty set is an element of S. Given a chain τ in S, the union of all the elements of τ
is again an element of S, as is easily verified. This shows that S, ordered by inclusion, is
inductive. The result now follows from Zorn’s lemma. □
Here, by a maximal element we mean a maximal element with respect to the inclusion
ordering: A 6 B iff A ⊆ B. This lemma is equivalent to the axiom of choice.
This is one of the many propositions which are equivalent to the axiom of choice.
If X is any set whatsoever, then there exists a well-ordering of X. The well-ordering theorem
is equivalent to the axiom of choice.
34.95 Zorn’s lemma
Let X be a partially ordered set, and suppose that every chain in X has an upper bound.
Then X has a maximal element x, in the sense that for all y ∈ X, y ≯ x.
Let C be a collection of nonempty sets. Then there exists a function f with domain
C such that f (x) ∈ x for all x ∈ C. f is sometimes called a choice function on C.
The axiom of choice is commonly (although not universally) accepted with the axioms of
Zermelo-Fraenkel set theory. The axiom of choice is equivalent to the well-ordering principle
and to Zorn’s lemma.
Hausdorff’s maximum principle implies Zorn’s lemma. Consider a partially ordered set
X, where every chain has an upper bound. According to the maximum principle there exists
a maximal totally ordered subset Y ⊆ X. This then has an upper bound, x. If x is not
the largest element in Y , then {x} ∪ Y would be a totally ordered set in which Y would be
properly contained, contradicting the definition. Thus x is a maximal element in X.
Zorn’s lemma implies the well-ordering theorem. Let X be any non-empty set, and
let A be the collection of pairs (A, ≤), where A ⊆ X and ≤ is a well-ordering on A. Define
a relation ⪯ on A so that for all x, y ∈ A : x ⪯ y iff x equals an initial segment of y. It is easy
to see that this defines a partial order relation on A (it inherits reflexivity, antisymmetry
and transitivity from one set being an initial segment, and thus a subset, of the other).
For each chain C ⊆ A, define C′ = (R, ≤′), where R is the union of all the sets A for all
(A, ≤) ∈ C, and ≤′ is the union of all the relations ≤ for all (A, ≤) ∈ C. It follows that C′
is an upper bound for C in A.
According to Zorn’s lemma, A now has a maximal element, (M, ≤M). We postulate that M
contains all members of X, for if this were not true we could, for any a ∈ X − M, construct
(M∗, ≤∗), where M∗ = M ∪ {a} and ≤∗ extends ≤M so that Sa(M∗) = M, i.e. M is the
initial segment of M∗ determined by a. Clearly ≤∗ then defines a well-order on M∗, and
(M∗, ≤∗) would be larger than (M, ≤M), contrary to the definition.
Let X be a set partially ordered by < such that each chain has an upper bound. Equate
each x ∈ X with p(x) = {y ∈ X | x < y} ∈ P(X). Let p(X) = {p(x) | x ∈ X}. If p(x) = ∅
then it follows that x is maximal.
Suppose no p(x) = ∅. Then by the axiom of choice there is a choice function f on p(X), and
since for each p(x) we have f(p(x)) ∈ p(x), it follows that x < f(p(x)). Define fα(x)
for all ordinals α by transfinite induction:
f0(x) = x,
fα+1(x) = f(p(fα(x))),
and for a limit ordinal α, let fα(x) be an upper bound of the fi(x) for i < α.
This construction can go on forever, for any ordinal. Then we can easily construct an injective
function from Ord into X by g(α) = fα(x). But that requires that X be a proper class, in
contradiction to the fact that it is a set. So there can be no such choice function, and there
must be a maximal element of X.
For the reverse, assume Zorn’s lemma and let C be any set of non-empty sets. Consider the
set of functions F = {f | ∀a ∈ dom(f )(a ∈ C ∧ f (a) ∈ a)} partially ordered by inclusion.
Then the union of any chain in F is also a member of F (since the union of a chain of
functions is always a function). By Zorn’s lemma, F has a maximal element f , and since
any function with domain smaller than C can be easily expanded, dom(f) = C, and so f is
a choice function on C.
Let S be a collection of sets. If, for each chain C ⊆ S, there exists an X ∈ S such that every
element of C is a subset of X, then S contains a maximal element. This is known as the
maximality principle.
1. 1 belongs to S, and
The Second Principle of Finite Induction would replace (2) above with
2’. If k is a positive integer such that 1, 2, . . . , k belong to S, then k + 1 must also be in S.
34.101 principle of finite induction proven from well-
ordering principle
Let T be the set of all positive integers not in S. Assume T is nonempty. The well-ordering principle
says T contains a least element; call it a. Since 1 ∈ S, we have a > 1, hence 0 < a − 1 < a.
The choice of a as the smallest element of T means a − 1 is not in T , and hence is in S. But
then (a − 1) + 1 is in S, which forces a ∈ S, contradicting a ∈ T . Hence T is empty, and S
is all positive integers.
Let S be a set and F a set of subsets of S such that F is of finite character. By Zorn’s lemma,
it is enough to show that F is inductive. For that, it will be enough to show that if (Fi )i∈I
is a family of elements of F which is totally ordered by inclusion, then the union U of the
Fi is an element of F as well (since U is an upper bound on the family (Fi )). So, let K be
a finite subset of U. Each element of U is in Fi for some i ∈ I. Since K is finite and the Fi
are totally ordered by inclusion, there is some j ∈ I such that all elements of K are in Fj .
That is, K ⊂ Fj, and hence K ∈ F. Since every finite subset of U belongs to F and F is of
finite character, we get U ∈ F, QED.
Let X be any set and let f be a choice function on P(X) \ {∅}. Then define a function i by
transfinite recursion on the class of ordinals as follows:
i(β) = f(X − ⋃_{γ<β} {i(γ)}),  unless X − ⋃_{γ<β} {i(γ)} = ∅ or i(γ) is undefined for some γ < β.
Thus i(0) is just f (X) (the least element of X), and i(1) = f (X − {i(0)}) (the least element
of X other than i(0)).
Define by the axiom of replacement β = i⁻¹[X] = {γ | i(γ) = x for some x ∈ X}. Since β is
a set of ordinals, it cannot contain all the ordinals (by the Burali-Forti paradox).
Since the ordinals are well ordered, there is a least ordinal α not in β, and therefore i(α) is
undefined. It cannot be that the second unless clause holds (since α is the least such ordinal)
so it must be that X − ⋃_{γ<α} {i(γ)} = ∅, and therefore for every x ∈ X there is some γ < α
such that i(γ) = x. Since we already know that i is injective, it is a bijection between α and
X, and therefore establishes a well-ordering of X by x <X y ↔ i⁻¹(x) < i⁻¹(y).
The reverse is simple. If C is a set of nonempty sets, select any well ordering of ⋃ C. Then
a choice function is just f(a) = the least member of a under that well ordering.
The Axiom of Extensionality is one of the axioms of Zermelo-Fraenkel set theory. In symbols,
it reads:
∀u(u ∈ X ↔ u ∈ Y ) → X = Y.
Note that the converse,
X = Y → ∀u(u ∈ X ↔ u ∈ Y )
is an axiom of the predicate calculus. Hence we have,
X = Y ↔ ∀u(u ∈ X ↔ u ∈ Y ).
Therefore the Axiom of Extensionality expresses the most fundamental notion of a set: a set
is determined by its elements.
The Axiom of Infinity is an axiom of Zermelo-Fraenkel set theory. At first glance, this axiom
seems to be ill-defined. How are we to know what constitutes an infinite set when we have
not yet defined the notion of a finite set? However, once we have a theory of ordinal numbers
in hand, the axiom makes sense.
Meanwhile, we can give a definition of finiteness that does not rely upon the concept of
number. We do this by introducing the notion of an inductive set. A set S is said to be
inductive if ∅ ∈ S and for every x ∈ S, x ∪ {x} ∈ S. We may then state the Axiom of
Infinity as follows:
In symbols:
∃S[∅ ∈ S ∧ (∀x ∈ S)[x ∪ {x} ∈ S]]
We shall then be able to prove that the following conditions are equivalent:
For any a and b there exists a set {a, b} that contains exactly a and b.
The Axiom of Pairing is one of the axioms of Zermelo-Fraenkel set theory. In symbols, it
reads:
∀a∀b∃c∀x(x ∈ c ↔ x = a ∨ x = b).
Using the axiom of extensionality, we see that the set c is unique, so it makes sense to define
the pair
{a, b} = the unique c such that ∀x(x ∈ c ↔ x = a ∨ x = b).
Using the Axiom of Pairing, we may define, for any set a, the singleton
{a} = {a, a}.
We may also define, for any set a and b, the ordered pair
(a, b) = {{a}, {a, b}}.
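Kuratowski's definition can be modelled directly with nested frozensets, which makes it easy to check that it genuinely records order (a small illustrative sketch, not part of the original entry):

```python
# Kuratowski's ordered pair (a, b) = {{a}, {a, b}}, modelled with frozensets
def pair(a, b):
    return frozenset({frozenset({a}), frozenset({a, b})})

assert pair(1, 2) != pair(2, 1)                   # order is recorded
assert pair(1, 1) == frozenset({frozenset({1})})  # {{1}, {1,1}} = {{1}}
```

The degenerate case (a, a) collapsing to {{a}} is deliberate and harmless: the pair is still uniquely decodable.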
34.107 axiom of power set
The Axiom of Power Set is an axiom of Zermelo-Fraenkel set theory. In symbols, it reads:
∀X∃Y ∀u(u ∈ Y ↔ u ⊆ X).
The Power Set Axiom allows us to define the cartesian product of two sets X and Y :
X × Y = {(x, y) : x ∈ X ∧ y ∈ Y }.
We may define the Cartesian product of any finite collection of sets recursively:
X1 × · · · × Xn = (X1 × · · · × Xn−1 ) × Xn .
Notice that this means that Y is the set of elements of all elements of X. More succinctly,
the union of any set of sets is a set. By extensionality, the set Y is unique. Y is called the
union of X.
In particular, the Axiom of Union, along with the axiom of pairing allows us to define
X ∪ Y = ⋃ {X, Y },
and therefore, for any sets a₁, …, aₙ, the finite set
{a₁, …, aₙ} = {a₁} ∪ ··· ∪ {aₙ}.
Let φ(u, p) be a formula. For any X and p, there exists a set Y = {u ∈ X : φ(u, p)}.
The Axiom Schema of Separation is an axiom schema of Zermelo-Fraenkel set theory. Note
that it represents infinitely many individual axioms, one for each formula φ. In symbols, it
reads:
∀X∀p∃Y ∀u(u ∈ Y ↔ u ∈ X ∧ φ(u, p)).
By extensionality, the set Y is unique.
The Axiom Schema of Separation implies that φ may depend on more than one parameter
p.
Another consequence of the Axiom Schema of Separation is that a subclass of any set is a
set. To see this, let C be the class C = {u : φ(u, p₁, …, pₙ)}. Then
∀X∃Y (C ∩ X = Y )
holds, which means that the intersection of C with any set is a set. Therefore, in particular,
the intersection of two sets X ∩ Y = {x ∈ X : x ∈ Y } is a set. Furthermore, the difference
of two sets X − Y = {x ∈ X : x ∉ Y } is a set and, provided there exists at least one set,
which is guaranteed by the axiom of infinity, the empty set is a set. For if X is a set, then
∅ = {x ∈ X : x ≠ x} is a set.
Moreover, if C is a nonempty class, then ⋂ C is a set, by Separation, since ⋂ C is a subset of
every X ∈ C.
Lastly, we may use Separation to show that the class of all sets, V , is not a set, i.e., V is a
proper class. For example, suppose V is a set. Then by Separation
V ′ = {x ∈ V : x ∉ x}
is a set and we have reached a Russell paradox.
34.110 de Morgan’s laws
In set theory, de Morgan’s laws relate the three basic set operations to each other; the
union, the intersection, and the complement. de Morgan’s laws are named after the Indian-
born British mathematician and logician Augustus De Morgan (1806-1871) [1].
Above, de Morgan’s laws are written for two sets. In this form, they are intuitively quite
clear. For instance, the first claim states that an element that is not in A ∪ B is not in A
and not in B. It also states that an element not in A and not in B is not in A ∪ B.
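Both laws can be verified exhaustively on a small universe. The sketch below (a hypothetical four-element universe) checks every pair of subsets:

```python
from itertools import chain, combinations

# De Morgan's laws checked exhaustively over a small universe
U = {1, 2, 3, 4}
comp = lambda S: U - S   # complement relative to U

all_subsets = [set(c) for c in
               chain.from_iterable(combinations(U, r) for r in range(len(U) + 1))]
for A in all_subsets:
    for B in all_subsets:
        assert comp(A | B) == comp(A) & comp(B)   # (A u B)^c = A^c n B^c
        assert comp(A & B) == comp(A) | comp(B)   # (A n B)^c = A^c u B^c
```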
(proof)
REFERENCES
1. Wikipedia’s entry on de Morgan, 4/2003.
2. M.M. Mano, Computer Engineering: Hardware Design, Prentice Hall, 1988.
34.111 de Morgan’s laws for sets (proof)
Set theory is special among mathematical theories, in two ways: It plays a central role in
putting mathematics on a reliable axiomatic foundation, and it provides the basic language
and apparatus in which most of mathematics is expressed.
I will informally list the undefined notions, the axioms, and two of the “schemes” of set
theory, along the lines of Bourbaki’s account. The axioms are closer to the von Neumann-
Bernays-Gödel model than to the equivalent ZFC model. (But some of the axioms are
identical to some in ZFC; see the entry ZermeloFraenkelAxioms.) The intention here is just
to give an idea of the level and scope of these fundamental things.
2. the relation of membership of one set in another (x ∈ y)
3. the notion of an ordered pair, which is a set composed of two other sets, in a specific
order.
Most of the eight schemes belong more properly to logic than to set theory, but they, or
something on the same level, are needed in the work of formalizing any theory that uses the
notion of equality, or uses quantifiers such as ∃. Because of their formal nature, let me just
(informally) state two of the schemes:
S6. If A and B are sets, and A = B, then anything true of A is true of B, and conversely.
S7. If two properties F (x) and G(x) of a set x are equivalent, then the “generic” set having
the property F , is the same as the generic set having the property G.
(The notion of a generic set having a given property, is formalized with the help of the
Hilbert τ symbol; this is one way, but not the only way, to incorporate what is called the
axiom of choice.)
Finally come the five axioms in this axiomatization of set theory. (Some are identical to
axioms in ZFC, q.v.)
A1. Two sets A and B are equal iff they have the same elements, i.e. iff the relation x ∈ A
implies x ∈ B and vice versa.
A2. For any two sets A and B, there is a set C such that the x ∈ C is equivalent to x = A
or x = B.
A3. Two ordered pairs (A, B) and (C, D) are equal iff A = C and B = D.
A4. For any set A, there exists a set B such that x ∈ B is equivalent to x ⊂ A; in other
words, there is a set of all subsets of A, for any given set A.
The word “infinite” is defined in terms of Axioms A1-A4. But to formulate the definition,
one must first build up some definitions and results about functions and ordered sets, which
we haven’t done here.
Moving away from foundations and toward applications, all the more complex structures
and relations of set theory are built up out of the three undefined notions. (See the entry
“Set”.) For instance, the relation A ⊂ B between two sets, means simply “if x ∈ A then
x ∈ B”.
Using the notion of ordered pair, we soon get the very important structure called the product
A × B of two sets A and B. Next, we can get such things as equivalence relations and order
relations on a set A, for they are subsets of A × A. And we get the critical notion of a
function A → B, as a subset of A × B. Using functions, we get such things as the product
∏_{i∈I} Ai of a family of sets. (“Family” is a variation of the notion of function.)
To be strictly formal, we should distinguish between a function and the graph of that func-
tion, and between a relation and its graph, but the distinction is rarely necessary in practice.
The natural numbers provide the first example. Peano, Zermelo and Fraenkel, and others
have given axiom-lists for the set N, with its addition, multiplication, and order relation;
but nowadays the custom is to define even the natural numbers in terms of sets. In more
detail, a natural number is the order-type of a finite well-ordered set. The relation m ≤ n
between m, n ∈ N is defined with the aid of a certain theorem which says, roughly, that for
any two well-ordered sets, one is a segment of the other. The sum or product of two natural
numbers is defined as the cardinal of the sum or product, respectively, of two sets. (For an
extension of this idea, see surreal numbers.)
(The term “cardinal” takes some work to define. The “type” of an ordered set, or any other
kind of structure, is the “generic” structure of that kind, which is defined using τ .)
Groups provide another simple example of a structure defined in terms of sets and ordered
pairs. A group is a pair (G, f ) in which G is just a set, and f is a mapping G × G → G
satisfying certain axioms; the axioms (associativity etc.) can all be spelled out in terms of
sets and ordered pairs, although in practice one uses algebraic notation to do it. When we
speak of (e.g.) “the” group S3 of permutations of a 3-element set, we mean the “type” of
such a group.
Topological spaces provide another example of how mathematical structures can be defined
in terms of, ultimately, the sets and ordered pairs in set theory. A topological space is a pair
(S, U), where the set S is arbitrary, but U has these properties:
Many special kinds of topological spaces are defined by enlarging this list of restrictions on
U.
Finally, many kinds of structure are based on more than one set. E.g. a left module is a
commutative group M together with a ring R, plus a mapping R × M → M which satisfies
a specific set of restrictions.
Although set theory provides some of the language and apparatus used in mathematics
generally, that language and apparatus have expanded over time, and now include what are
called “categories” and “functors”. A category is not a set, and a functor is not a mapping,
despite similarities in both cases. A category comprises all the structured sets of the same
kind, e.g. the groups, and contains also a definition of the notion of a morphism from one
such structured set to another of the same kind. A functor is similar to a morphism but
compares one category to another, not one structured set to another. The classic examples
are certain functors from the category of topological spaces to the category of groups.
34.113 union
The union of two sets A and B is the set which contains all x ∈ A and all x ∈ B. The
union of A and B is written as A ∪ B.
x ∈ A ∪ B ⇔ (x ∈ A) ∨ (x ∈ B)
34.114 universe
1. If x ∈ U and y ∈ x, then y ∈ U.
2. If x, y ∈ U, then {x, y} ∈ U.
3. If x ∈ U, then the power set P(x) ∈ U.
4. If {xi | i ∈ I} is a family of elements of U indexed by some I ∈ U, then ⋃_{i∈I} xi ∈ U.
1. If x ∈ U, then {x} ∈ U.
2. If x is a subset of y ∈ U, then x ∈ U.
REFERENCES
[SGA4] Grothendieck et al. SGA4.
In NBG, the proper classes are differentiated from sets by the fact that they do not belong
to other classes. Thus in NBG we have
Set(x) ↔ ∃y (x ∈ y)
Another interesting fact about proper classes within NBG is the following limitation of size principle
of von Neumann:
¬Set(x) ↔ |x| = |V |
where V is the set theoretic universe. This principle can in fact replace in NBG essen-
tially all set existence axioms with the exception of the powerset axiom (and obviously the
axiom of infinity). Thus the classes that are proper in NBG are in a very clear sense big,
while the sets are small.
In the latter alternative we take ZFC and relativise all of its axioms to sets, i.e. we replace
every expression of form ∀xφ with ∀x(Set(x) → φ) and ∃xφ with ∃x(Set(x) ∧ φ) and add
the class comprehension scheme
Notice the important restriction to formulae with quantifiers restricted to sets in the scheme.
This requirement makes the NBG proper classes predicative; you can’t prove the existence
of a class the definition of which quantifies over all classes. This restriction is essential; if we
loosen it we get a theory that is not conservative over ZFC. If we allow arbitrary formulae in
the class comprehension axiom scheme we get what is called Morse-Kelley set theory. This
theory is essentially stronger than ZFC or NBG. In addition to these axioms, NBG also
contains the global axiom of choice
∃C ∀x ∃z (C ∩ x = {z})
Another way to axiomatise NBG is to use the eight Gödel class construction functions. These
functions correspond to the various ways in which one can build up formulae (restricted
to sets!) with set parameters. However, the functions are finite in number and so are the
resulting axioms governing their behaviour. In particular, since there is a class corresponding
to any restricted formula, the intersection of any set and this class exists too (and is set).
Thus the comprehension scheme of ZFC can be replaced with a finite number of axioms,
provided we allow for proper classes.
It is easy to show that everything provable in ZF is also provable in NBG. It is also not too
difficult to show that NBG minus global choice is a conservative extension of ZFC. However,
showing that NBG (including global choice) is a conservative extension of ZFC is considerably
more difficult. This is equivalent to showing that NBG with global choice is conservative over
NBG with only local choice (choice restricted to sets). In order to do this one needs to use
(class) forcing. This result is usually credited to Easton and Solovay.
Let κ be a regular cardinal and let ⟨Q̂β⟩_{β<α} be a finite support iterated forcing where for
every β < α,
⊩_{Pβ} Q̂β has the κ chain condition.
By induction:
If Pα satisfies the κ-chain condition then so does Pα+1, since Pα+1 is equivalent to Pα ∗ Q̂α
and composition preserves the κ-chain condition for regular κ.
Suppose α is a limit ordinal and Pβ satisfies the κ-chain condition for all β < α. Let
S = ⟨pi⟩i<κ be a subset of Pα of size κ. The domains of the pi form κ finite
subsets of α, so if cf(α) > κ then these are bounded, and by the inductive hypothesis, two
of them are compatible.
Otherwise, if cf(α) < κ, let ⟨αj⟩j<cf(α) be an increasing sequence of ordinals cofinal in α.
Then for any i < κ there is some n(i) < cf(α) such that dom(pi) ⊆ αn(i). Since κ is regular
and this is a partition of κ into fewer than κ pieces, one piece must have size κ; that is, there
is some j such that j = n(i) for κ values of i, and so {pi | n(i) = j} is a set of conditions
of size κ contained in Pαj, and therefore contains compatible members by the induction
hypothesis.
Finally, if cf(α) = κ, let C = ⟨αj⟩j<κ be a strictly increasing, continuous sequence cofinal in
α. Then for every i < κ there is some n(i) < κ such that dom(pi) ⊆ αn(i). When n(i) is a
limit ordinal, since C is continuous, there is also (since dom(pi) is finite) some f(i) < i such
that dom(pi) ∩ [αf(i), αi) = ∅. Consider the set E of elements i such that i is a limit ordinal
and for any j < i, n(j) < i. This is a club, so by Fodor's lemma there is some j such that
{i | f(i) = j} is stationary.
For each pi such that f(i) = j, consider p′i = pi ↾ αj. There are κ of these, all members of Pαj,
so two of them must be compatible, and hence those two pi are also compatible in Pα.
34.117 chain condition
A partial order P satisfies the κ-chain condition if for any S ⊆ P with |S| = κ there
exist distinct x, y ∈ S which are compatible, that is, there is some z ∈ P with z ≤ x and z ≤ y.
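As a finite toy illustration of compatibility and antichains (the poset, names, and bounds below are my own, not from the entry), take conditions to be finite partial functions, where two conditions are compatible exactly when their union is again a condition:

```python
# Finite sketch: conditions are partial functions from {0,1,2} to {0,1},
# with q stronger than p when q extends p; two conditions are compatible
# iff they agree on their common domain (their union is a common extension).
from itertools import product

def compatible(p, q):
    return all(p[k] == q[k] for k in p.keys() & q.keys())

assert compatible({0: 1}, {1: 0})       # disjoint domains: always compatible
assert not compatible({0: 1}, {0: 0})   # they disagree at 0

# The 8 total functions form an antichain (a pairwise-incompatible set),
# so this little poset fails the 8-chain condition but satisfies the
# 9-chain condition: among any 9 conditions two must be compatible.
antichain = [dict(zip(range(3), bits)) for bits in product((0, 1), repeat=3)]
assert all(not compatible(p, q)
           for i, p in enumerate(antichain) for q in antichain[:i])
```

The infinite analogue of this poset (finite partial functions from ω to 2) is the Cohen forcing used later in this chapter; it has infinite antichains but satisfies the ℵ₁-chain condition.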
Then take a set of P-names Q such that, given a P-name Q̃ of Q, ⊩P Q̃ = Q̂ (that is, no
matter which generic subset G of P we force with, the names in Q correspond precisely to
the elements of Q̂[G]). We can define
P ∗ Q̂ = {⟨p, q̂⟩ | p ∈ P, q̂ ∈ Q}
We can define a partial order on P ∗ Q̂ such that ⟨p1, q̂1⟩ ≤ ⟨p2, q̂2⟩ iff p1 ≤P p2 and
p1 ⊩ q̂1 ≤Q̂ q̂2. (A note on interpretation: q̂1 and q̂2 are P-names; this requires only that q̂1 ≤ q̂2
in extensions by generic subsets containing p1, so in other generic extensions that fact could fail.)
Then P ∗ Q̂ is itself a forcing notion, and it can be shown that forcing by P ∗ Q̂ is equivalent
to forcing first by P and then by Q̂[G].
Let κ be a regular cardinal. Let P be a forcing notion satisfying the κ chain condition.
Let Q̂ be a P-name such that
⊩P "Q̂ is a forcing notion satisfying the κ-chain condition".
Then P ∗ Q̂ satisfies the κ-chain condition.
Proof:
Outline
We prove that there is some p such that any generic subset of P including p also includes κ
of the pi. Then, since Q̂[G] satisfies the κ-chain condition, two of the corresponding q̂i must
be compatible. Then, since G is directed, there is some p stronger than any of these which
forces this to be true, and therefore makes two elements of S compatible.
If no p forces this then every p forces that it is not true, and therefore
⊩P |{i | pi ∈ Ĝ}| < κ.
Since κ is regular, this means that for any generic G ⊆ P, {i | pi ∈ G} is bounded. For
each G, let f(G) be the least α such that β < α implies that there is some γ > β such that
pγ ∈ G. Define B = {α | α = f(G) for some G}.
Since κ is regular, α = sup(B) < κ. But obviously pα+1 ⊩ pα+1 ∈ Ĝ. This is a contradiction,
so we conclude that there must be some p such that p ⊩ |{i | pi ∈ Ĝ}| = κ.
If G ⊆ P is any generic subset containing p then A = {q̂i[G] | pi ∈ G} must have cardinality
κ. Since Q̂[G] satisfies the κ-chain condition, there exist i, j < κ such that pi, pj ∈ G and
there is some q̂[G] ∈ Q̂[G] such that q̂[G] ≤ q̂i[G], q̂j[G]. Then since G is directed, there is
some p′ ∈ G such that p′ ≤ pi, pj, p and p′ ⊩ q̂ ≤ q̂i, q̂j. So ⟨p′, q̂⟩ ≤ ⟨pi, q̂i⟩, ⟨pj, q̂j⟩.
Let P and Q be two forcing notions such that given any generic subset G of P there is a
generic subset H of Q with M[G] = M[H] and vice-versa. Then P and Q are equivalent.
Since if G ∈ M[H] then τ[G] ∈ M[H] for any P-name τ, it follows that if G ∈ M[H] and H ∈ M[G]
then M[G] = M[H].
If M is a transitive model of set theory and P is a partial order then we can define a forcing
relation:
p ⊩P φ(τ1, . . . , τn)
(p forces φ(τ1, . . . , τn)).
Specifically, the relation holds if for every generic filter G over P which contains p,
M[G] ⊨ φ(τ1[G], . . . , τn[G]).
That is, p forces φ if every extension of M by a generic filter over P containing p makes φ
true.
If p ⊩P φ holds for every p ∈ P then we can write ⊩P φ to mean that for any generic G ⊆ P,
M[G] ⊨ φ.
Suppose P and Q are forcing notions and that f : P → Q is a function such that:
• f [P ] is dense in Q
Proof
We seek to provide two operations (computable in the appropriate universes) which convert
between generic subsets of P and Q, and to prove that they are inverses.
Given H constructed as above, we can recover G as the set of p ∈ P such that f(p) ∈ H.
Obviously every element of G is included in the new set, so consider some p such that
f(p) ∈ H. By definition, there is some p1 ∈ G such that f(p1) ≤ f(p). Take some dense
D ⊆ Q such that there is no d ∈ D with f(p) ≤ d (this can be done easily by taking
any dense subset and removing all such elements; the resulting set is still dense since there
is some d1 such that d1 ≤ f(p) ≤ d). This set intersects f[G] in some q, so there is some
p2 ∈ G such that f(p2) ≤ q, and since G is directed, some p3 ∈ G such that p3 ≤ p2, p1. So
f(p3) ≤ f(p1) ≤ f(p). If p3 ≰ p then we would have p ≤ p3 and then f(p) ≤ f(p3) ≤ q,
contradicting the definition of D, so p3 ≤ p and p ∈ G since G is closed.
Consider D, the set of elements of Q which are f(p) for some p ∈ P and such that either f(p) ≤ q or
there is no element stronger than both f(p) and q. This is dense, since given any q1 ∈ Q, if
q1 ≤ q then (since f[P] is dense) there is some p such that f(p) ≤ q1 ≤ q. If q ≤ q1 then
there is some p such that f(p) ≤ q ≤ q1. If neither of these holds and there is some r ≤ q1, q then
any p such that f(p) ≤ r suffices, and if there is no such r then any p such that f(p) ≤ q1
suffices.
There is some f(p) ∈ D ∩ H, and so p ∈ G. Since H is directed, there is some r ≤ f(p), q,
so f(p) ≤ q ≤ f(p1), f(p2). If it is not the case that f(p) ≤ f(p1) then f(p) = f(p1) = f(p2).
In either case, we confirm that H is directed.
Finally, let D be a dense subset of P. f[D] is dense in Q, since given any q ∈ Q, there
is some p ∈ P such that f(p) ≤ q, and some d ∈ D such that d ≤ p, so f(d) ≤ f(p) ≤ q. Hence there is some
f(d) ∈ f[D] ∩ H, and so d ∈ D ∩ G.
Let P0 = ∅.
For β ≤ α, Pβ is the set of all functions f such that dom(f) ⊆ β and for any i ∈ dom(f),
f(i) is a Pi-name for a member of Q̂i. Order Pβ by the rule f ≤ g iff dom(g) ⊆ dom(f) and
for any i ∈ dom(f), g ↾ i ⊩ f(i) ≤Q̂i g(i). (Translated, this means that any generic subset
including g restricted to i forces that f(i), an element of Q̂i, be less than g(i).)
If each function in each Pβ is finite, this is called a finite support iterated forcing
(FS); if Pβ is restricted to countable functions, it is called a countable support iterated
forcing (CS); and in general if each function in each Pβ has size less than κ then it is a
<κ-support iterated forcing.
Typically we construct the sequence of Q̂β's by induction, using a function F such that
F(⟨Q̂β⟩β<γ) = Q̂γ.
There is a function f : Pα ∗ Q̂α → Pα+1 whose image is dense in Pα+1; since forcing
notions are equivalent if one is densely embedded in the other, Pα ∗ Q̂α and Pα+1 are
equivalent.
Proof
Let f(⟨g, q̂⟩) = g ∪ {⟨α, q̂⟩}. This is obviously a member of Pα+1, since it is a partial function
on α + 1 (and if the domain of g is less than α then so is the domain of f(⟨g, q̂⟩)), if i < α
then obviously f(⟨g, q̂⟩) applied to i satisfies the definition of iterated forcing (since g does),
and if i = α then the definition is satisfied since q̂ is a Pα-name for a member of Q̂α.
f is order preserving, since if ⟨g1, q̂1⟩ ≤ ⟨g2, q̂2⟩, all the appropriate characteristics of a
function carry over to the image, and g1 ↾ α ⊩Pα q̂1 ≤ q̂2 (by the definition of ≤ in ∗).
If ⟨g1, q̂1⟩ and ⟨g2, q̂2⟩ are incomparable then either g1 and g2 are incomparable, in which
case whatever prevents them from being compared applies to their images as well, or q̂1 and
q̂2 aren't compared appropriately, in which case again this prevents the images from being
compared.
Finally, let g be any element of Pα+1. Then g ↾ α ∈ Pα. If α ∉ dom(g) then this is just g,
and f(⟨g, q̂⟩) ≤ g for any q̂. If α ∈ dom(g) then f(⟨g ↾ α, g(α)⟩) = g. Hence f[Pα ∗ Q̂α] is
dense in Pα+1, and so these are equivalent.
34.125 name
We need a way to refer to objects of M[G] within M. This is done by assigning a name to
each element of M[G].
Given a partial order P, we construct the P-names by induction. Each name is just a
relation between P and the set of names already constructed; that is, a name is a set of
ordered pairs of the form (p, τ) where p ∈ P and τ is a name constructed at an earlier level
of the induction.
Given a generic subset G ⊆ P , we can then define the interpretation τ [G] of a P -name τ in
M[G] by:
τ[G] = {τ′[G] | (p, τ′) ∈ τ for some p ∈ G}
The generic subset can be thought of as a "key" which reveals which potential elements of
τ are actually elements.
Each x ∈ M has a canonical name x̂, defined inductively by
x̂ = {(p, ŷ) | y ∈ x, p ∈ P}
This guarantees that the elements of x̂[G] will be exactly the same as the elements of x,
regardless of which members of P are contained in G.
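The two inductive definitions above can be sketched concretely; the following minimal Python encoding (entirely my own illustration, not part of the entry) represents a P-name as a frozenset of (condition, name) pairs:

```python
# Minimal sketch of P-names and their interpretation (my own encoding).
# A name is a frozenset of (condition, name) pairs; interpreting under a
# generic set G keeps exactly the pairs whose condition made it into G.
def interpret(tau, G):
    """tau[G] = { sigma[G] : (p, sigma) in tau for some p in G }"""
    return frozenset(interpret(sigma, G) for (p, sigma) in tau if p in G)

def canonical(x, P):
    """x-hat: pair the canonical name of each element with every condition."""
    return frozenset((p, canonical(y, P)) for p in P for y in x)

P = {"p0", "p1"}                  # the conditions; the order is irrelevant here
x = frozenset({frozenset()})      # the hereditarily finite set {{}}
xhat = canonical(x, P)

# Whichever nonempty G we "force" with, the canonical name gives back x:
assert interpret(xhat, {"p0"}) == x
assert interpret(xhat, {"p1"}) == x

# A non-canonical name can depend on G: this one names {{}} or {}.
tau = frozenset({("p0", frozenset())})
assert interpret(tau, {"p0"}) == x
assert interpret(tau, {"p1"}) == frozenset()
```

The last name tau is the "key" phenomenon in miniature: which potential elements are real is revealed only by the generic set.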
If P is a partial order which satisfies the κ-chain condition and G is a generic subset of P, then
for any cardinal λ > κ in M, λ is also a cardinal in M[G], and if cf(α) = λ in M then also cf(α) = λ
in M[G].
This theorem is the simplest way to control a notion of forcing, since it means that a notion
of forcing does not have an effect above a certain point. Given that any P satisfies the |P|⁺-
chain condition, this means that most forcings leave all of M above a certain point alone.
(Although it is possible to get around this limit by forcing with a proper class.)
(Although it is possible to get around this limit by forcing with a proper class.)
Outline:
Given any function f purporting to violate the theorem by being surjective (or cofinal) on
λ, we show that there are fewer than κ possible values of f (α), and therefore only max(α, κ)
possible elements in the entire range of f , so f is not surjective (or cofinal).
Details:
Suppose there is some function f ∈ M[G] and some cardinal α < λ such that f : α → λ is surjective.
This has a name, f̂. For each β < α, consider
Fβ = {γ < λ | p ⊩ f̂(β) = γ for some p ∈ P}.
|Fβ| < κ, since any two p ∈ P which force different values for f̂(β) are incompatible and P
has no sets of incompatible elements of size κ.
Notice that Fβ is definable in M. Then the range of f must be contained in F = ⋃β<α Fβ.
But |F| ≤ α · κ = max(α, κ) < λ. So f cannot possibly be surjective, and therefore λ is not
collapsed.
Now suppose that for some α > λ > κ, cf(α) = λ in M and for some η < λ there is a cofinal
function f : η → α in M[G].
We can construct Fβ as above, and again the range of f is contained in F = ⋃β<η Fβ. But
then |range(f)| ≤ |F| ≤ η · κ < λ. So there is some γ < α such that f(β) < γ for any β < η,
and therefore f is not cofinal in α.
This is a long and complicated proof, the more so because the meaning of Q̂ shifts depending
on what generic subset of P is being used. It is therefore broken into a number of steps. The
core of the proof is to prove that, given any generic subset G of P and a generic subset H of
Q̂[G], there is a corresponding generic subset G ∗ H of P ∗ Q̂ such that M[G][H] = M[G ∗ H],
and conversely, given any generic subset G of P ∗ Q̂ we can find some generic GP of P and
a generic GQ of Q̂[GP] such that M[GP][GQ] = M[G].
We do this by constructing functions using operations which can be performed within the
forced universes so that, for example, since M[G][H] has both G and H, G ∗ H can be
calculated, proving that M[G][H] contains M[G ∗ H]. To ensure equality, we will also have to
ensure that our operations are inverses; that is, given G, GP ∗ GQ = G and, given G and H,
(G ∗ H)P = G and (G ∗ H)Q = H.
The remainder of the proof merely defines the precise operations, proves that they give
generic sets, and proves that they are inverses.
G ∗ H is a generic filter
G ∗ H is closed
Let ⟨p1, q̂1⟩ ∈ G ∗ H and let ⟨p1, q̂1⟩ ≤ ⟨p2, q̂2⟩. Then we can conclude p1 ∈ G, p1 ≤ p2,
q̂1[G] ∈ H, and p1 ⊩ q̂1 ≤ q̂2, so p2 ∈ G (since G is closed) and q̂2[G] ∈ H since p1 ∈ G and
p1 forces both q̂1 ≤ q̂2 and that H is downward closed. So ⟨p2, q̂2⟩ ∈ G ∗ H.
G ∗ H is directed
Suppose ⟨p1, q̂1⟩, ⟨p2, q̂2⟩ ∈ G ∗ H. So p1, p2 ∈ G, and since G is directed, there is some p3 ≤
p1, p2. Since q̂1[G], q̂2[G] ∈ H and H is directed, there is some q̂3[G] ∈ H with q̂3[G] ≤ q̂1[G], q̂2[G]. Therefore
there is some p4 ≤ p3, p4 ∈ G, such that p4 ⊩ q̂3 ≤ q̂1, q̂2, so ⟨p4, q̂3⟩ ≤ ⟨p1, q̂1⟩, ⟨p2, q̂2⟩ and
⟨p4, q̂3⟩ ∈ G ∗ H.
G ∗ H is generic
Suppose D is a dense subset of P ∗ Q̂. We can project it into a dense subset of Q̂[G] using G:
DQ = {q̂[G] | ⟨p, q̂⟩ ∈ D for some p ∈ G}
Given any q̂0 ∈ Q̂, take any p0 ∈ G. Then we can define yet another dense subset, this one
in P:
Dq̂0 = {p ∈ P | ∃q̂ (⟨p, q̂⟩ ∈ D ∧ p ⊩ q̂ ≤ q̂0)}
Take any p ∈ P such that p ≤ p0. Then, since D is dense in P ∗ Q̂, we have some ⟨p1, q̂1⟩ ≤
⟨p, q̂0⟩ such that ⟨p1, q̂1⟩ ∈ D. Then by definition p1 ≤ p and p1 ∈ Dq̂0.
From this lemma, we can conclude that there is some p1 ≤ p0 such that p1 ∈ G ∩ Dq̂0, and
therefore some q̂1 such that p1 ⊩ q̂1 ≤ q̂0 where ⟨p1, q̂1⟩ ∈ D. So DQ is indeed dense in Q̂[G].
Since DQ is dense in Q̂[G], there is some q̂ such that q̂[G] ∈ DQ ∩ H, and so some p ∈ G
such that ⟨p, q̂⟩ ∈ D. But since p ∈ G and q̂[G] ∈ H, ⟨p, q̂⟩ ∈ G ∗ H, so G ∗ H is indeed generic.
GP is a generic filter
GP is closed
Take any p1 ∈ GP and any p2 such that p1 ≤ p2. Then there is some p0 ≤ p1 satisfying the
definition of GP, and also p0 ≤ p2, so p2 ∈ GP.
GP is directed
Consider p1, p2 ∈ GP. Then there is some p′1 and some q̂1 such that ⟨p′1, q̂1⟩ ∈ G and some
p′2 and some q̂2 such that ⟨p′2, q̂2⟩ ∈ G. Since G is directed, there is some ⟨p3, q̂3⟩ ∈ G such
that ⟨p3, q̂3⟩ ≤ ⟨p′1, q̂1⟩, ⟨p′2, q̂2⟩, and therefore p3 ∈ GP, p3 ≤ p1, p2.
GP is generic
Let D be a dense subset of P. Let D′ = {⟨p, q̂⟩ | p ∈ D}. Clearly this is dense, since if
⟨p, q̂⟩ ∈ P ∗ Q̂ then there is some p0 ≤ p such that p0 ∈ D, so ⟨p0, q̂⟩ ∈ D′ and ⟨p0, q̂⟩ ≤ ⟨p, q̂⟩.
So there is some ⟨p, q̂⟩ ∈ D′ ∩ G, and therefore p ∈ D ∩ GP. So GP is generic.
GQ is a generic filter
(Notice that GQ is dependent on GP, and is a subset of Q̂[GP], that is, the forcing notion
inside M[GP], as opposed to the set of names Q which we have been primarily working with.)
GQ is closed
Suppose q̂1[GP] ∈ GQ and q̂1[GP] ≤ q̂2[GP]. Then there is some p1 ∈ GP such that p1 ⊩
q̂1 ≤ q̂2. Since p1 ∈ GP, there is some p2 ≤ p1 such that for some q̂3, ⟨p2, q̂3⟩ ∈ G. By the
definition of GQ, there is some p3 such that ⟨p3, q̂1⟩ ∈ G, and since G is directed, there is
some ⟨p4, q̂4⟩ ∈ G with ⟨p4, q̂4⟩ ≤ ⟨p3, q̂1⟩, ⟨p2, q̂3⟩. Since G is closed and ⟨p4, q̂4⟩ ≤ ⟨p4, q̂2⟩, we
have q̂2[GP] ∈ GQ.
GQ is directed
Suppose q̂1[GP], q̂2[GP] ∈ GQ. Then for some p1, p2, ⟨p1, q̂1⟩, ⟨p2, q̂2⟩ ∈ G, and since G is
directed, there is some ⟨p3, q̂3⟩ ∈ G such that ⟨p3, q̂3⟩ ≤ ⟨p1, q̂1⟩, ⟨p2, q̂2⟩. Then q̂3[GP] ∈ GQ,
and since p3 ∈ GP and p3 ⊩ q̂3 ≤ q̂1, q̂2, we have q̂3[GP] ≤ q̂1[GP], q̂2[GP].
GQ is generic
Let D be a dense subset of Q̂[GP] (in M[GP]). Let D̂ be a P-name for D, and let p1 ∈ GP
be such that p1 ⊩ "D̂ is dense". By the definition of GP, there is some p2 ≤ p1 such that
⟨p2, q̂2⟩ ∈ G for some q̂2. Let D′ = {⟨p, q̂⟩ | p ⊩ q̂ ∈ D̂ ∧ p ≤ p2}.
Lemma: D′ is dense (in P ∗ Q̂) above ⟨p2, q̂2⟩
Take any ⟨p, q̂⟩ ∈ P ∗ Q̂ such that ⟨p, q̂⟩ ≤ ⟨p2, q̂2⟩. Then p ⊩ "D̂ is dense", and therefore
there is some q̂3 such that p ⊩ q̂3 ∈ D̂ and p ⊩ q̂3 ≤ q̂. So ⟨p, q̂3⟩ ≤ ⟨p, q̂⟩ and ⟨p, q̂3⟩ ∈ D′.
Take any ⟨p3, q̂3⟩ ∈ D′ ∩ G. Then p3 ∈ GP, so q̂3[GP] ∈ D, and by the definition of GQ, q̂3[GP] ∈ GQ.
GP ∗ GQ = G
(G ∗ H)P = G
Suppose p ∈ (G ∗ H)P. Then there is some p0 ∈ P and some q̂ ∈ Q such that p0 ≤ p and
⟨p0, q̂⟩ ∈ G ∗ H. By the definition of G ∗ H, p0 ∈ G, and then since G is closed, p ∈ G.
Conversely, suppose p ∈ G. Then (since H is non-trivial) ⟨p, q̂⟩ ∈ G ∗ H for some q̂, and
therefore p ∈ (G ∗ H)P.
(G ∗ H)Q = H
Given any q ∈ H, there is some q̂ ∈ Q such that q̂[G] = q, and so there is some p ∈ G such that
⟨p, q̂⟩ ∈ G ∗ H, and therefore q = q̂[G] ∈ (G ∗ H)Q.
On the other hand, if q ∈ (G ∗ H)Q then there is some ⟨p, q̂⟩ ∈ G ∗ H with q̂[G] = q, and
therefore q ∈ H.
34.129 complete partial orders do not add small subsets
Suppose P is a κ-complete partial order in M. Then for any generic subset G, M[G] contains
no bounded subsets of κ which are not in M.
Take any x ∈ M[G], x ⊆ κ. Let x̂ be a name for x. There is some p ∈ G such that
p ⊩ "x̂ is a subset of κ bounded by λ" for some λ < κ.
Outline:
Details:
Note that these elements all exist since for any p ∈ P and any (first-order) sentence φ there
is some q ≤ p such that q forces either φ or ¬φ.
q* not only forces that x̂ is a bounded subset of κ, but for every ordinal it forces whether or
not that ordinal is contained in x̂. But the set {α < λ | q* ⊩ α ∈ x̂} is definable in M, and
is of course equal to x̂[G*] in any generic G* containing q*. So q* ⊩ x̂ ∈ M.
Since this holds for any element stronger than p, it follows that p ⊩ x̂ ∈ M, and therefore
x̂[G] ∈ M.
34.131 ◊ is equivalent to ♣ and the continuum hypothesis
◊S ↔ ♣S + CH
Given any cardinals κ and λ in M, we can use the Levy collapse to give a new model
M[G] in which λ has cardinality κ. Let P = Levy(κ, λ) be the set of partial functions f : κ → λ with
|dom(f)| < κ. These functions each give partial information about a function F which
collapses λ onto κ.
Given any generic subset G of P, M[G] has the set G, so let F = ⋃G. Each element of G is a
partial function, and they are all pairwise compatible, so F is a function. dom(F) = κ since for each
α < κ the set of f ∈ P such that α ∈ dom(f) is dense (given any function without α, it is
trivial to add (α, 0), giving a stronger function which includes α). Also range(F) = λ since,
for each α < λ, the set of f ∈ P such that α is in the range of f is again dense (the domain of each f is
bounded, so if β is larger than any element of dom(f), then f ∪ {(β, α)} is stronger than f and
includes α in its range).
We can generalize this by forcing with P = Levy(κ, <λ) for λ regular: the set of partial
functions f : λ × κ → λ such that f(0, α) = 0, |dom(f)| < κ, and if α > 0 then f(α, i) < α.
In essence, this is the product of the Levy(κ, η) for η < λ.
In M[G], define F = ⋃G and Fα(β) = F(α, β). Each Fα is a function from κ to α, and by
the same argument as above Fα is both total and surjective. Moreover, it can be shown that
P satisfies the λ-chain condition, so λ does not collapse and λ = κ⁺.
34.133 proof of ◊ is equivalent to ♣ and the continuum hypothesis
The proofs that ◊S implies both ♣S and that for every λ < κ, 2^λ ≤ κ are given in the entries
for ◊S and ♣S.
We can define a new sequence, D = ⟨Dα⟩α∈S, such that x ∈ Dα ↔ x ∈ Bβ for some β ∈ Aα.
We can show that D satisfies ◊S.
T is bounded
T is unbounded
• j(0) = 0
• To find j(α), take X ∩ {j(β) | β < α}. This is a bounded subset of κ, so it is equal to
Bγ for an unbounded set of γ. Take j(α) = γ, where γ is the least ordinal
greater than every element of {α} ∪ {j(β) | β < α} such that Bγ = X ∩ {j(β) | β < α}.
Next, consider C, the set of ordinals less than κ closed under j. Clearly it is unbounded,
since if λ < κ then j(λ) includes j(α) for α < λ, and so induction gives an ordinal greater
than λ closed under j (essentially the result of applying j an infinite number of times). Also,
C is closed: take any c ⊆ C and suppose sup(c ∩ α) = α. Then for any β < α, there is some
γ ∈ c such that β < γ < α and therefore j(β) < γ. So α is closed under j, and therefore
contained in C.
T
Since C is a club, C 0 = C S 0 is stationary. Suppose α ∈ C 0 . Then x ∈ Dα ↔ x ∈TBβ
where β ∈ Aα . Since α ∈ S 0 , β ∈ range(j), and therefore Bβ ⊆ T . Next take any x ∈ T α.
Since α ∈ C, it is closed under j, hence there is some γ ∈ α such that j(x) ∈ γ. Since
sup(Aα ) = α, there is some η ∈ Aα such that γ < η, so j(x) ∈ η. Since η ∈ Aα , Bη ⊆ Dα ,
and since η ∈ range(j), j(δ) ∈ Bη for any δ T < j −1 (η), and in particular x ∈ Bη . Since we
showed above that Dα ⊆ α, we have Dα = T α for any α ∈ C 0 .
For any cardinal κ, Martin's Axiom for κ (MAκ) states that if P is a partial order
satisfying the ccc then, given any set of κ dense subsets of P, there is a directed subset of P intersecting
each such dense subset. Martin's Axiom states that MAκ holds for every κ < 2^ℵ0.
Given a countable collection of dense subsets of a partial order, we can select a sequence ⟨pn⟩n<ω
such that pn is in the n-th dense subset and pn+1 ≤ pn for each n; so MAℵ0 always holds.
Since under CH every κ < 2^ℵ0 is at most ℵ0, CH implies MA.
κ ≥ ℵ0, so 2^κ ≥ 2^ℵ0; hence it will suffice to find a surjective function from P(ℵ0) to P(κ).
T
Let A = hAα iα<κ , a sequence of infinite subsets of ω such that for any α 6= β, Aα Aβ is
finite.
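Such almost disjoint families exist in size continuum. A standard construction, sketched here with my own illustrative code (not part of the entry), identifies ω with the set of finite binary strings and takes, for each branch of the binary tree, the set of its initial segments:

```python
# Sketch (my own illustration): an almost disjoint family on omega.
# Identify omega with the set of finite binary strings; for each infinite
# binary sequence r, let A_r be the set of its finite initial segments.
# Two distinct branches share only the segments before they split, so
# A_r and A_s have finite intersection; taken over all branches this gives
# a family of size continuum.
def branch(bits):
    """Initial segments of a finite approximation to a branch."""
    return {bits[:k] for k in range(1, len(bits) + 1)}

A = branch("0101010101")
B = branch("0110011001")
C = branch("1111111111")

assert len(A & B) == 2      # they agree only on "0" and "01"
assert len(A & C) == 0      # they split immediately
assert len(A) == len(B) == len(C) == 10
```

The full (infinite) branches give infinite sets with pairwise finite intersections, which is exactly what the sequence A above requires.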
Given any subset S ⊆ κ we will construct a function f : ω → {0, 1} such that S can
be uniquely recovered from f. f will have the property that if i ∈ S then f(a) = 0 for finitely
many elements a ∈ Ai, and if i ∉ S then f(a) = 0 for infinitely many elements of Ai.
Let P be the partial order (under inclusion) such that each element p ∈ P satisfies:
• p is a partial function from ω to {0, 1}
• There exist i1, . . . , in ∈ S such that for each j ≤ n, Aij ⊆ dom(p)
• There is a finite subset wp of ω such that wp = dom(p) − ⋃j≤n Aij
• For each j ≤ n, p(a) = 0 for only finitely many elements of Aij
This satisfies the ccc. To see this, consider any uncountable sequence S = ⟨pα⟩α<ω1 of elements
of P. There are only countably many finite subsets of ω, so there is some w ⊆ ω such that
w = wp for uncountably many p ∈ S and p ↾ w is the same for each such element. Since each
of these functions' domains covers only a finite number of the Aα, and each is 1 on all but a finite
number of elements of each, there are only a countable number of different combinations
available, and therefore two of them are compatible.
• Dn = {p ∈ P | n ∈ dom(p)} for n < ω. This is obviously dense, since any p not already
in Dn can be extended to a condition which is by adding ⟨n, 1⟩.
• Dα = {p ∈ P | dom(p) ⊇ Aα} for α ∈ S. This is dense since if p ∉ Dα then
p ∪ {⟨a, 1⟩ | a ∈ Aα \ dom(p)} is in Dα.
• For each α ∉ S and n < ω, Dn,α = {p ∈ P | p(m) = 0 for some m ∈ Aα with m > n}. This
is dense since if p ∉ Dn,α then dom(p) ∩ Aα = Aα ∩ (wp ∪ ⋃j Aij). But wp is finite,
and the intersection of Aα with any other Ai is finite, so this intersection is finite,
and hence bounded by some m. Aα is infinite, so there is some x ∈ Aα with x ≥ m, n. So
p ∪ {⟨x, 0⟩} ∈ Dn,α.
By MAκ, given any set of κ dense subsets of P, there is a generic G which intersects all of
them. There are a total of ℵ0 + |S| + |κ \ S| · ℵ0 = κ dense subsets in these three groups,
and hence some generic G intersecting all of them. Since G is directed, g = ⋃G is a partial
function from ω to {0, 1}. Since for each n < ω, G ∩ Dn is non-empty, n ∈ dom(g), so g is
a total function. Since G ∩ Dα for α ∈ S is non-empty, there is some element of G whose
domain contains all of Aα and which is 0 on only a finite number of them; hence g(a) = 0 for
only a finite number of a ∈ Aα. Finally, since G ∩ Dn,α is non-empty for each n < ω and α ∉ S,
the set of n ∈ Aα such that g(n) = 0 is unbounded, and hence infinite. So g is as promised, and
2^κ = 2^ℵ0.
If κ is an uncountable strong limit cardinal such that for any λ < κ, κ^λ = κ, then it is
consistent that 2^ℵ0 = κ and that MA holds. This is shown by using finite support iterated forcing to
construct a model of ZFC in which this is true. Historically, this proof was the motivation
for developing iterated forcing.
Outline
The proof uses the convenient fact that MAκ holds as long as it holds for all partial orders
of size less than κ. Given the conditions on κ, there are at most κ names for these partial orders.
At each step in the iteration, we force with one of these names. The result is that the actual
generic subset we add intersects every dense subset of every relevant partial order.
Construction of Pκ
Q̂α will be constructed by induction so that three conditions hold: |Pα| ≤ κ for all α ≤ κ,
⊩Pα Q̂α ⊆ M, and each Pα satisfies the ccc. Note that a partial ordering of a cardinal λ < κ is a function
from λ × λ to {0, 1}, so there are at most 2^λ < κ of them. Since a canonical name for a
partial ordering of a cardinal is just a function from Pα to that cardinal, there are at most
κ^(2^λ) ≤ κ of them.
At each of the κ steps, we want to deal with one of these possible partial orderings, so we
need to partition the κ steps into κ steps for each of the κ cardinals less than κ. In addition,
we need to include every Pα-name for every level. Therefore, we partition κ into ⟨Sγ,δ⟩γ,δ<κ for
each cardinal δ, with each Sγ,δ having cardinality κ and the added condition that η ∈ Sγ,δ
implies η > γ. Then each Pγ-name for a partial ordering of δ is assigned some index η ∈ Sγ,δ,
and that partial order will be dealt with at stage Q̂η.
Formally, given Q̂β for β < α, Pα can be constructed and the Pα-names for partial orderings
of each cardinal δ enumerated by the elements of Sα,δ. We have α ∈ Sγα,δα for some γα and δα, and
α > γα, so some canonical Pγα-name for a partial order ≤̂α of δα has already been assigned
to α.
Since ≤̂α is a Pγα-name, it is also a Pα-name, so Q̂α can be defined as ⟨δα, ≤̂α⟩ if ⊩Pα "⟨δα, ≤̂α⟩
satisfies the ccc", and as the trivial partial order ⟨1, {⟨1, 1⟩}⟩ otherwise. Obviously this
satisfies the ccc, and so Pα+1 does as well. Since Q̂α is either trivial or a cardinal together
with a canonical name, ⊩Pα Q̂α ⊆ M. Finally, |Pα+1| ≤ Σn |α|^n · (supi |Q̂i|)^n ≤ κ.
Lemma: It suffices to show that MAλ holds for partial orders of size at most λ
Suppose P is a partial order with |P| > λ and let ⟨Dα⟩α<λ be dense subsets of P. Define
functions fα : P → Dα for α < λ with fα(p) ≤ p (obviously such elements exist since Dα is
dense). Let g : P × P → P be a function such that g(p, q) ≤ p, q whenever p and q are
compatible. Then pick some element q ∈ P and let Q be the closure of {q} under the fα and g,
with the same ordering as P (restricted to Q).
Now we must prove that if G is a generic subset of Pκ, R is some partial order with |R| ≤ λ,
and ⟨Dα⟩α<λ are dense subsets of R, then there is some directed subset of R intersecting each
Dα.
If |R| < λ then λ additional elements can be added, greater than any other element of R, to
make |R| = λ, and then, since there is an order isomorphism onto some partial ordering of λ,
we may assume R is a partial ordering of λ. Then let D = {⟨α, β⟩ | α ∈ Dβ}.
Take canonical names so that R = R̂[G], D = D̂[G] and Di = D̂i[G] for each i < λ, and:
For any α, β there is a maximal antichain Dα,β ⊆ Pκ such that if p ∈ Dα,β then either
p ⊩Pκ α ≤R̂ β or p ⊩Pκ α ≰R̂ β, and another maximal antichain Eα,β ⊆ Pκ such that if
p ∈ Eα,β then either p ⊩Pκ ⟨α, β⟩ ∈ D̂ or p ⊩Pκ ⟨α, β⟩ ∉ D̂. These antichains determine the
values of those two formulas.
Since R̂ and D̂ are Pγ-names, R̂[G] = R̂[Gγ] = R and D̂[G] = D̂[Gγ] = D.
Let Ap be a maximal antichain of Pγ such that p ∈ Ap, and define ≤̂* as a Pγ-name with
(p, m) ∈ ≤̂* for each m ∈ R̂ and (a, n) ∈ ≤̂* if n = (α, β) where α < β < λ and p ≠ a ∈ Ap.
That is, ≤̂*[G] = R when p ∈ G, and ≤̂*[G] is the usual ordering of λ otherwise. Then this is a name for a
partial ordering of λ, and therefore there is some η ∈ Sγ,λ such that ≤̂* = ≤̂η, and η > γ.
Since p ∈ Gγ ⊆ Gη, Q̂η[Gη] = ≤̂η[Gη] = R.
The relationship between Martin's Axiom and the continuum hypothesis tells us that 2^ℵ0 ≥
κ. Since 2^ℵ0 was less than κ in V, and since |Pκ| = κ means the forcing adds at most κ subsets
of ω, it must be that 2^ℵ0 = κ.
This is another, shorter proof of the fact that MAℵ0 always holds.
Let (P, ≤) be a partially ordered set and D a collection of subsets of (P, ≤). We recall
that a filter G on (P, ≤) is D-generic if G ∩ D ≠ ∅ for all D ∈ D which are dense in (P, ≤).
("Dense" in this context means: if D is dense in (P, ≤), then for every p ∈ P there is a d ∈ D
such that d ≤ p.)
Let (P, ≤) be a partially ordered set and D a countable collection of dense subsets of P; then
there exists a D-generic filter G on P. Moreover, it can be shown that for every p ∈ P
there is such a D-generic filter G with p ∈ G.
Let D1, . . . , Dn, . . . be the dense subsets in D. Furthermore let p0 = p. Now we can choose,
for every 1 ≤ n < ω, an element pn ∈ P such that pn ≤ pn−1 and pn ∈ Dn. If we now consider
the set G := {q ∈ P : ∃n < ω such that pn ≤ q}, then it is easy to check that G is a D-generic
filter on P, and obviously p ∈ G. This completes the proof.
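The construction above can be sketched on a concrete poset (all names and bounds below are my own illustration, not from the entry): take P to be finite binary strings ordered by end-extension, with Dn the dense set of conditions of length at least n.

```python
# Finite sketch of the descending-chain construction: P is the poset of
# binary strings, q <= p (q stronger) when q end-extends p, and
# D_n = {s : len(s) >= n} is dense for every n.  We build the chain
# p_0 >= p_1 >= ... with p_n in D_n, then let G be everything weaker
# than some p_n.
from itertools import product

def extend_into(p, D, max_extra=8):
    """Return some extension q of p (so q <= p as a condition) lying in D."""
    for k in range(max_extra + 1):
        for bits in product("01", repeat=k):
            q = p + "".join(bits)
            if D(q):
                return q
    raise ValueError("D was not dense enough for this finite demo")

def generic_chain(p, dense_sets):
    chain = [p]
    for D in dense_sets:
        chain.append(extend_into(chain[-1], D))
    return chain

dense = [lambda s, n=n: len(s) >= n for n in range(1, 6)]
chain = generic_chain("0", dense)

# G = all conditions weaker than some p_n, i.e. all initial segments.
G = {s[:k] for s in chain for k in range(len(s) + 1)}
assert all(any(D(s) for s in G) for D in dense)   # G is D-generic
```

With only countably many dense sets this greedy construction always succeeds; the point of forcing proper is that uncountably many dense sets cannot in general be handled this way inside the model.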
The Continuum Hypothesis states that there is no cardinal number κ such that ℵ0 < κ < 2ℵ0 .
The continuum hypothesis can also be stated as: there is no subset of the real numbers
which has cardinality strictly between that of the reals and that of the integers. It is from
this that the name comes, since the set of real numbers is also known as the continuum.
34.139 forcing
Forcing is the method used by Paul Cohen to prove the independence of the continuum hypothesis
(CH). In fact, the method was used by Cohen to prove that CH could be violated. The treatment
I give here is very informal; I will develop it later. First let me give an example from
algebra.
Suppose we have a field k, and we want to add to this field an element α such that α² = −1.
We see that we cannot simply drop a new α into k, since then we are not guaranteed that we
still have a field. Neither can we simply assume that k already has such an element. The
standard way of doing this is to start by adjoining a generic indeterminate X and impose a
constraint on X, saying that X² + 1 = 0. What we do is take the quotient k[X]/(X² + 1),
which is a field when X² + 1 is irreducible over k. We then obtain k(α), where α is the
equivalence class of X in the quotient. The general case of this is the theorem of algebra
saying that every polynomial p over a field k has a root in some extension field.
We can rephrase this and say that “it is consistent with standard field theory that −1 have
a square root”.
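As a concrete sketch of this analogy (my own illustration, not from the entry): arithmetic in Q[X]/(X² + 1) can be carried out by reducing α² to −1, and every nonzero element is invertible, so the quotient is a field.

```python
# Sketch (my own illustration): arithmetic in Q[X]/(X^2 + 1).  Elements are
# a + b*alpha with alpha the class of X, and alpha^2 reduces to -1: the
# imposed constraint of the entry.
from fractions import Fraction

class QAdjoinI:
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)
    def __add__(self, other):
        return QAdjoinI(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # (a + b*alpha)(c + d*alpha) = (ac - bd) + (ad + bc)*alpha
        return QAdjoinI(self.a * other.a - self.b * other.b,
                        self.a * other.b + self.b * other.a)
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
    def __repr__(self):
        return f"{self.a} + {self.b}*alpha"

alpha = QAdjoinI(0, 1)
assert alpha * alpha == QAdjoinI(-1)        # the constraint X^2 + 1 = 0 holds
x = QAdjoinI(2, 3)
x_inv = QAdjoinI(Fraction(2, 13), Fraction(-3, 13))  # conjugate over norm
assert x * x_inv == QAdjoinI(1)             # nonzero elements are invertible
```

Here the inverse is computed by hand via the conjugate; the point is only that imposing the constraint extends the structure consistently, just as forcing consistently extends M.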
When the theory we consider is ZFC, we run into exactly the same problem: we can't just
add a "new" set and pretend it has the required properties, because then we may violate
something else, like foundation. Let M be a transitive model of set theory, which we call
the ground model. We want to "add a new set" S to M in such a way that the extension
M′ has M as a subclass, the properties of M are preserved, and S ∈ M′.
The first step is to "approximate" the new set using elements of M. This is the analogue of
finding the irreducible polynomial in the algebraic example. The set P of such "approxima-
tions" can be ordered by how much information the approximations give: for p, q ∈ P,
p ≤ q if and only if p "is stronger than" q. We call this set a set of forcing conditions.
Furthermore, it is required that the set P itself and the order relation be elements of M.
Since P is a partial order, some of its subsets have interesting properties. Consider P as
a topological space with the order topology. A subset D ⊆ P is dense in P if and only if
for every p ∈ P , there is d ∈ D such that d 6 p. A filter in P is said to be M-generic if
and only if it intersects every one of the dense subsets of P which are in M. An M-generic
filter in P is also referred to as a generic set of conditions in the literature. In general,
even though P is a set in M, generic filters are not elements of M.
If P is a set of forcing conditions in the ground model M, and G is a generic set of conditions
in P, then we define M[G] to be the least transitive model of ZFC that contains M and G. In
forthcoming entries I will detail the construction of M[G]. The big theorem is this:
Theorem 5. M[G] is a model of ZFC, and has the same ordinals as M, and M ⊆ M[G].
The way to prove that we can violate CH using a generic extension is to add many new
"subsets of ω" in the following way: let M be a transitive model of ZFC, and let (P, ≤) be
the set (in M) of all functions f whose domain is a finite subset of ℵ₂ × ℵ₀ and whose range
is a subset of {0, 1}. The ordering here is p ≤ q if and only if p ⊇ q. Let G be a generic set of
conditions in P. Then F = ⋃G is a total function whose domain is ℵ₂ × ℵ₀ and whose range is {0, 1}.
We can see this F as coding ℵ₂ new functions fα : ℵ₀ → {0, 1}, α < ℵ₂, which are essentially subsets of
ω. These functions are all distinct, and so CH is violated in M[G].
All this relies on a proper definition of the satisfaction relation in M[G], and the forcing relation,
which will come in a forthcoming entry. Details can be found in Thomas Jech’s book Set
Theory.
The generalized continuum hypothesis states that for any infinite cardinal λ there is no
cardinal κ such that λ < κ < 2^λ .
Like the continuum hypothesis, the generalized continuum hypothesis is known to be independent
of the axioms of ZFC.
Version: 7 Owner: Evandar Author(s): Evandar
A regular limit cardinal κ is called weakly inaccessible, and a regular strong limit cardinal
is called inaccessible.
34.142 ◊
To get some sense of what this means, observe that for any λ < κ, {λ} ⊆ κ, so the set of
α with Aα = {λ} is stationary (in κ). More strongly, suppose κ > λ. Then any subset T ⊂ λ is
bounded in κ, so Aα = T on a stationary set. Since |S| = κ, it follows that 2^λ ≤ κ. Hence
◊ℵ1 , the most common form (often written as just ◊), implies CH.
34.143 ♣
Any sequence satisfying ◊S can be adjusted so that sup(Aα ) = α, so this is indeed a weakened
form of ◊S .
Any such sequence actually contains a stationary set of α such that Aα ⊆ T for each T :
given any club C and any unbounded T , construct sequences C ∗ and T ∗ from the
elements of each, such that the α-th member of C ∗ is greater than the α-th member of T ∗ ,
which is in turn greater than any earlier member of C ∗ . Since both sets are unbounded, this
construction is possible, and T ∗ is a subset of T still unbounded in κ. So there is some α
such that Aα ⊆ T ∗ , and since sup(Aα ) = α, α is also the limit of a subsequence of C ∗ and
therefore an element of C.
Version: 1 Owner: Henry Author(s): Henry
A Dedekind infinite set is certainly infinite, and if the axiom of choice is assumed, then an
infinite set is Dedekind infinite. However, it is consistent with the failure of the axiom of
choice that there is a set which is infinite but not Dedekind infinite.
Equality of sets: If X and Y are sets, and x ∈ X iff x ∈ Y , then X = Y .

Pair set: If X and Y are sets, then there is a set Z containing only X and Y .

Union over a set: If X is a set, then there exists a set that contains every element of each x ∈ X.

Axiom of power set: If X is a set, then there exists a set P(X) with the property that Y ∈ P(X) iff every element y ∈ Y is also in X.

Replacement axiom: Let F (x, y) be some formula. If, for all x, there is exactly one y such that F (x, y) is true, then for any set A there exists a set B with the property that b ∈ B iff there exists some a ∈ A such that F (a, b) is true.

Regularity axiom: Let F (x) be some formula. If there is some x that makes F (x) true, then there is a set Y such that F (Y ) is true, but for no y ∈ Y is F (y) true.

Existence of an infinite set: There exists a non-empty set X with the property that, for any x ∈ X, there is some y ∈ X such that x ⊆ y but x ≠ y.

Ernst Zermelo and Abraham Fraenkel proposed these axioms as a foundation for what is now
called Zermelo-Fraenkel set theory, or ZF. If these axioms are accepted along with the
axiom of choice, it is often denoted ZFC.
34.146 class
By a class in modern set theory we mean an arbitrary collection of elements of the universe.
All sets are classes (as they are collections of elements of the universe - which are usually
sets, but could also be urelements), but not all classes are sets. Classes which are not sets
are called proper classes.
The need for this distinction arises from the paradoxes of the so called naive set theory. In
naive set theory one assumes that to each possible division of the universe into two disjoint
and mutually comprehensive parts there corresponds an entity of the universe, a set. This is
the content of Frege's famous fifth axiom, which states that to each second order predicate
P there corresponds a first order object p called the extension of P , such that ∀x(P (x) ↔ x ∈ p).
(Every predicate P divides the universe into two mutually comprehensive and disjoint parts;
namely the part which consists of objects for which P holds and the part consisting of objects
for which P does not hold).
Speaking in modern terms we may view the situation as follows. Consider a model of set
theory M. The interpretation the model gives to ∈ defines implicitly a function f : P (M) →
M. Seen this way, the fact that not all classes can be sets simply means that we can’t
injectively map the powerset of any set into the set itself, which is a famous result by
Cantor. Functions like f here are known as extensors and they have been used in the study
of semantics of set theory.
Russell’s paradox - which could be seen as a proof of Cantor’s theorem about cardinalities
of powersets - shows that Frege’s fifth axiom is contradictory; not all classes can be sets.
From here there are two traditional ways to proceed: either through the theory of types or
through some form of limitation of size principle.
The limitation of size principle in its vague form says that all small classes (in the sense of
cardinality) are sets, while all proper classes are very big; “too big” to be sets. The limitation
of size principle can be found in Cantor’s work where it is the basis for Cantor’s doctrine that
only transfinite collections can be thought of as specific objects (sets), but some collections are
"absolutely infinite" and cannot be comprehended into an object. This can be
given a precise formulation: all classes which are of the same cardinality as the universal
class are too big, and all other classes are small. In fact, this formulation can be used
in von Neumann-Bernays-Gödel set theory to replace the replacement axiom and almost all
other set existence axioms (with the exception of the powerset axiom).
The limitation of size principle can be seen to give rise to extensors of type P^{<|A|}(A) → A.
(P^{<|A|}(A) is the set of all subsets of A which are of cardinality less than that of A.) This is
not the only possible way to avoid Russell’s paradox. We could use an extensor according
to which all classes which are of cardinality less than that of the universe or for which the
cardinality of their complement is less than that of the universe are sets (i.e. map into
elements of the model).
In many set theories there are formally no proper classes; ZFC is an example of just such a
set theory. In these theories one usually means by a proper class an open formula Φ, possibly
with set parameters a1 , ..., an . Notice, however, that these do not exhaust all possible proper
classes that should “really” exist for the universe, as it only allows us to deal with proper
classes that can be defined by means of an open formula with parameters. The theory NBG
formalises this usage: it’s conservative over ZFC (as clearly speaking about open formulae
with parameters must be!).
There is a set theory known as Morse-Kelley set theory which allows us to speak about and
to quantify over an extended class of impredicatively defined proper classes that can't be
reduced to simply speaking about open formulae.
34.147 complement
If S is a set of finite sets, then it is a ∆-system if there is some (possibly empty) X such
that for any a, b ∈ S, if a ≠ b then a ∩ b = X.
If S is a set of finite sets such that |S| = ℵ1 , then there is an S′ ⊆ S such that |S′| = ℵ1 and
S′ is a ∆-system.
If ⟨Si⟩i<α is a sequence, then the diagonal intersection ∆i<α Si is defined to be
{β < α | β ∈ ∩γ<β Sγ }.
34.151 intersection
The intersection of two sets A and B is the set that contains all the elements x such that
x ∈ A and x ∈ B. The intersection of A and B is written as A ∩ B.
Example. If A = {1, 2, 3, 4, 5} and B = {1, 3, 5, 7, 9} then A ∩ B = {1, 3, 5}.
34.152 multiset
Since there are only ℵ0 possible cardinalities for any element of S, there must be some
n such that there are an uncountable number of elements of S with cardinality n. Let
S ∗ = {a ∈ S | |a| = n} for this n. By induction, the lemma holds:
If n = 1 then each element of S ∗ is distinct and has no intersection with the others,
so X = ∅ and S′ = S ∗ .
On the other hand, if there is no such x then we can construct, by induction, a sequence
⟨ai⟩i<ω1 such that each ai ∈ S ∗ and for any i ≠ j, ai ∩ aj = ∅. Take any element for a0 , and
given ⟨ai⟩i<α , since α is countable, A = ∪i<α ai is countable. Obviously each element of
A is in only a countable number of elements of S ∗ , so there are an uncountable number of
elements of S ∗ which are candidates for aα . Then this sequence satisfies the lemma, since
the intersection of any two elements is ∅.
The rational numbers Q are the fraction field of the ring Z of integers. In more elementary
terms, a rational number is a quotient a/b of two integers a and b. Two fractions a/b and
c/d are equivalent if the products of the cross terms are equal:
a/b = c/d ⇔ ad = bc
Addition and multiplication of fractions are given by the formulae
a/b + c/d = (ad + bc)/(bd)
a/b · c/d = (ac)/(bd)
The field of rational numbers is an ordered field, under the ordering relation: a/b ≤ c/d if
the inequality a · d ≤ b · c holds in the integers (with the denominators b and d taken positive).
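These rules can be checked mechanically. Below is a minimal Python sketch representing a fraction a/b as a pair of integers; the helper names `rat_eq`, `rat_add`, `rat_mul`, and `rat_reduce` are our own, not from the text.

```python
from math import gcd

def rat_eq(p, q):
    """a/b == c/d  iff  a*d == b*c (cross-multiplication)."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def rat_add(p, q):
    """a/b + c/d = (a*d + b*c) / (b*d)."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def rat_mul(p, q):
    """a/b * c/d = (a*c) / (b*d)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

def rat_reduce(p):
    """Lowest terms, with positive denominator."""
    a, b = p
    g = gcd(a, b)
    if b < 0:
        g = -g
    return (a // g, b // g)
```

For instance, `rat_eq((1, 2), (2, 4))` is `True`, reflecting that 1/2 and 2/4 are equivalent fractions.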
REFERENCES
1. G.M. Bergman, An Invitation to General Algebra and Universal Constructions.
34.157 set
34.157.1 Introduction
Sets can be of "real" objects or mathematical objects; but the sets themselves are purely
conceptual. This is an important point to note: the set of all cows (for example) does not
physically exist, even though the cows do. The set is a "gathering" of the cows into one
conceptual unit that is not part of physical reality. This makes it easy to see why we can
have sets with an infinite number of elements; even though we may not be able to point out
infinitely many objects in the real world, we can construct conceptual sets with an infinite
number of elements (see the examples below).
Mathematics is thus built upon sets of purely conceptual, or mathematical, objects. Sets
are usually denoted by upper-case roman letters (like S). Sets can be defined by listing the
members, as in
S = {a, b, c, d}
Or, a set can be defined from a formula. This type of statement defining a set is of the form
S = {x : P (x)}
where S is the symbol denoting the set, x is the variable we are introducing to represent a
generic element of the set, and P (x) is some property that is true for values x within S (that
is x ∈ S iff P (x) holds). (We denote “and” by comma separated clauses in P (x). Also note
that the x : portion of the set definition may contain a qualification which narrows values of
x to some other set which is already known).
Sets are, in fact, completely defined by their elements.¹ If two sets have the same elements,
they are equal. This is called the axiom of extensionality, and it is one of the most
important characteristics of sets that distinguishes them from predicates or properties.
¹ However, not every collection has to be a set (in fact, all collections can't be sets). See proper class for
more details.
The symbol ∈ denotes membership in a set. For example, s ∈ S states that s is an element
of the set S.
• The set of all isosceles triangles: {△ABC : AB = BC ≠ AC}, where the overline
denotes segment length.
Z, N, and R are all standard sets: the integers, the natural numbers, and the real numbers,
respectively. These are all infinite sets.
The astute reader may have noticed that all of our examples of sets utilize sets, which does
not suffice for rigorous definition. We can be more rigorous if we postulate only the empty
set, and define a set in general as anything which one can construct from the empty set and
the ZFC axioms.
An important set notion is cardinality. Cardinality is roughly the same as the intuitive
notion of "size". For sets with a finite number of elements,
cardinality can be thought of as size. However, intuition breaks down for sets with an infinite
number of elements. For more detail, see the cardinality entry.
Another important set concept is that of subsets. A subset B of a set A is any set which
contains only elements that appear in A. Subsets are denoted with the ⊆ symbol, i.e. B ⊆ A.
Also useful is the notion of a proper subset, denoted B ⊂ A, which adds the restriction
that B must not be equal to A.
34.157.3 Set Operations
There are a number of standard (common) operations which are used to manipulate sets,
producing new sets from combinations of existing sets (sometimes with entirely different
types of elements). These standard operations are:
• union
• intersection
• set difference
• complement
• cartesian product
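Python's built-in `set` type implements all of these operations directly; a small illustrative sketch (the variable names, and the choice of ambient set for the complement, are ours):

```python
A = {1, 2, 3, 4, 5}
B = {1, 3, 5, 7, 9}
U = set(range(10))          # an ambient "universal" set, needed for the complement

union        = A | B                           # elements in A or B
intersection = A & B                           # elements in both A and B
difference   = A - B                           # elements of A not in B
complement   = U - A                           # elements of U not in A
product      = {(a, b) for a in A for b in B}  # cartesian product A x B
```

Here `intersection` comes out to `{1, 3, 5}`, matching the worked example in the intersection entry above.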
Chapter 35
Chapter 36
36.1 NJp
NJp is a natural deduction proof system for intuitionistic propositional logic. Its only
axiom is α ⇒ α for any atomic α. Its rules are:
(∨I) From Γ ⇒ α, infer Γ ⇒ α ∨ β or Γ ⇒ β ∨ α.
(∨E) From Γ ⇒ α ∨ β, Σ, α′ ⇒ φ, and Π, β′ ⇒ φ, infer [Γ, Σ, Π] ⇒ φ.
The syntax α′ indicates that the rule also holds if that formula is omitted.
(→ I) From Γ, α ⇒ β, infer Γ ⇒ α → β.
(→ E) From Γ ⇒ α → β and Σ ⇒ α, infer [Γ, Σ] ⇒ β.
(⊥i) From Γ ⇒ ⊥, infer Γ ⇒ α, where α is atomic.
36.2 NKp
NKp is a natural deduction proof system for classical propositional logic. It is identical to
NJp except that it replaces the rule ⊥i with the rule:
(⊥c) From Γ, ¬α ⇒ ⊥, infer Γ ⇒ α, where α is atomic.
Natural deduction refers to related proof systems for several different kinds of logic, intended
to be similar to the way people actually reason. Unlike many other proof systems, it has
many rules and few axioms. Sequents in natural deduction have only one formula on the
right side.
Typically the rules consist of one pair for each connective, one of which allows the introduc-
tion of that symbol and the other its elimination.
Γ, α ⇒ β
(→ I)
Γ⇒α→β
and
Γ⇒α→β Σ⇒α
(→ E)
[Γ, Σ] ⇒ β
36.4 sequent
A sequent represents a formal step in a proof. Typically it consists of two lists of formulas,
one representing the premises and one the conclusions. A typical sequent might be:
φ, ψ ⇒ α, β
This claims that, from premises φ and ψ either α or β must be true. Note that ⇒ is not
a symbol in the language, rather it is a symbol in the metalanguage used to discuss proofs.
Also, notice the asymmetry: everything on the left must be true to conclude only one thing
on the right. This does create a different kind of symmetry, since adding formulas to either
side results in a weaker sequent, while removing them from either side gives a stronger one.
Most proof systems provide ways to deduce one sequent from another. These rules are
written with a list of sequents above and below a line. This rule indicates that if everything
above the line is true, so is everything under the line. A typical rule is:
Γ ⇒ Σ
Γ, α ⇒ Σ
This indicates that if we can deduce Σ from Γ, we can also deduce it from Γ together with
α.
Note that the capital Greek letters are usually used to denote a (possibly empty) list of
formulas. [Γ, Σ] is used to denote the contraction of Γ and Σ, that is, the list of those
formulas appearing in either Γ or Σ but with no repeats.
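As a small illustration, contraction can be modeled as a duplicate-removing merge of two lists of formulas, keeping the first occurrence of each; the function name `contract` is our own.

```python
def contract(gamma, sigma):
    """[Γ, Σ]: the formulas appearing in either list, with no repeats
    (first occurrence kept, order otherwise preserved)."""
    seen, out = set(), []
    for formula in gamma + sigma:
        if formula not in seen:
            seen.add(formula)
            out.append(formula)
    return out
```

For example, `contract(["φ", "ψ"], ["ψ", "α"])` yields `["φ", "ψ", "α"]`.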
If Th and Pr are two sets of facts (in particular, a theory of some language and the set of
things provable by some method) we say Pr is sound for Th if Pr ⊆ Th. Typically we
have a theory and a set of rules for constructing proofs, and we say the set of rules is sound
(which theory is intended is usually clear from context) since everything they prove is true
(in Th).
Chapter 37
37.1 induction
Induction is the name given to a certain kind of proof, and also to a (related) way of defining
a function. For a proof, the statement to be proved has a suitably ordered set of cases.
Some cases (usually one, but possibly zero or more than one) are proved separately, and
the other cases are deduced from those. The deduction goes by contradiction, as we shall
see. For a function, its domain is suitably ordered. The function is first defined on some
(usually nonempty) subset of its domain, and is then defined at other points x in terms of
its values at points y such that y < x.
1) If n ∈ N, and F (m) is true for all m ∈ N such that m < n, then F (n) is true.
To see why, assume that F (n) is false for some n. Then there is a smallest k ∈ N such that
F (k) is false. Then, by hypothesis, F (n) is true for all n < k. By (1), F (k) is true, which is
a contradiction.
(If we don’t regard induction as a kind of proof by contradiction, then we have to think
of it as supplying some kind of sequence of proofs, of unlimited length. That’s not very
satisfactory, particularly for transfinite inductions, which we will get to below.)
Bn = n³/3 + n²/2 + n/6 for all n ∈ N
Let us try to apply (1). We have the inductive hypothesis (as it is called)
Bm = m³/3 + m²/2 + m/6 for all m < n
which tells us something if n > 0. In particular, setting m = n − 1,
Bn−1 = (n − 1)³/3 + (n − 1)²/2 + (n − 1)/6
Now we just add n² to each side, and verify that the right side becomes n³/3 + n²/2 + n/6. This
proves (1) for nonzero n. But if n = 0, the inductive hypothesis is vacuously true, but of no
use. So we need to prove F (0) separately, which in this case is trivial.
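The closed form can also be checked numerically against the direct sum of squares; a quick Python sketch using exact rational arithmetic (helper names are ours):

```python
from fractions import Fraction

def B_closed(n):
    """Closed form n^3/3 + n^2/2 + n/6, computed exactly."""
    n = Fraction(n)
    return n**3 / 3 + n**2 / 2 + n / 6

def B_sum(n):
    """Direct sum 1^2 + 2^2 + ... + n^2."""
    return sum(k * k for k in range(1, n + 1))

# The induction step adds n^2 to B_{n-1}:
#   B_closed(n) == B_closed(n - 1) + n**2   for every n >= 1.
```

Both functions agree, e.g. `B_closed(10) == B_sum(10) == 385`.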
Textbooks sometimes distinguish between weak and strong (or complete) inductive proofs.
A proof that relies on the inductive hypothesis (1) is said to go by strong induction. But in
the sum-of-squares formula above, we needed only the hypothesis F (n − 1), not F (m) for all
m < n. For another example, a proof about the Fibonacci sequence might use just F (n − 2)
and F (n − 1). An argument using only F (n − 1) is referred to as weak induction.
Let's begin with an example, the function N → N, n ↦ a^n, where a is some integer > 0.
The inductive definition reads
a^0 = 1
a^n = a · a^(n−1) for all n > 0
Formally, such a definition requires some justification, which runs roughly as follows. Let T
be the set of m ∈ N for which the following definition "has no problem":
a^0 = 1
a^n = a · a^(n−1) for 0 < n ≤ m
We now have a finite sequence fm on the interval [0, m], for each m ∈ T . We verify that any
fl and fm have the same values throughout the intersection of their two domains. Thus we
can define a single function on the union of the various domains. Now suppose T ≠ N, and
let k be the least element of N − T . That means that the definition has a problem when
m = k but not when m < k. We soon get a contradiction, so we deduce T = N. That means
that the union of those domains is all of N, i.e. the function a^n is defined, unambiguously,
throughout N.
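The inductive definition translates directly into a recursive function; a minimal sketch on ordinary Python integers (function name ours):

```python
def power(a, n):
    """a^n by the inductive definition: a^0 = 1, a^n = a * a^(n-1)."""
    if n == 0:
        return 1          # base case a^0 = 1
    return a * power(a, n - 1)   # inductive case
```

Each call reduces n by one, mirroring how the justification above extends the definition one step at a time until it covers all of N.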
We have been speaking of the inductive definition of a function, rather than just a sequence
(a function on N), because the notions extend with little change to transfinite inductions.
An illustration par excellence of inductive proofs and definitions is Conway’s theory of
surreal numbers. The numbers and their algebraic laws of composition are defined entirely
by inductions which have no special starting cases.
The reader can figure out what is meant by "induction starting at k", where k is not neces-
sarily zero. Likewise, the term "downward induction" is self-explanatory.
An ordered set (S, ≤) is said to be well-ordered if any nonempty subset of S has a least
element. The criterion (1), and its proof, hold without change for any well-ordered set S in
place of N (which is a well-ordered set). But notice that it won’t be enough to prove that
F (n) implies F (n + 1) (where n + 1 denotes the least element > n, if it exists). The reason
is, given an element m, there may exist elements < m but no element k such that m = k + 1.
Then the induction from n to n + 1 will fail to "reach" m. For more on this topic, look for
"limit ordinals".
Informally, any variety of induction which works for ordered sets S in which a segment
Sx = {y ∈ S | y < x} may be infinite is called "transfinite induction".
An ordered set S, or its order, is called Noetherian if any non-empty subset of S has a
maximal element. Several equivalent definitions are possible, such as the "ascending chain
condition": any strictly increasing sequence of elements of S is finite. The following result
is easily proved by contradiction.
Principle of Noetherian induction: Let (S, ≤) be a set with a Noetherian order, and let
T be a subset of S having this property: if x ∈ S is such that the condition y > x implies
y ∈ T , then x ∈ T . Then T = S.
So, to prove something ”F (x)” about every element x of a Noetherian set, it is enough to
prove that ”F (z) for all z > y” implies ”F (y)”. This time the induction is going downward,
but of course that is only a matter of notation. The opposite of a Noetherian order, i.e. an
order in which any strictly decreasing sequence is finite, is also in use; it is called a partial
well-order, or an ordered set having no infinite antichain.
The standard example of a Noetherian ordered set is the set of ideals in a Noetherian ring.
But the notion has various other uses, in topology as well as algebra. For a nontrivial
example of a proof by Noetherian induction, look up the Hilbert basis theorem.
An ordered set (S, ≤) is said to be inductive if any totally ordered subset of S has an
upper bound in S. Since the empty set is totally ordered, any inductive ordered set is non-
empty. We have this important result:
Zorn’s lemma is widely used in existence proofs, rather than in proofs of a property F (x) of
an arbitrary element x of an ordered set. Let me sketch one typical application. We claim
that every vector space has a basis. First, we prove that if a free subset F , of a vector space
V , is a maximal free subset (with respect to the order relation ⊂), then it is a basis. Next,
to see that the set of free subsets is inductive, it is enough to verify that the union of any
totally ordered set of free subsets is free, because that union is an upper bound on the totally
ordered set. Last, we apply Zorn’s lemma to conclude that V has a maximal free subset.
Chapter 38
• ∀x, y(x + y′ = (x + y)′) (addition is the repeated application of the successor function)
• ∀x(x · 0 = 0)
• ∀x(x^0 = 1)
• ∀x, y(x^(y′) = x^y · x)
38.2 PA
Peano Arithmetic (PA) is the restriction of Peano’s axioms to a first order theory of arith-
metic. The only change is that the induction axiom is replaced by induction restricted to
arithmetic formulas:
Note that this replaces the single, second-order, axiom of induction with a countably infinite
schema of axioms.
Appropriate axioms defining +, ·, and < are included. A full list of the axioms of PA looks
like this (although the exact list of axioms varies somewhat from source to source):
• ∀x, y(x + y′ = (x + y)′) (addition is the repeated application of the successor function)
• ∀x(x · 0 = 0)
Peano’s axioms are a definition of the set of natural numbers, denoted N. From these
axioms Peano arithmetic on natural numbers can be derived.
1. 0 ∈ N (0 is a natural number)
4. x = y if and only if x′ = y′.
Peano arithmetic consists of statements derived via these axioms. For instance, from these
axioms we can define addition and multiplication on natural numbers. Addition is defined
as
x + 1 = x′ for all x ∈ N
x + y′ = (x + y)′ for all x, y ∈ N
Addition defined in this manner can then be proven to be both associative and commutative.
Multiplication is
x · 1 = x for all x ∈ N
x · y′ = x · y + x for all x, y ∈ N
This definition of multiplication can also be proven to be both associative and commutative,
and it can also be shown to be distributive over addition.
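The defining equations can be transcribed almost verbatim; in this hypothetical sketch the successor x′ is modeled as x + 1 on ordinary Python integers, and, following the entry, both operations are defined for arguments starting at 1.

```python
def succ(x):
    return x + 1          # x' modeled as x + 1 (an assumption of this sketch)

def add(x, y):
    """x + 1 = x',  x + y' = (x + y)'."""
    if y == 1:
        return succ(x)
    return succ(add(x, y - 1))   # here y = z' with z = y - 1

def mul(x, y):
    """x * 1 = x,  x * y' = x * y + x."""
    if y == 1:
        return x
    return add(mul(x, y - 1), x)
```

For example, `mul(4, 6)` unwinds entirely into repeated applications of `succ`, yet yields the expected product 24.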
Chapter 39
39.1 ACA0
ACA0 is a weakened form of second order arithmetic. Its axioms include the axioms of PA
together with arithmetic comprehension.
39.2 RCA0
RCA0 is a weakened form of second order arithmetic. It consists of the axioms of PA other
than induction, together with Σ⁰₁-IND and ∆⁰₁-CA.
39.3 Z2
Z2 is the full system of second order arithmetic, that is, the full theory of numbers and sets
of numbers. It is sufficient for a great deal of mathematics, including much of number theory
and analysis.
The axioms defining successor, addition, multiplication, and comparison are the same as
those of PA. Z2 adds the full induction axiom and the full comprehension axiom.
Version: 1 Owner: Henry Author(s): Henry
The axiom of comprehension (CA) states that every formula defines a set. That is,
∃X∀x(x ∈ X ↔ φ(x)) for any formula φ where X does not occur free in φ
The names specification and separation are sometimes used in place of comprehension, par-
ticularly for weakened forms of the axiom (see below).
In theories which make no distinction between objects and sets (such as ZF), this formulation
leads to Russell's paradox; however, in stratified theories this is not a problem (for example,
second order arithmetic includes the axiom of comprehension).
This axiom can be restricted in various ways. One possibility is to restrict it to forming
subsets of sets:
∀Y ∃X∀x(x ∈ X ↔ x ∈ Y ∧ φ(x)) for any formula φ where X does not occur free in φ
Another way is to restrict φ to some family F , giving the axiom F-CA. For instance the
axiom Σ⁰₁-CA is:
∃X∀x(x ∈ X ↔ φ(x)) where φ is Σ⁰₁ and X does not occur free in φ
A third form (usually called separation) uses two formulas, and guarantees only that those
satisfying one are included while those satisfying the other are excluded. The unrestricted
form is the same as unrestricted comprehension, but, for instance, Σ⁰₁ separation:
∀x¬(φ(x) ∧ ψ(x)) → ∃X∀x((φ(x) → x ∈ X) ∧ (ψ(x) → x ∉ X))
where φ and ψ are Σ⁰₁ and X does not occur free in φ or ψ
is weaker than Σ⁰₁-CA.
An induction axiom specifies that a theory includes induction, possibly restricted to specific
formulas. IND is the general axiom of induction:
φ(0) ∧ ∀x(φ(x) → φ(x + 1)) → ∀xφ(x) for any formula φ
If φ is restricted to some family of formulas F then the axiom is called F-IND, or F-induction.
For example the axiom Σ⁰₁-IND is:
φ(0) ∧ ∀x(φ(x) → φ(x + 1)) → ∀xφ(x) where φ is Σ⁰₁
Chapter 40
A Boolean algebra is a set B with two binary operators, ∧ "meet" and ∨ "join," and
one unary operator ′ "complement," which together form a Boolean lattice. If X and Y are
Boolean algebras, a mapping f : X → Y is a morphism of Boolean algebras when it is a
morphism of ∧, ∨, and ′.
Theorem 3. Given a Boolean algebra B there exists a totally disconnected Hausdorff space
X such that B is isomorphic to the Boolean algebra of clopen subsets of X.
X = {f : B → {0, 1} | f is a homomorphism}
endowed with the subspace topology induced by the product topology on {0, 1}^B . Then X
is a totally disconnected Hausdorff space. Let Cl(X) denote the Boolean algebra of clopen
subsets of X; then the following map
Chapter 41
A Boolean lattice B is a distributive lattice in which for each element x ∈ B there exists
a complement x′ ∈ B such that
x ∧ x′ = 0
x ∨ x′ = I
(x′)′ = x
(x ∧ y)′ = x′ ∨ y′
(x ∨ y)′ = x′ ∧ y′
Given a set, any collection of subsets that is closed under unions, intersections, and comple-
ments is a Boolean algebra.
Boolean rings (with identity, but allowing 0 = 1) are equivalent to Boolean lattices. To view
a Boolean ring as a Boolean lattice, define x ∧ y = xy and x ∨ y = x + y + xy. To view a
Boolean lattice as a Boolean ring, define xy = x ∧ y and x + y = (x′ ∧ y) ∨ (x ∧ y′).
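As a sanity check, one can verify the lattice-to-ring translation on the Boolean algebra of subsets of a small set, where ∧ is intersection, ∨ is union, and ′ is complement relative to the ambient set; the ring addition (x′ ∧ y) ∨ (x ∧ y′) then coincides with symmetric difference. A sketch (all names ours):

```python
from itertools import chain, combinations

U = frozenset({0, 1, 2})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(U, r) for r in range(len(U) + 1))]

def comp(x):
    return U - x                         # x' : complement in U

def ring_add(x, y):
    return (comp(x) & y) | (x & comp(y)) # x + y = (x' ∧ y) ∨ (x ∧ y')

def ring_mul(x, y):
    return x & y                         # xy = x ∧ y

# ring_add coincides with symmetric difference, and the ring laws hold:
for x in subsets:
    for y in subsets:
        assert ring_add(x, y) == x ^ y           # symmetric difference
        assert ring_add(x, x) == frozenset()     # characteristic 2
        assert ring_mul(x, x) == x               # idempotent multiplication
```

The three asserted identities are exactly what makes the resulting structure a Boolean ring.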
A complete lattice is a nonempty poset in which every nonempty subset has a supremum
and an infimum.
In particular, a complete lattice is a lattice.
41.3 lattice
A lattice is any non-empty poset P in which any two elements x and y have a least upper bound,
x ∨ y, and a greatest lower bound, x ∧ y.
x ∧ x = x, x ∨ x = x (idempotency)
x ∧ y = y ∧ x, x ∨ y = y ∨ x (commutativity)
x ∧ (y ∧ z) = (x ∧ y) ∧ z, (associativity)
x ∨ (y ∨ z) = (x ∨ y) ∨ z
x ∧ (x ∨ y) = x ∨ (x ∧ y) = x (absorption)
x ∧ y = x and x ∨ y = y (consistency)
Chapter 42
03G99 – Miscellaneous
A Chu space over a set Σ is a triple (A, r, X) with r : A × X → Σ. A is called the carrier
and X the cocarrier.
Although the definition is symmetrical, in practice asymmetric uses are common. In partic-
ular, often X is just taken to be a set of functions from A to Σ, with r(a, x) = x(a) (such a
Chu space is called normal and is abbreviated (A, X)).
Define r̂ and ř to be functions defining the rows and columns of C respectively, so that
r̂(a) : X → Σ and ř(x) : A → Σ are given by r̂(a)(x) = ř(x)(a) = r(a, x). Clearly the rows
of C are the columns of C⊥ .
If C = (A, r, X) and D = (B, s, Y) are Chu spaces then we say a pair of functions f : A → B
and g : Y → X form a Chu transform from C to D if for any (a, y) ∈ A × Y we have
r(a, g(y)) = s(f (a), y).
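The defining condition r(a, g(y)) = s(f(a), y) can be verified by brute force on small finite examples; below is a hypothetical sketch over Σ = {0, 1}, with all names ours and the Chu spaces taken in normal form (columns as functions from the carrier to Σ).

```python
def is_chu_transform(f, g, A, Y, r, s):
    """Check r(a, g(y)) == s(f(a), y) for all (a, y) in A x Y."""
    return all(r(a, g(y)) == s(f(a), y) for a in A for y in Y)

# A tiny example: C = (A, r, X) and D = (B, s, Y), both normal.
A = [0, 1]
X = [lambda a: a, lambda a: 1 - a]   # two columns of C
B = [0, 1]
Y = [lambda b: b]                    # one column of D

r = lambda a, x: x(a)                # normal Chu spaces: r(a, x) = x(a)
s = lambda b, y: y(b)

f = lambda a: a          # carrier map A -> B
g = lambda y: X[0]       # cocarrier map Y -> X (note the reversed direction)
g_bad = lambda y: X[1]   # a cocarrier map that breaks the condition
```

Here `(f, g)` satisfies the adjointness condition, while `(f, g_bad)` does not.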
Version: 1 Owner: Henry Author(s): Henry
That is, to name the rows of the biextensional collapse, we just use functions representing
the actual rows of the original Chu space (and similarly for the columns). The effect is to
merge indistinguishable rows and columns.
We say that two Chu spaces are equivalent if their biextensional collapses are isomorphic.
Any set A can be represented as a Chu space over {0, 1} by (A, r, P(A)) with r(a, X) = 1
iff a ∈ X. This Chu space satisfies only the trivial property 2^A , signifying the fact that sets
have no internal structure. If A = {a, b, c} then the matrix representation is:
Increasing the structure of a Chu space, that is, adding properties, is equivalent to deleting
columns. For instance we can delete the columns named {c} and {b, c} to turn this into
the partial order satisfying c ≤ a. By deleting more columns, we can further increase the
structure. For example, if we require that the set of rows be closed under the bitwise or
operation (and delete those columns which would prevent this) then it will define a
semilattice, and if it is closed under both bitwise or and bitwise and then it will define a
lattice. If the rows are also closed under complementation then we have a Boolean algebra.
Note that these are not arbitrary connections: the Chu transforms on each of these classes
of Chu spaces correspond to the appropriate notion of homomorphism for those classes.
For instance, to see that Chu transforms are order preserving on Chu spaces viewed as partial
orders, let C = (A, r, X) be a Chu space satisfying b ≤ a. That is, for any x ∈ X we have
r(b, x) = 1 → r(a, x) = 1. Then let (f, g) be a Chu transform to D = (B, s, Y ), and suppose
s(f (b), y) = 1. Then r(b, g(y)) = 1 by the definition of a Chu transform, and then we have
r(a, g(y)) = 1 and so s(f (a), y) = 1, demonstrating that f (b) ≤ f (a).
Version: 2 Owner: Henry Author(s): Henry
A property of a Chu space over Σ with carrier A is some Y ⊆ Σ^A . We say that a Chu
space C = (A, r, X) satisfies Y if X ⊆ Y .
Chapter 43
A simple example.
For any group of 8 integers, there exist at least two of them whose difference is divisible by
7.
Consider the residue classes modulo 7. These are 0, 1, 2, 3, 4, 5, 6. We have seven classes
and eight integers. So it must be the case that 2 integers fall in the same residue class, and
therefore their difference will be divisible by 7.
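The argument is constructive: scanning the integers and recording one representative per residue class must produce a collision by the eighth integer. A Python sketch (function name ours):

```python
def pair_with_difference_divisible_by_7(nums):
    """Among any 8 integers, two share a residue class mod 7,
    so their difference is divisible by 7 (pigeonhole principle)."""
    by_residue = {}
    for n in nums:
        r = n % 7
        if r in by_residue:
            return by_residue[r], n   # same class: difference divisible by 7
        by_residue[r] = n
    return None  # possible only when fewer than 8 integers were given
```

With 8 or more integers the seven residue classes cannot all stay distinct, so the function always returns a pair.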
Proof. The proof follows from the corresponding rule for the ordinary derivative; if i, k are
in 0, 1, 2, . . ., then
d^i/dx^i (x^k) = k!/(k − i)! · x^(k−i) if i ≤ k, and 0 otherwise.    (43.2.1)
Suppose i = (i1 , . . . , in ), k = (k1 , . . . , kn ), and x = (x1 , . . . , xn ). Then we have that
∂^i x^k = ∂^|i| /(∂x1^i1 · · · ∂xn^in ) (x1^k1 · · · xn^kn )
= (∂^i1 /∂x1^i1 ) x1^k1 · · · (∂^in /∂xn^in ) xn^kn .
For each r = 1, . . . , n, the function xr^kr only depends on xr . In the above, each partial
differentiation ∂/∂xr therefore reduces to the corresponding ordinary differentiation d/dxr .
Hence, from equation 43.2.1, it follows that ∂^i x^k vanishes if ir > kr for any r = 1, . . . , n. If
this is not the case, i.e., if i ≤ k as multi-indices, then for each r,
d^ir/dxr^ir (xr^kr) = kr !/(kr − ir )! · xr^(kr − ir ) ,
and the theorem follows. □
Operations on multi-indices
|i| = i1 + · · · + in ,
and the factorial as
i! = i1 ! · · · in ! .
If i = (i_1, …, i_n) and j = (j_1, …, j_n) are two multi-indices, their sum and difference are
defined component-wise as
\[
i + j = (i_1 + j_1, \ldots, i_n + j_n), \qquad
i - j = (i_1 - j_1, \ldots, i_n - j_n).
\]
Thus |i ± j| = |i| ± |j|. Also, if j_k ≤ i_k for all k = 1, …, n, then we write j ≤ i. For
multi-indices i, j, with j ≤ i, we define
\[
\binom{i}{j} = \frac{i!}{(i-j)!\, j!}.
\]
For a point x = (x_1, …, x_n) in R^n (with standard coordinates) we define
\[
x^i = \prod_{k=1}^{n} x_k^{i_k}.
\]
Much of the motivation for the above notation is that standard results, such as Leibniz' rule,
Taylor's formula, etc., can be written more or less as-is in many dimensions by replacing
indices in N with multi-indices. Below are some examples of this.
Examples
REFERENCES
1. http://www.math.umn.edu/~jodeit/course/TmprDist1.pdf
2. M. Reed, B. Simon, Methods of Mathematical Physics, I - Functional Analysis, Academic Press, 1980.
3. E. Weisstein, Eric W. Weisstein's world of mathematics, entry on Multi-Index Notation
Chapter 44
The Catalan numbers, or Catalan sequence, have many interesting applications in combinatorics.
They are given by
\[
C_n = \frac{1}{n+1} \binom{2n}{n},
\]
where \binom{n}{r} represents the binomial coefficient. The first several Catalan numbers are 1, 1, 2,
5, 14, 42, 132, 429, 1430, 4862, … (see EIS sequence A000108 for more terms). The Catalan
numbers are also generated by the recurrence relation
\[
C_0 = 1, \qquad C_n = \sum_{i=0}^{n-1} C_i\, C_{n-1-i}.
\]
1. The number of ways to arrange n pairs of matching parentheses, e.g.:
()
(()) ()()
((())) (()()) ()(()) (())() ()()()
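As a quick illustration, here is a Python sketch (the function name is my own) that generates the Catalan numbers via the recurrence above and cross-checks them against the closed form:

```python
from math import comb

def catalan_recurrence(count):
    """Generate Catalan numbers via C_0 = 1, C_n = sum_i C_i * C_{n-1-i}."""
    c = [1]
    for n in range(1, count):
        c.append(sum(c[i] * c[n - 1 - i] for i in range(n)))
    return c

cats = catalan_recurrence(10)
print(cats)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]

# Cross-check against the closed form C_n = binom(2n, n) / (n + 1).
assert all(c == comb(2 * n, n) // (n + 1) for n, c in enumerate(cats))
```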
The Catalan sequence is named for Eugène Charles Catalan, but it was discovered in 1751
by Euler when he was trying to solve the problem of subdividing polygons into triangles.
REFERENCES
1. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. Addison-Wesley, 1998. Zbl 0836.00001.
The Levi-Civita permutation symbol is a special case of the generalized Kronecker delta symbol.
Using this fact one can write the Levi-Civita permutation symbol as the determinant of an
n × n matrix consisting of traditional delta symbols. See the entry on the generalized Kronecker
symbol for details.
When using the Levi-Civita permutation symbol and the generalized Kronecker delta symbol,
the Einstein summation convention is usually employed. In the below, we shall also use this
convention.
Properties
• When n = 2, we have for all i, j, m, n in {1, 2},
\[
\varepsilon_{ij}\,\varepsilon_{mn} = \delta_{im}\delta_{jn} - \delta_{in}\delta_{jm}. \tag{220.5.1}
\]
Let us prove these properties. The proofs are instructional since they demonstrate typical
argumentation methods for manipulating the permutation symbols.
Proof. For equation 220.5.1, let us first note that both sides are antisymmetric with respect
to ij and mn. We therefore only need to consider the case i ≠ j and m ≠ n. By substitution,
we see that the equation holds for ε_{12}ε_{12}, i.e., for i = m = 1 and j = n = 2 (both sides are
then one). Since the equation is antisymmetric in ij and mn, any set of values for these
can be reduced to the above case (which holds). The equation thus holds for all values of
ij and mn. Using equation 220.5.1, we have for equation 44.2.2
Here we used the Einstein summation convention with i going from 1 to 2. Equation 44.2.3
follows similarly from equation 44.2.2. To establish equation 44.2.4, let us first observe that
both sides vanish when i ≠ j. Indeed, if i ≠ j, then one cannot choose m and n such
that both permutation symbols on the left are nonzero. Then, with i = j fixed, there are
only two ways to choose m and n from the remaining two indices. For any such indices,
we have ε_{jmn}ε_{imn} = (ε_{imn})^2 = 1 (no summation), and the result follows. The last property
follows since 3! = 6 and for any distinct indices i, j, k in {1, 2, 3}, we have ε_{ijk}ε_{ijk} = 1 (no
summation). □
Examples.
• The cross product of two vectors A = (A^1, A^2, A^3) and B = (B^1, B^2, B^3) in R^3 can be written as
\[
(A \times B)^i = \varepsilon^{ijk} A_j B_k.
\]
For instance, the first component of A × B is A^2 B^3 − A^3 B^2. From the above expression
for the cross product, it is clear that A × B = −B × A. Further, if C = (C^1, C^2, C^3)
is a vector like A and B, then the triple scalar product equals
\[
A \cdot (B \times C) = \varepsilon_{ijk} A^i B^j C^k.
\]
From this expression, it can be seen that the triple scalar product is antisymmetric
when exchanging any adjacent arguments. For example, A · (B × C) = −B · (A × C).
• The curl of a vector field F can be written as
\[
(\nabla \times F)^i(x) = \varepsilon^{ijk} \frac{\partial}{\partial x^j} F^k(x).
\]
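These index expressions are easy to check numerically. Below is a small Python sketch (function names are my own) that computes ε_{ijk} from the indices and uses it to form a cross product:

```python
def levi_civita(i, j, k):
    """Permutation symbol on indices 0, 1, 2: +1/-1 for even/odd permutations, 0 otherwise."""
    return (j - i) * (k - i) * (k - j) // 2

def cross(a, b):
    """Cross product via (A x B)^i = eps_{ijk} A^j B^k."""
    return [sum(levi_civita(i, j, k) * a[j] * b[k]
                for j in range(3) for k in range(3))
            for i in range(3)]

print(cross([1, 0, 0], [0, 1, 0]))  # [0, 0, 1]
print(cross([1, 2, 3], [4, 5, 6]))  # [-3, 6, -3]
```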
The left-hand side counts the set of strings of n bits with r 1s. Suppose we take one of these
strings and remove the first bit b. There are two cases: either b = 1, or b = 0.
If b = 1, then the new string is n − 1 bits with r − 1 ones; there are \binom{n-1}{r-1} bit strings of this
nature.
If b = 0, then the new string is n − 1 bits with r ones, and there are \binom{n-1}{r} strings of this
nature.
Therefore every string counted on the left is covered by one, but not both, of these two cases.
If we add the two cases, we find that
\[
\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}.
\]
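A brute-force check of this combinatorial argument (the helper name is my own):

```python
from itertools import product
from math import comb

def strings_with_r_ones(n, r):
    """All n-bit strings containing exactly r ones."""
    return [s for s in product((0, 1), repeat=n) if sum(s) == r]

n, r = 5, 2
total = strings_with_r_ones(n, r)
# Split by the first bit, as in the proof above.
first_one = [s for s in total if s[0] == 1]
first_zero = [s for s in total if s[0] == 0]

assert len(total) == comb(n, r)              # 10
assert len(first_one) == comb(n - 1, r - 1)  # 4
assert len(first_zero) == comb(n - 1, r)     # 6
```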
44.4 Pascal's rule proof
We need to show that
\[
\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}.
\]
Expanding both binomial coefficients and bringing them to a common denominator,
\[
\begin{aligned}
\frac{n!}{k!(n-k)!} + \frac{n!}{(k-1)!(n-k+1)!}
&= \frac{(n-k+1)\,n!}{(n-k+1)\,k!\,(n-k)!} + \frac{k\,n!}{k\,(k-1)!\,(n-k+1)!} \\
&= \frac{(n-k+1)\,n! + k\,n!}{k!\,(n-k+1)!} \\
&= \frac{(n+1)\,n!}{k!\,((n+1)-k)!} \\
&= \frac{(n+1)!}{k!\,((n+1)-k)!} \\
&= \binom{n+1}{k}.
\end{aligned}
\]
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
⋮
This triangle continues infinitely, so we have printed only the first 8 rows. In general,
the triangle is constructed so that the entries on the left and right edges are 1, and every
entry inside the triangle is obtained by summing the two entries immediately above it. For
instance, on the fourth row, 4 = 1 + 3.
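The construction rule translates directly into code; this Python sketch (my own naming) builds each row by summing adjacent entries of the previous row:

```python
def pascal_triangle(rows):
    """Build Pascal's triangle: each row starts/ends with 1; inner entries sum the two above."""
    triangle = [[1]]
    for _ in range(rows - 1):
        prev = triangle[-1]
        triangle.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return triangle

for row in pascal_triangle(8):
    print(row)
# last row printed: [1, 7, 21, 35, 35, 21, 7, 1]
```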
Historically, the application of this triangle has been to give the coefficients when expanding
binomial expressions. For instance, to expand (a + b)^4, one simply looks up the coefficients
on the fourth row and writes
\[
(a+b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4.
\]
Pascal’s triangle is named after the French mathematician Blaise Pascal (1623-1662) [3].
However, this triangle was known at least around 1100 AD in China; five centuries before Pas-
cal [1]. In modern language, the expansion of the binomial is given by the binomial theorem
discovered by Isaac Newton in 1665 [2]: For any n = 1, 2, . . . and real numbers a, b, we have
\[
(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k
= a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + b^n.
\]
Thus, in Pascal's triangle, the entries on the nth row are given by the binomial coefficients
\[
\binom{n}{k} = \frac{n!}{(n-k)!\,k!}
\]
for k = 0, 1, …, n.
REFERENCES
1. Wikipedia’s entry on the binomial coefficients
2. Wikipedia’s entry on Isaac Newton
3. Wikipedia’s entry on Blaise Pascal
44.6 Upper and lower bounds to binomial coefficient
For 1 ≤ k ≤ n we have
\[
\binom{n}{k} \le \frac{n^k}{k!}, \qquad
\binom{n}{k} \le \left(\frac{n \cdot e}{k}\right)^{k}, \qquad
\binom{n}{k} \ge \left(\frac{n}{k}\right)^{k}.
\]
Also, for large n: \binom{n}{k} \approx \frac{n^k}{k!}.
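A numerical sanity check of these bounds in Python (the loop ranges are arbitrary):

```python
from math import comb, e, factorial

for n in range(1, 40):
    for k in range(1, n + 1):
        b = comb(n, k)
        assert b <= n**k / factorial(k)  # upper bound n^k / k!
        assert b <= (n * e / k) ** k     # upper bound (n*e/k)^k
        assert b >= (n / k) ** k         # lower bound (n/k)^k
print("all bounds hold")
```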
The number of ways to choose r objects from a set with n elements (n ≥ r) is given by
\[
\binom{n}{r} = \frac{n!}{(n-r)!\,r!}.
\]
These numbers are called binomial coefficients, because they show up when expanding (x + y)^n:
• \binom{n}{r} is the coefficient of x^r y^{n-r} in (x + y)^n (binomial theorem).
• \binom{n}{r} = \binom{n}{n-r}.
• \binom{n}{r-1} + \binom{n}{r} = \binom{n+1}{r} (Pascal's rule).
• \binom{n}{0} = 1 = \binom{n}{n} for all n.
• \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n.
• \binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n} = 0.
• \sum_{t=1}^{n} \binom{t}{k} = \binom{n+1}{k+1}.
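These identities are easy to confirm exhaustively for small n (a Python sketch):

```python
from math import comb

for n in range(1, 15):
    assert all(comb(n, r) == comb(n, n - r) for r in range(n + 1))  # symmetry
    assert all(comb(n, r - 1) + comb(n, r) == comb(n + 1, r)
               for r in range(1, n + 1))                            # Pascal's rule
    assert sum(comb(n, r) for r in range(n + 1)) == 2**n            # row sum
    assert sum((-1)**r * comb(n, r) for r in range(n + 1)) == 0     # alternating sum
    for k in range(1, n + 1):
        assert sum(comb(t, k) for t in range(1, n + 1)) == comb(n + 1, k + 1)
print("all identities verified")
```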
In the context of computer science, it also helps to see \binom{n}{r} as the number of strings
consisting of ones and zeros with r ones and n − r zeros. This equivalence comes from the
fact that if S is a finite set with n elements, then \binom{n}{r} is the number of distinct subsets of S with
r elements. For each subset T of S, consider the function
\[
X_T : S \to \{0, 1\}
\]
where X_T(x) = 1 whenever x ∈ T and 0 otherwise (so X_T is the characteristic function for
T). For each T ∈ P(S), X_T can be used to produce a unique bit string of length n with
exactly r ones.
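The subset-to-bit-string correspondence can be made concrete (a Python sketch; names are mine):

```python
from itertools import combinations
from math import comb

def to_bitstring(universe, subset):
    """Characteristic function of `subset` inside the ordered `universe`."""
    return tuple(1 if x in subset else 0 for x in universe)

S = ['a', 'b', 'c', 'd']
r = 2
strings = {to_bitstring(S, set(T)) for T in combinations(S, r)}

# Distinct subsets give distinct strings, each with exactly r ones.
assert len(strings) == comb(len(S), r)  # 6
assert all(sum(s) == r for s in strings)
```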
n!! = n(n − 2)(n − 4) · · · k, where k = 2 if n is even and k = 1 if n is odd.
For example,
7!! = 7 · 5 · 3 · 1 = 105
10!! = 10 · 8 · 6 · 4 · 2 = 3840
44.9 factorial
For any non-negative integer n, the factorial of n, denoted n!, can be defined by
\[
n! = \prod_{r=1}^{n} r.
\]
Alternatively, the factorial can be defined recursively by 0! = 1 and n! = n(n − 1)! for n > 0.
n! is equal to the number of permutations of n distinct objects. For example, there are 5!
ways to arrange the five letters A, B, C, D and E into a word.
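Both definitions are immediate to code (Python; function names are my own):

```python
from math import prod

def factorial_product(n):
    """n! as the product 1 * 2 * ... * n (the empty product gives 0! = 1)."""
    return prod(range(1, n + 1))

def factorial_recursive(n):
    """n! via the recursion 0! = 1, n! = n * (n-1)!."""
    return 1 if n == 0 else n * factorial_recursive(n - 1)

assert factorial_product(5) == factorial_recursive(5) == 120  # 5 letters -> 120 words
```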
Euler's gamma function Γ(x) generalizes the notion of factorial to almost all complex values,
in the sense that
\[
\Gamma(n + 1) = n!
\]
for every non-negative integer n.
For n ∈ N, the rising and falling factorials are nth degree polynomials described, respectively,
by
\[
x^{\overline{n}} = x(x+1)\cdots(x+n-1),
\]
\[
x^{\underline{n}} = x(x-1)\cdots(x-n+1).
\]
The two are related by
\[
x^{\overline{n}} = (-1)^n (-x)^{\underline{n}}.
\]
The rising factorial is often written as (x)n , and referred to as the Pochhammer symbol (see
hypergeometric series). Unfortunately, the falling factorial is also often denoted by (x)n , so
great care must be taken when encountering this notation.
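A small Python sketch of the two products, checking the reflection identity x^{\overline{n}} = (−1)^n (−x)^{\underline{n}} at integer points (function names are my own):

```python
from math import prod

def rising(x, n):
    """Rising factorial x(x+1)...(x+n-1)."""
    return prod(x + i for i in range(n))

def falling(x, n):
    """Falling factorial x(x-1)...(x-n+1)."""
    return prod(x - i for i in range(n))

assert rising(3, 4) == 3 * 4 * 5 * 6 == 360
assert falling(5, 3) == 5 * 4 * 3 == 60
# Reflection identity relating the two:
assert all(rising(x, n) == (-1)**n * falling(-x, n)
           for x in range(-5, 6) for n in range(6))
```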
Notes.
Unfortunately, the notational conventions for the rising and falling factorials lack a common
standard, and are plagued with a fundamental inconsistency. An examination of reference
works and textbooks reveals two fundamental sources of notation: works in combinatorics
and works dealing with hypergeometric functions.
Works of combinatorics [1,2,3] give greater focus to the falling factorial because of its role
in defining the Stirling numbers. The symbol (x)_n almost always denotes the falling factorial.
The notation for the rising factorial varies widely; we find ⟨x⟩_n in [1] and (x)^{(n)} in [3].
Works focusing on special functions [4,5] universally use (x)n to denote the rising factorial and
use this symbol in the description of the various flavours of hypergeometric series. Watson [5]
credits this notation to Pochhammer [6], and indeed the special functions literature eschews
“falling factorial” in favour of “Pochhammer symbol”. Curiously, according to Knuth [7],
Pochhammer himself used (x)n to denote the binomial coefficient (Note: I haven’t verified
this.)
The notation featured in this entry is due to D. Knuth [7,8]. Given the fundamental
inconsistency in the existing notations, it seems sensible to break with both traditions, and
to adopt new and graphically suggestive notation for these two concepts. The traditional
notation, especially in the hypergeometric camp, is so deeply entrenched that, realistically,
one needs to be familiar with the traditional modes and to take care when encountering the
symbol (x)n .
References
When n = 1,
\[
(a + b)^1 = \sum_{k=0}^{1} \binom{1}{k} a^{1-k} b^k
= \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = a + b.
\]
\[
\begin{aligned}
(a + b)^{m+1} &= a(a+b)^m + b(a+b)^m \\
&= a \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k + b \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^j
&& \text{by the inductive hypothesis} \\
&= \sum_{k=0}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^{j+1}
&& \text{by multiplying through by } a \text{ and } b \\
&= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^{j+1}
&& \text{by pulling out the } k = 0 \text{ term} \\
&= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=1}^{m+1} \binom{m}{k-1} a^{m-k+1} b^k
&& \text{by letting } j = k - 1 \\
&= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=1}^{m} \binom{m}{k-1} a^{m+1-k} b^k + b^{m+1}
&& \text{by pulling out the } k = m+1 \text{ term} \\
&= a^{m+1} + b^{m+1} + \sum_{k=1}^{m} \left[ \binom{m}{k} + \binom{m}{k-1} \right] a^{m+1-k} b^k
&& \text{by combining the sums} \\
&= a^{m+1} + b^{m+1} + \sum_{k=1}^{m} \binom{m+1}{k} a^{m+1-k} b^k
&& \text{from Pascal's rule} \\
&= \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k
&& \text{by adding in the } k = 0 \text{ and } k = m+1 \text{ terms,}
\end{aligned}
\]
as desired.
a_1 x_1 + a_2 x_2 + … + a_k x_k.
The multinomial theorem provides the general form of the expansion of the powers of this
expression, in the process specifying the multinomial coefficients which are found in that
expansion. The expansion is:
\[
(x_1 + x_2 + \cdots + x_k)^n
= \sum \frac{n!}{n_1!\, n_2! \cdots n_k!}\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}
\tag{44.12.1}
\]
where the sum is taken over all multi-indices (n_1, …, n_k) ∈ N^k that sum to n.
The expression \frac{n!}{n_1!\, n_2! \cdots n_k!} occurring in the expansion is called a multinomial coefficient and
is denoted by
\[
\binom{n}{n_1, n_2, \ldots, n_k}.
\]
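A Python sketch of the multinomial coefficient, together with the sanity check that the coefficients in (44.12.1) sum to k^n (set every x_i = 1):

```python
from itertools import product
from math import factorial, prod

def multinomial(*parts):
    """n! / (n_1! n_2! ... n_k!) where n = sum of the parts."""
    return factorial(sum(parts)) // prod(factorial(p) for p in parts)

assert multinomial(2, 1, 1) == 12  # coefficient of x1^2 x2 x3 in (x1+x2+x3)^4

# Setting all x_i = 1 in the expansion gives sum of coefficients = k^n.
k, n = 3, 5
total = sum(multinomial(*c) for c in product(range(n + 1), repeat=k) if sum(c) == n)
assert total == k**n  # 243
```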
Proof. The proof below of the multinomial theorem uses the binomial theorem and induction
on k. In addition, we shall use multi-index notation.
First, for k = 1, both sides equal x_1^n. For the induction step, suppose the multinomial
theorem holds for k. Then the binomial theorem and the induction assumption yield
\[
\begin{aligned}
(x_1 + \cdots + x_k + x_{k+1})^n
&= \sum_{l=0}^{n} \binom{n}{l} (x_1 + \cdots + x_k)^l\, x_{k+1}^{n-l} \\
&= \sum_{l=0}^{n} \binom{n}{l}\, l! \sum_{|i|=l} \frac{x^i}{i!}\, x_{k+1}^{n-l} \\
&= n! \sum_{l=0}^{n} \sum_{|i|=l} \frac{x^i\, x_{k+1}^{n-l}}{i!\,(n-l)!}
\end{aligned}
\]
where x = (x_1, …, x_k) and i is a multi-index in I_+^k. To complete the proof, we need to show
that the set A of multi-indices (i_1, …, i_k, n−l) appearing in the last sum (with |i| = l and
l = 0, …, n) coincides with the set B of all multi-indices j ∈ I_+^{k+1} with |j| = n. For A ⊂ B, note that
\[
|(i_1, \ldots, i_k, n-l)| = l + n - l = n.
\]
For B ⊂ A, suppose j = (j_1, …, j_{k+1}) ∈ I_+^{k+1}, and |j| = n. Let l = |(j_1, …, j_k)|. Then
l = n − j_{k+1}, so j_{k+1} = n − l for some l = 0, …, n. It follows that A = B.
Let us define y = (x_1, …, x_{k+1}) and let j = (j_1, …, j_{k+1}) be a multi-index in I_+^{k+1}. Then
\[
\begin{aligned}
(x_1 + \cdots + x_{k+1})^n
&= n! \sum_{|j|=n} \frac{x^{(j_1,\ldots,j_k)}\, x_{k+1}^{j_{k+1}}}{(j_1,\ldots,j_k)!\; j_{k+1}!} \\
&= n! \sum_{|j|=n} \frac{y^j}{j!}.
\end{aligned}
\]
This completes the proof. 2
\[
\prod_{i=1}^{n-1} \left(1 + \frac{1}{i}\right)^{i}
= \prod_{i=1}^{n-1} \frac{(i+1)^i}{i^i}
= \frac{\prod_{i=2}^{n} i^{\,i-1}}{\prod_{i=1}^{n-1} i^{\,i-1} \cdot (n-1)!}
= \frac{n^{n-1}}{(n-1)!}
= \frac{n^n}{n!}
\tag{44.14.1}
\]
Since each left-hand factor in (44.14.1) is < e, we have \frac{n^n}{n!} < e^{n-1}. Since n − i < n for all 1 ≤ i ≤
k − 1, we immediately get
\[
\binom{n}{k} = \frac{n^k}{k!} \prod_{i=1}^{k-1} \left(1 - \frac{i}{n}\right) < \frac{n^k}{k!}.
\]
And from
\[
k \le n \;\Longleftrightarrow\; (n-i) \cdot k \ge (k-i) \cdot n \quad \text{for all } 1 \le i \le k-1
\]
we obtain
\[
\binom{n}{k} = \frac{n}{k} \cdot \prod_{i=1}^{k-1} \frac{n-i}{k-i} \ge \left(\frac{n}{k}\right)^{k}.
\]
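A quick numerical check of the intermediate inequality n^n/n! < e^{n−1}, which holds for n ≥ 2 (at n = 1 both sides equal 1):

```python
from math import e, factorial

for n in range(2, 50):
    assert n**n / factorial(n) < e ** (n - 1)
print("n^n / n! < e^(n-1) holds for n = 2..49")
```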
Chapter 45
n\k 1 2 3 4 5
1 1
2 -1 1
3 2 -3 1
4 -6 11 -6 1
5 24 -50 35 -10 1
The signed Stirling numbers of the first kind satisfy the recurrence relation
\[
s(n+1, k) = s(n, k-1) - n\, s(n, k),
\]
subject to the following initial conditions:
\[
s(n, 0) = 0, \qquad s(1, 1) = 1.
\]
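The recurrence and initial conditions reproduce the table above (a Python sketch; the function name is my own):

```python
def stirling_first(nmax):
    """Signed Stirling numbers of the first kind via s(n+1,k) = s(n,k-1) - n*s(n,k)."""
    s = {(1, 1): 1}
    for n in range(1, nmax):
        for k in range(1, n + 2):
            s[(n + 1, k)] = s.get((n, k - 1), 0) - n * s.get((n, k), 0)
    return s

table = stirling_first(5)
assert table[(4, 2)] == 11
assert [table.get((5, k), 0) for k in range(1, 6)] == [24, -50, 35, -10, 1]
```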
Generating function. There is also a strong connection with the generalized binomial formula,
which furnishes us with the following generating function:
\[
(1 + t)^x = \sum_{n=0}^{\infty} \sum_{k=1}^{n} s(n, k)\, x^k\, \frac{t^n}{n!}.
\]
This generating function implies a number of identities. Taking the derivative of both sides
with respect to t and equating powers leads to the recurrence relation described above.
Taking the derivative of both sides with respect to x gives
\[
(k+1)\, s(n+1, k+1) = \sum_{j=k}^{n} (-1)^{n-j} (n-j)! \binom{n+1}{j} s(j, k).
\]
This is because the derivative of the left side of the generating function equation with respect
to x is
\[
(1 + t)^x \ln(1 + t) = (1 + t)^x \sum_{k=1}^{\infty} (-1)^{k-1} \frac{t^k}{k}.
\]
The relation
\[
(1 + t)^{x_1} (1 + t)^{x_2} = (1 + t)^{x_1 + x_2}
\]
yields the following family of summation identities. For any given k_1, k_2, d > 1 we have
\[
\binom{k_1 + k_2}{k_1}\, s(d + k_1 + k_2,\, k_1 + k_2)
= \sum_{d_1 + d_2 = d} \binom{d + k_1 + k_2}{k_1 + d_1}\, s(d_1 + k_1, k_1)\, s(d_2 + k_2, k_2).
\]
Enumerative interpretation. The absolute value of the Stirling number of the first kind,
|s(n, k)|, counts the number of permutations of n objects with exactly k orbits (equivalently,
with exactly k cycles). For example, |s(4, 2)| = 11 corresponds to the fact that
the symmetric group on 4 objects has 3 permutations of the form
(∗∗)(∗∗) — 2 orbits of size 2 each,
and 8 permutations of the form
(∗ ∗ ∗)(∗) — 1 orbit of size 3, and 1 orbit of size 1
(see the entry on cycle notation for the meaning of the above expressions).
Let us prove this. First, we can remark that the unsigned Stirling numbers of the first kind
are characterized by the following recurrence relation:
\[
|s(n+1, k)| = |s(n, k-1)| + n\, |s(n, k)|, \qquad 1 \le k < n.
\]
To see why the above recurrence relation matches the count of permutations with k cycles,
consider forming a permutation of n + 1 objects from a permutation of n objects by adding
a distinguished object. There are exactly two ways in which this can be accomplished. We
could form a singleton cycle, i.e., leave the extra object alone. This accounts for the
|s(n, k − 1)| term in the recurrence formula. We could also insert the new object into
one of the existing cycles. Consider an arbitrary permutation of n objects with k cycles, and
label the objects a_1, …, a_n, so that the permutation is represented by
\[
\underbrace{(a_1 \ldots a_{j_1})(a_{j_1+1} \ldots a_{j_2}) \ldots (a_{j_{k-1}+1} \ldots a_n)}_{k \text{ cycles}}.
\]
To form a new permutation of n + 1 objects and k cycles, one must insert the new object into
this array. There are, evidently, n ways to perform this insertion. This explains the n |s(n, k)|
term of the recurrence relation. Q.E.D.
The Stirling number S(n, k) is the number of ways to partition a set of n objects
into k groups.
For example, S(4, 2) = 7 because there are seven ways to partition 4 objects — call them a,
b, c, d — into two groups, namely:
(a)(bcd), (b)(acd), (c)(abd), (d)(abc), (ab)(cd), (ac)(bd), (ad)(bc)
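The count S(4, 2) = 7 can be confirmed by brute force; this Python sketch (my own naming) counts surjective labelings and divides out the orderings of the groups:

```python
from itertools import product
from math import factorial

def stirling2_enum(n, k):
    """Count partitions of an n-set into k nonempty groups by counting
    surjective labelings and dividing out the k! orderings of the groups."""
    surjections = sum(1 for f in product(range(k), repeat=n)
                      if len(set(f)) == k)
    return surjections // factorial(k)

assert stirling2_enum(4, 2) == 7
```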
• a recurrence relation
• a generating function related to the falling factorial
• differential operators
• a double-index generating function
A recurrence relation. The Stirling numbers of the second kind can be characterized in
terms of the following recurrence relation:
\[
S(n, k) = k\, S(n-1, k) + S(n-1, k-1), \qquad 1 \le k < n.
\]
Let us now show that the recurrence formula follows from the enumerative definition. Evidently,
there is only one way to partition n objects into 1 group (everything is in that group), and
only one way to partition n objects into n groups (every object is a group all by itself).
Proceeding recursively, a division of n objects a1 , . . . , an−1 , an into k groups can be achieved
by only one of two basic maneuvers:
• We could partition the first n − 1 objects into k groups, and then add object an into
one of those groups. There are kS(n − 1, k) ways to do this.
• We could partition the first n − 1 objects into k − 1 groups and then add object an as
a new, 1 element group. This gives an additional S(n − 1, k − 1) ways to create the
desired partition.
The recursive point of view therefore explains the connection between the recurrence
formula and the original definition.
Using the recurrence formula we can easily obtain a table of the initial Stirling numbers:
n\k 1 2 3 4 5
1 1
2 1 1
3 1 3 1
4 1 7 6 1
5 1 15 25 10 1
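The recurrence reproduces this table directly (a Python sketch; the function name is my own):

```python
def stirling_second_table(nmax):
    """S(n,k) via S(n,k) = k*S(n-1,k) + S(n-1,k-1), with S(1,1) = 1."""
    S = {(1, 1): 1}
    for n in range(2, nmax + 1):
        for k in range(1, n + 1):
            S[(n, k)] = k * S.get((n - 1, k), 0) + S.get((n - 1, k - 1), 0)
    return S

S = stirling_second_table(5)
assert S[(4, 2)] == 7
assert [S[(5, k)] for k in range(1, 6)] == [1, 15, 25, 10, 1]
```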
is also a basis, and hence can be used to generate the monomial basis. Indeed, the Stirling
numbers of the second kind can be characterized as the coefficients involved in the
corresponding change of basis matrix, i.e.,
\[
x^n = \sum_{k=1}^{n} S(n, k)\, x^{\underline{k}},
\]
where x^{\underline{k}} denotes the falling factorial.
So, for example,
\[
x^3 = x^{\underline{1}} + 3\, x^{\underline{2}} + x^{\underline{3}} = x + 3x(x-1) + x(x-1)(x-2).
\]
Arguing inductively, let us prove that this characterization follows from the recurrence
relation. Evidently the formula is true for n = 1. Suppose then that the formula is true for a
given n. We have
\[
x\, x^{\underline{k}} = x^{\underline{k+1}} + k\, x^{\underline{k}},
\]
and hence, using the recurrence relation, we deduce that
\[
\begin{aligned}
x^{n+1} &= \sum_{k=1}^{n} S(n, k)\, x\, x^{\underline{k}} \\
&= \sum_{k=1}^{n} \left( k\, S(n, k)\, x^{\underline{k}} + S(n, k)\, x^{\underline{k+1}} \right) \\
&= \sum_{k=1}^{n+1} S(n+1, k)\, x^{\underline{k}}.
\end{aligned}
\]
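The change-of-basis identity x^n = Σ_k S(n,k) x^{\underline{k}} can be confirmed at integer points (Python; the helpers are my own):

```python
from math import prod

def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1)."""
    return prod(x - i for i in range(k))

def stirling2(n, k):
    """S(n,k) by the recurrence S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    if n == k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# A polynomial identity holds everywhere iff it holds at enough points.
for n in range(1, 7):
    for x in range(-3, 8):
        assert x**n == sum(stirling2(n, k) * falling(x, k)
                           for k in range(1, n + 1))
```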
where an exponentiated differential operator denotes the operator composed with itself the
indicated number of times. Let us show that this follows from the recurrence relation. The
proof is, once again, inductive. Suppose that the characterization is true for a given n. We
have
\[
T_x \left( x^k (D_x)^k \right) = k\, x^k (D_x)^k + x^{k+1} (D_x)^{k+1},
\]
and hence, using the recurrence relation, we deduce that
\[
\begin{aligned}
(T_x)^{n+1} &= x D_x \left( \sum_{k=1}^{n} S(n, k)\, x^k (D_x)^k \right) \\
&= \sum_{k=1}^{n} S(n, k) \left( k\, x^k (D_x)^k + x^{k+1} (D_x)^{k+1} \right) \\
&= \sum_{k=1}^{n+1} S(n+1, k)\, x^k (D_x)^k.
\end{aligned}
\]
Double index generating function. One can also characterize the Stirling numbers of
the second kind in terms of the following generating function:
\[
e^{x(e^t - 1)} = \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n, k)\, x^k\, \frac{t^n}{n!}.
\]
and that is exactly equal to D_t of the left-hand side. Since this relation holds for all polynomials,
it also holds for all formal power series. In particular, if we apply the above relation
to e^ξ, use the result of the preceding section, and note that
\[
D_\xi[e^\xi] = e^\xi,
\]
we obtain
\[
\begin{aligned}
e^{x e^t} &= \sum_{n=1}^{\infty} \frac{t^n}{n!}\, (T_\xi)^n [e^\xi] \Big|_{\xi = x} \\
&= \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n, k)\, \frac{t^n}{n!}\, \xi^k (D_\xi)^k [e^\xi] \Big|_{\xi = x} \\
&= e^x \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n, k)\, x^k\, \frac{t^n}{n!}.
\end{aligned}
\]
Chapter 46
Chapter 47
05A99 – Miscellaneous
Let C = {A_1, A_2, …, A_N} be a finite collection of finite sets. Let I_k represent the set of k-fold
intersections of members of C (e.g., I_2 contains all possible intersections of two sets chosen
from C).
Then
\[
\left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N} (-1)^{j+1} \sum_{S \in I_j} |S|.
\]
For example:
\[
|A \cup B| = (|A| + |B|) - |A \cap B|
\]
\[
|A \cup B \cup C| = (|A| + |B| + |C|) - (|A \cap B| + |A \cap C| + |B \cap C|) + |A \cap B \cap C|
\]
By De Morgan's laws,
\[
\bigcap_{i=1}^{N} A_i = \overline{\,\bigcup_{i=1}^{N} \overline{A_i}\,},
\]
thereby turning the problem of finding an intersection into the problem of finding a union.
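A direct computational check of the formula (Python; the set names are arbitrary):

```python
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """|A_1 u ... u A_N| = sum_j (-1)^(j+1) * (sizes of all j-fold intersections)."""
    total = 0
    for j in range(1, len(sets) + 1):
        for group in combinations(sets, j):
            total += (-1) ** (j + 1) * len(set.intersection(*group))
    return total

A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6, 7}
assert union_size_by_inclusion_exclusion([A, B, C]) == len(A | B | C)  # 7
```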
47.2 principle of inclusion-exclusion proof
The proof is by induction. Consider a single set A1 . Then the principle of inclusion-exclusion
states that |A1 | = |A1 |, which is trivially true.
Now consider two sets A and B. We may write
\[
A \cup B = (A \setminus B) \cup (B \setminus A) \cup (A \cap B).
\]
Furthermore, the three sets on the right-hand side of that equation must be disjoint. Therefore,
by the addition principle, we have
\[
\begin{aligned}
|A \cup B| &= |A \setminus B| + |B \setminus A| + |A \cap B| \\
&= |A \setminus B| + |A \cap B| + |B \setminus A| + |A \cap B| - |A \cap B| \\
&= |A| + |B| - |A \cap B|.
\end{aligned}
\]
Now consider a collection of N > 2 finite sets A_1, A_2, …, A_N. We assume that the principle
of inclusion-exclusion holds for any collection of M sets, where 1 ≤ M < N. Because the
union of sets is associative, we may break up the union of all sets in the collection into a
union of two sets:
\[
\bigcup_{i=1}^{N} A_i = \left( \bigcup_{i=1}^{N-1} A_i \right) \cup A_N.
\]
Now, let I_k be the collection of all k-fold intersections of A_1, A_2, …, A_{N−1}, and let I′_k be
the collection of all k-fold intersections of A_1, A_2, …, A_N that include A_N. Note that A_N is
included in every member of I′_k and in no member of I_k, so the two collections do not duplicate
one another.
We then have
\[
\left| \bigcup_{i=1}^{N} A_i \right|
= \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S| + |A_N| - \left| \left( \bigcup_{i=1}^{N-1} A_i \right) \cap A_N \right|
\]
by the principle of inclusion-exclusion for a collection of N − 1 sets. Then, we may distribute
set intersection over set union to find that
\[
\left| \bigcup_{i=1}^{N} A_i \right|
= \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S| + |A_N| - \left| \bigcup_{i=1}^{N-1} (A_i \cap A_N) \right|.
\]
Hence we may again apply the principle of inclusion-exclusion for N − 1 sets, revealing that
\[
\begin{aligned}
\left| \bigcup_{i=1}^{N} A_i \right|
&= \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S| + |A_N| - \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S \cap A_N| \\
&= \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S| + |A_N| - \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I'_{j+1}} |S| \\
&= \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S| + |A_N| - \sum_{j=2}^{N} (-1)^{j} \sum_{S \in I'_j} |S| \\
&= \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S| + |A_N| + \sum_{j=2}^{N} (-1)^{j+1} \sum_{S \in I'_j} |S|.
\end{aligned}
\]
The second sum does not include I′_1. Note, however, that I′_1 = {A_N}, so we have
\[
\begin{aligned}
\left| \bigcup_{i=1}^{N} A_i \right|
&= \sum_{j=1}^{N-1} (-1)^{j+1} \sum_{S \in I_j} |S| + \sum_{j=1}^{N} (-1)^{j+1} \sum_{S \in I'_j} |S| \\
&= \sum_{j=1}^{N-1} (-1)^{j+1} \left( \sum_{S \in I_j} |S| + \sum_{S \in I'_j} |S| \right) + (-1)^{N+1} \sum_{S \in I'_N} |S|.
\end{aligned}
\]
Combining the two sums yields the principle of inclusion-exclusion for N sets.
Chapter 48
It is easily shown that the multiplication table (Cayley table) of a group has exactly these
properties, and is thus a latin square. The converse, however, is (unfortunately) not true, i.e.,
not all Latin squares are multiplication tables for a group (the smallest counterexample is
a Latin square of order 5).
Let A = (a_{ij}) and B = (b_{ij}) be two n × n matrices. We define their join as the matrix whose
(i, j)th entry is the pair (a_{ij}, b_{ij}).
The name comes from Euler's use of Greek and Latin letters to differentiate the entries of
each array.
Version: 1 Owner: drini Author(s): drini
A latin square of order n is an n × n array such that each column and each row are made
with the same n symbols, using every one exactly once.
Examples.
a b c d 1 2 3 4
c d a b 4 3 2 1
d c b a 2 1 4 3
b a d c 3 4 1 2
A magic square of order n is an n × n array using each one of the numbers 1, 2, 3, . . . , n2 once
and such that the sum of the numbers in each row, column or main diagonal is the same.
Example:
8 1 6
3 5 7
4 9 2
It's easy to prove that the sum is always \frac{1}{2} n(n^2 + 1). So in the example with n = 3, the sum
is always \frac{1}{2}(3 \times 10) = 15.
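A quick check of the order-3 example and the magic-sum formula (Python):

```python
square = [[8, 1, 6],
          [3, 5, 7],
          [4, 9, 2]]
n = 3
magic = n * (n**2 + 1) // 2  # 15

rows = [sum(r) for r in square]
cols = [sum(square[i][j] for i in range(n)) for j in range(n)]
diags = [sum(square[i][i] for i in range(n)),
         sum(square[i][n - 1 - i] for i in range(n))]

# Uses each of 1..n^2 once, and every line sums to the magic constant.
assert sorted(v for r in square for v in r) == list(range(1, n**2 + 1))
assert all(s == magic for s in rows + cols + diags)
```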
Chapter 49
49.1 matroid
A matroid permits several equivalent formal definitions: two definitions in terms of a rank
function, one in terms of independent subsets, and several more.
For a finite set X, β(X) will denote the set of all subsets of X, and |X| will denote the
number of elements of X. E is a fixed finite set throughout.
r is called the rank function of the matroid. (r3) is called the submodular inequality.
The notion of isomorphism between one matroid (E, r) and another (F, s) has the expected
meaning: there exists a bijection f : E → F which preserves rank, i.e., satisfies s(f(A)) =
r(A) for all A ⊂ E.
q1) r(∅) = 0.
q2) If x ∈ E and S ⊂ E, then r(S ∪ {x}) − r(S) ∈ {0, 1}.
q3) If x, y ∈ E and S ⊂ E and r(S ∪ {x}) = r(S ∪ {y}) = r(S), then r(S ∪ {x, y}) = r(S).
Definition 3: A matroid is a pair (E, I) where I is a subset of β(E) satisfying these axioms:
i1) ∅ ∈ I.
i3) If S, T ∈ I and S, T ⊂ U ⊂ E and S and T are both maximal subsets of U with the
property that they are in I, then |S| = |T |.
An element of I is called an independent set. (E, I) is called normal if any singleton subset
of E is independent, i.e.
b1) B 6= ∅.
φ is called the span mapping of the matroid, and φ(A) is called the span of the subset A.
(E, φ) is called normal if also
φ*) φ(∅) = ∅
c1) ∅ ∉ C.
It would take several pages to spell out what is a circuit in terms of rank, and likewise for
each other possible pair of the alternative defining notions, and then to prove that the various
sets of axioms unambiguously define the same structure. So let me sketch just one example:
the equivalence of Definitions 1 (on rank) and 6 (on circuits). Assume first the conditions in
Definition 1. Define a circuit as a minimal subset A of β(E) having the property r(A) < |A|.
With a little effort, we verify the axioms (c1)-(c3). Now assume (c1)-(c3), and let r(A) be
the largest integer n such that A has a subset B for which
– B contains no element of C
– n = |B|.
One now proves (r1)-(r3). Next, one shows that if we define C in terms of r, and then
another rank function s in terms of C, we end up with s = r. The equivalence of (r*) and
(c*) is easy enough as well.
Let V be a vector space over a field k, and let E be a finite subset of V . For S ⊂ E, let r(S)
be the dimension of the subspace of V generated by S. Then (E, r) is a matroid. Such a
matroid, or one isomorphic to it, is said to be representable over k. The matroid is normal
iff 0 ∉ E. There exist matroids which are not representable over any field.
The second example of a matroid comes from graph theory. The following definition will be
rather informal, partly because the terminology of graph theory is not very well standardised.
For our present purpose, a graph consists of a finite set V , whose elements are called vertices,
plus a set E of two-element subsets of V , called edges. A circuit in the graph is a finite set
of at least three edges which can be arranged in a cycle
\[
\{a, b\}, \{b, c\}, \ldots, \{y, z\}, \{z, a\}
\]
such that the vertices a, b, …, z are distinct. With circuits thus defined, E satisfies the axioms
in Definition 6, and is thus a matroid, and in fact a normal matroid. (The definition is easily
adjusted to permit graphs with loops, which define non-normal matroids.) Such a matroid,
or one isomorphic to it, is called “graphic”.
Let E = A ∪ B be a finite set, where A and B are nonempty and disjoint. Let G be a subset of
A × B. We get a “matching” matroid on E as follows. Each element of E defines a “line”
which is a subset (a row or column) of the set A × B. Let us call the elements of G “points”.
For any S ⊂ E let r(S) be the largest number n such that for some set of points P:
– |P| = n
– no two points of P lie on the same line, and every point of P lies on a line belonging to S.
One can prove (it is not trivial) that r is the rank function of a matroid on E. That
matroid is normal iff every line contains at least one point. Matching matroids participate in
combinatorics, in connection with results on “transversals”, such as Hall's marriage theorem.
Proposition: Let E be a matroid and r its rank function. Define a mapping s : β(E) → N
by
\[
s(A) = |A| - r(E) + r(E - A).
\]
Then the pair (E, s) is a matroid (called the dual of (E, r)).
We leave the proof as an exercise. Also, it is easy to verify that the dual of the dual is the
original matroid. A circuit in (E, s) is also referred to as a cocircuit in (E, r). There is a
notion of cobasis also, and cospan.
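The dual rank formula can be checked by brute force on a small example; below is a Python sketch using the uniform matroid U_{2,3} (rank function r(S) = min(|S|, 2) on a 3-element set, my own choice), verifying the rank axioms for the dual and that the double dual returns the original:

```python
from itertools import combinations

E = frozenset({0, 1, 2})
subsets = [frozenset(c) for n in range(4) for c in combinations(E, n)]

def r(S):
    """Rank function of the uniform matroid U_{2,3}."""
    return min(len(S), 2)

def dual(rank):
    """Dual rank: s(A) = |A| - r(E) + r(E - A)."""
    return lambda A: len(A) - rank(E) + rank(E - A)

s = dual(r)

# s is a rank function: normalized, unit-increase, submodular.
assert s(frozenset()) == 0
assert all(0 <= s(A | {x}) - s(A) <= 1 for A in subsets for x in E)
assert all(s(A | B) + s(A & B) <= s(A) + s(B) for A in subsets for B in subsets)

# The dual of the dual is the original matroid.
assert all(dual(s)(A) == r(A) for A in subsets)
```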
If the dual of E is graphic, E is called cographic. This notion of duality agrees with the
notion of same name in the theory of planar graphs (and likewise in linear algebra): given
a plane graph, the dual of its matroid is the matroid of the dual graph. A matroid that is
both graphic and cographic is called planar, and various criteria for planarity of a graph can
be extended to matroids. The notion of orientability can also be extended from graphs to
matroids.
49.1.4 Binary matroids
A matroid is said to be binary if it is representable over the field of two elements. There are
several other (equivalent) characterisations of a binary matroid (E, r), such as:
– The symmetric difference of any family of circuits is the union of a family of pairwise
disjoint circuits.
– For any circuit C and cocircuit D, we have |C ∩ D| ≡ 0 (mod 2).
49.1.5 Miscellaneous
The chromatic polynomial of a graph extends without change to any matroid. This polynomial has something to say about the
decomposability of matroids into simpler ones.
Also on the topic of decomposability, matroids have a sort of structure theory, in terms of
what are called minors and separators. That theory, due to Tutte, goes by induction; roughly
speaking, it is an adaptation of the old algorithms for putting a matrix into a canonical form.
Along the same lines are several theorems on “basis exchange”, such as the following. Let
E be a matroid and let
A = {a1 , . . . , an }
B = {b1 , . . . , bn }
be two (equipotent) bases of E. There exists a permutation ψ of the set {1, …, n} such
that, for every m from 0 to n,
\[
\{ b_{\psi(1)}, \ldots, b_{\psi(m)}, a_{m+1}, \ldots, a_n \}
\]
is a basis of E.
James G. Oxley, Matroid Theory, Oxford University Press, New York etc., 1992
The chromatic polynomial is not discussed in Oxley, but see e.g. Zaslavski.
49.2 polymatroid
The polymatroid defined by a given matroid (E, r) is the set of all functions w : E → R
such that
\[
w(e) \ge 0 \quad \text{for all } e \in E,
\]
\[
\sum_{e \in S} w(e) \le r(S) \quad \text{for all } S \subset E.
\]
Polymatroids are related to the convex polytopes seen in linear programming, and have
similar uses.
Chapter 50
05C05 – Trees
An AVL tree is a balanced binary search tree in which the heights of the two subtrees
(children) of any node differ by at most one. Look-up, insertion, and deletion are O(log n),
where n is the number of nodes in the tree.
The structure is named for the inventors, Adelson-Velskii and Landis (1962).
A κ-tree T for which |Tα | < κ for all α < κ and which has no cofinal branches is called a
κ-Aronszajn tree. If κ = ω1 then it is referred to simply as an Aronszajn tree.
If there are no κ-Aronszajn trees for some κ then we say κ has the tree property. ω has
the tree property, but no singular cardinal has the tree property.
50.4 antichain
A subset A of a poset (P, <P ) is an antichain if no two elements are comparable. That is,
if a, b ∈ A then a ≮P b and b ≮P a.
In particular, if (P, <P ) is a tree then the maximal antichains are exactly those antichains
which intersect every branch, and if the tree is splitting then every level is a maximal
antichain.
A balanced tree is a rooted tree where each subtree of the root has an equal number of
nodes (or as near as possible). For an example, see binary tree.
A binary tree is a rooted tree where every node has two or fewer children. A balanced
binary tree is a binary tree that is also a balanced tree. For example,
A
B E
C D F G
The two (potential) children of a node in a binary tree are often called the left and right
children of that node. The left child of some node X and all that child’s descendents are the
left descendents of X. A similar definition applies to X’s right descendents. The left
subtree of X is X’s left descendents, and the right subtree of X is its right descendents.
Since we know the maximum number of children a binary tree node can have, we can make
some statements regarding minimum and maximum depth of a binary tree as it relates to
the total number of nodes. The maximum depth of a binary tree of n nodes is n − 1 (every
non-leaf node has exactly one child). The minimum depth of a binary tree of n nodes (n > 0)
is ⌊log₂ n⌋ (every non-leaf node has exactly two children, that is, the tree is balanced).
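These depth bounds are easy to confirm by constructing the extreme trees (a Python sketch; the builders are my own):

```python
from math import floor, log2

def depth_degenerate(n):
    """Depth of the chain-shaped tree: each non-leaf has exactly one child."""
    return n - 1

def depth_balanced(n):
    """Depth of the most balanced binary tree on n nodes, splitting greedily."""
    if n <= 0:
        return -1  # empty tree, one level above a single node
    left = (n - 1) // 2
    right = n - 1 - left
    return 1 + max(depth_balanced(left), depth_balanced(right))

for n in range(1, 64):
    assert depth_degenerate(n) == n - 1
    assert depth_balanced(n) == floor(log2(n))
```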
A B E C D F G
Many data structures are binary trees. For instance, heaps and binary search trees are binary
trees with particular properties.
50.7 branch
A branch of a tree is a maximal linearly ordered subset of the tree. This is the same as the
intuitive conception of a branch: it is a set of nodes starting at the root and going all the
way to the tip (in infinite trees the conception is more complicated, since there may not be
a tip, but the idea is the same). Since branches are maximal, there is no way to add an
element to a branch and have it remain a branch.
A child node C of a node P in a tree is any node connected to P which has a path distance
from the root node R which is one greater than the path distance between P and R.
Drawn in the canonical root-at-top manner, a child node of a node P in a tree is simply any
node immediately below P which is connected to it.
A complete binary tree is a binary tree with the additional property that every node
must have exactly two “children” if an internal node, and zero children if a leaf node.
More precisely: for our base case, the complete binary tree of exactly one node is simply
the tree consisting of that node by itself. The property of being “complete” is preserved
if, at each step, we expand the tree by connecting exactly zero or two individual nodes (or
complete binary trees) to any node in the tree (but both must be connected to the same
node.)
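The zero-or-two-children property can be tested directly by recursion; this is a hedged sketch, not a canonical implementation.

```python
class Node:
    """A minimal binary tree node (illustrative only)."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def is_complete(root):
    """True iff every node has exactly zero or two children."""
    if root is None:
        return True  # the empty tree is vacuously complete
    if (root.left is None) != (root.right is None):
        return False  # exactly one child is forbidden
    return is_complete(root.left) and is_complete(root.right)

full = Node(Node(), Node(Node(), Node()))  # every node has 0 or 2 children
lopsided = Node(Node())                    # root has exactly one child
```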
A digital search tree is a tree which stores strings internally so that there is no need for
extra leaf nodes to store the strings.
50.11 digital tree
A digital tree is a tree for storing a set of strings where nodes are organized by substrings
common to two or more strings. Examples of digital trees are digital search trees and tries.
Note that this is similar to (indeed, a subtree of) the construction given for a tree with no
cofinal branches. It consists of ι disjoint branches, with the β-th branch of height kβ . Since
ι < κ, every level has fewer than κ elements, and since the sequence is cofinal in κ, T must
have height and cardinality κ.
We can show by induction that |Tα | < ω1 for each α < ω1 . For the base case, T0 has only one
element. If |Tα| < ω1 then |Tα+1| = |Tα| · |Q| = |Tα| · ω = ω < ω1. If α < ω1 is a limit ordinal
then Tα is a countable union of countable sets, and therefore itself countable. Therefore
there are a countable number of branches, so Tα is also countable. So T has countable levels.
Suppose T has an uncountable branch, B = ⟨b0, b1, . . .⟩. Then for any i < j < ω1, bi ⊂ bj.
Then for each i, there is some xi ∈ bi+1 \ bi such that xi is greater than any element of
bi. Then ⟨x0, x1, . . .⟩ is an uncountable increasing sequence of rational numbers. Since the
rational numbers are countable, there is no such sequence, so T has no uncountable branch,
and is therefore Aronszajn.
50.13 example of tree (set theoretic)
The set Z+ is a tree with <T = <. This isn't a very interesting tree, since it simply consists
of a line of nodes. However note that the height is ω even though no particular node has
that height.
A more interesting tree using Z+ defines m <T n if i^a = m and i^b = n for some
i, a, b ∈ Z+ ∪ {0}. Then 1 is the root, and all numbers which are not powers of another
number are in T1. Then all squares (which are not also fourth powers) form T2, and so on.
To illustrate the concept of a cofinal branch, observe that for any limit ordinal κ we can
construct a κ-tree which has no cofinal branches. We let T = {(α, β)|α < β < κ} and
(α1 , β1 ) <T (α2 , β2 ) ↔ α1 < α2 ∧ β1 = β2 . The tree then has κ disjoint branches, each
consisting of the set {(α, β)|α < β} for some β < κ. No branch is cofinal, since each branch
is capped at β elements, but for any γ < κ, there is a branch of height γ + 1. Hence the
supremum of the heights is κ.
An extended binary tree is a transformation of any binary tree into a complete binary tree.
This transformation consists of replacing every null subtree of the original tree with “special
nodes.” The nodes from the original tree are then internal nodes, while the “special nodes”
are external nodes.
[Figure: a binary tree and its extended binary tree. Empty circles represent internal nodes, and
filled circles represent external nodes.]
Every internal node in the extended tree has exactly two children, and every external node
is a leaf. The result is a complete binary tree.
50.15 external path length
Given a binary tree T , construct its extended binary tree T 0 . The external path length
of T is then defined to be the sum of the lengths of the paths to each of the external nodes.
E = 2 + 3 + 3 + 3 + 3 + 3 + 3 = 20
The internal path length of T is defined to be the sum of the lengths of the paths to each
of the internal nodes. The internal path length of our example tree (denoted I) is
I =1+2+0+2+1+2=8
Note that in this case E = I + 2n, where n is the number of internal nodes. This happens
to hold for all binary trees.
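The identity E = I + 2n can be verified numerically. In the following sketch (names are my own), each `None` child stands in for an external node of the extended tree.

```python
class Node:
    """A minimal binary tree node; None children act as external nodes."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def path_lengths(root, depth=0):
    """Return (internal path length, external path length, internal node count)."""
    if root is None:
        return 0, depth, 0  # an external node at this depth
    li, le, ln = path_lengths(root.left, depth + 1)
    ri, re, rn = path_lengths(root.right, depth + 1)
    return depth + li + ri, le + re, 1 + ln + rn

# two internal nodes: the root and its left child
t = Node(Node(), None)
I, E, n = path_lengths(t)
```

For this tree I = 1, E = 5, and indeed 5 = 1 + 2 · 2.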
An internal node of a tree is any node which has degree greater than one. Or, phrased in
rooted tree terminology, the internal nodes of a tree are the nodes which have at least one
child node.
A leaf of a tree is any node which has degree of exactly 1. Put another way, a leaf node of
a rooted tree is any node which has no child nodes.
A parent node P of a node C in a tree is the first node which lies along the path from C
to the root of the tree, R.
Drawn in the canonical root-at-top manner, the parent node of a node C in a tree is simply
the node immediately above C which is connected to it.
Let T be a tree with finite levels and an infinite number of elements. Then consider the
elements of T0. T can be partitioned into the sets of descendants of each of these elements,
and since any finite partition of an infinite set has at least one infinite part, some element
x0 in T0 has an infinite number of descendants. The same procedure can be applied to the
children of x0 to give an element x1 ∈ T1 which has an infinite number of descendants, and
then to the children of x1, and so on. This gives a sequence X = ⟨x0, x1, . . .⟩. The sequence
is infinite since each element has an infinite number of descendants, and since xi+1 is always
a child of xi, X is a branch, and therefore an infinite branch of T.
The root of a tree is a specially designated node. It is typically drawn at the top of the page,
with the other nodes below (and with all nodes at the same path distance from the root drawn
at the same height).
Any tree can be redrawn this way, selecting any node as the root. This is important to
note: taken as a graph in general, the notion of "root" is meaningless. We introduce a root
explicitly when we begin speaking of a graph as a tree; there is nothing in general that
selects a root for us.
However, there are some special cases of trees where the root can be distinguished from the
other nodes implicitly due to the properties of the tree. For instance, a root is uniquely
identifiable in a complete binary tree, where it is the only node with degree two.
50.21 tree
Formally, a forest is an undirected, acyclic graph. A forest consists of trees, which are
themselves acyclic, connected graphs. For example, the following diagram represents a forest,
each connected component of which is a tree.
[Figure: a forest whose connected components are trees.]
All trees are forests, but not all forests are trees. As in any graph, a forest is made up of
vertices (often interchangeably called nodes) and edges. Like any graph, the vertices and
edges may each be labelled — that is, associated with some atom of data. Therefore a forest
or a tree is often used as a data structure.
Often a particular node of a tree is specified as the root. Such trees are typically drawn with
the root at the top of the diagram, with all other nodes depending down from it (however
this is not always the case). A tree where a root has been specified is called a rooted tree. A
tree where no root has been specified is called a free tree. When speaking of tree traversals,
and most especially of trees as data structures, rooted trees are often implied.
The edges of a rooted tree are often treated as directed. In a rooted tree, every non-root
node has exactly one edge leading toward the root. This edge can be thought of as connecting
each node to its parent. Often rooted trees are considered directed in the sense that all edges
connect parents to their children, but not vice-versa. Given this parent-child relationship, a
descendant of a node in a directed tree is defined as any other node reachable from that
node (that is, a node’s children and all their descendants).
Given this directed notion of a rooted tree, a rooted subtree can be defined as any node
of a tree and all of its descendants. This notion of a rooted subtree is very useful in dealing
with trees inductively and defining certain algorithms inductively.
Because of their simple structure and unique properties, trees and forests have many uses.
Because of the simple definition of various tree traversals, they are often used to store and
lookup data. Many algorithms are based upon trees, or depend upon a tree in some manner,
such as the heapsort algorithm or Huffman encoding. There are also a great many specific
forms and families of trees, each with its own constraints, strengths, and weaknesses.
Let X be the set of leaf nodes in a weight-balanced binary tree. Let the distance between
leaf nodes be identified with the weighted path length between them. We will show that this
distance metric on X is ultrametric.
Before we begin, let the join of any two nodes x, y, denoted x ∨ y, be defined as the node
z which is the most immediate common ancestor of x and y (that is, the common ancestor
which is farthest from the root). Also, we are using weight-balanced in the sense that
• the weighted path length from the root to each leaf node is equal, and
• each subtree is weight-balanced, too.
Because the tree is weight-balanced, the distances between any node and each of the leaf
node descendants of that node are equal. So, for any leaf nodes x, y,

d(x, x ∨ y) = d(y, x ∨ y). (50.22.1)

Hence,

d(x, y) = d(x, x ∨ y) + d(y, x ∨ y) = 2 · d(x, x ∨ y). (50.22.2)
We will now show that the ultrametric three point condition holds for any three leaf nodes
in a weight-balanced binary tree.
Consider any three points a, b, c in a weight-balanced binary tree. If d(a, b) = d(b, c) = d(a, c),
then the three point condition holds. Now assume this is not the case. Without loss of
generality, assume that d(a, b) < d(a, c).
Note that both a ∨ b and a ∨ c are ancestors of a. Hence, a ∨ c is a more distant ancestor of
a, and so a ∨ c must be an ancestor of a ∨ b.
Now consider the path between b and c: to get from b to c, one goes from b up to a ∨ b, then
up to a ∨ c, and then down to c. Since this is a tree, this is the only path. The highest node
in this path (the ancestor of both b and c) is a ∨ c, so the distance d(b, c) = 2 · d(b, a ∨ c).
But by Eqn. 50.22.1 and Eqn. 50.22.2 (noting that b is a descendant of a ∨ c), we have

d(b, c) = 2 · d(b, a ∨ c) = 2 · d(a, a ∨ c) = d(a, c).

To summarize, we have d(a, b) ≤ d(b, c) = d(a, c), which is the desired ultrametric three
point condition. So we are done.
Note that this means that, if a, b are leaf nodes, and you are at a node outside the subtree
under a ∨ b, then d(you, a) = d(you, b). In other words (from the point of view of distance
between you and them), the structure of any subtree that is not your own doesn't matter to
you. This is expressed in the three point condition as "if two points are closer to each other
than they are to you, then their distances to you are equal."
(Above, we have only proved this when you are at a leaf node, but it works for any node
outside the subtree under a ∨ b, because the paths to a and b must both pass through a ∨ b.)
50.23 weighted path length
Given an extended binary tree T (that is, simply any complete binary tree, where leaves are
denoted as external nodes), associate weights with each external node. The weighted
path length of T is the sum of the product of the weight and path length of each external
node, over all external nodes.
Another formulation is that the weighted path length is Σj wj lj over all external nodes j,
where wj is the weight of an external node j, and lj is the distance from the root of the tree to j.
If wj = 1 for all j, then weighted path length is exactly the same as external path length.
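The sum Σj wj lj can be computed with a small sketch in which an external node is represented by its weight and an internal node by a pair of subtrees (a representation chosen here purely for brevity).

```python
def weighted_path_length(tree, depth=0):
    """Sum of weight * depth over all external nodes."""
    if isinstance(tree, (int, float)):  # an external node: its weight
        return tree * depth
    left, right = tree                  # an internal node: two subtrees
    return (weighted_path_length(left, depth + 1)
            + weighted_path_length(right, depth + 1))

# weights 2 and 3 at depth 2, weight 5 at depth 1
wpl = weighted_path_length(((2, 3), 5))
```

Here wpl = 2·2 + 3·2 + 5·1 = 15; with all weights 1 the same function returns the external path length.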
Example
Let T be the following extended binary tree. Square nodes are external nodes, and circular
nodes are internal nodes. Values in external nodes indicate weights, which are given in this
problem, while values in internal nodes represent the weighted path length of subtrees rooted
at those nodes, and are calculated from the given weights and the given tree. The weight of
the tree as a whole is given at the root of the tree.
This tree happens to give the minimum weighted path length for this particular set of
weights.
Chapter 51
The Heawood number of a surface is the maximal number of colors needed to color any graph
embedded in the surface. For example, the four-color conjecture states that the Heawood
number of the sphere is four.
In 1890 Heawood proved, for all surfaces except the sphere, that

H(S) ≤ ⌊(7 + √(49 − 24e(S))) / 2⌋,

where e(S) is the Euler characteristic of the surface S.
Later it was proved in the works of Franklin, Ringel and Youngs that

H(S) ≥ ⌊(7 + √(49 − 24e(S))) / 2⌋.
For example, the complete graph on 7 vertices can be embedded in the torus as follows:

[Figure: an embedding of K7 in the torus, drawn on a square with opposite sides identified;
the vertices are labelled 1–7.]
REFERENCES
1. Béla Bollobás. Graph Theory: An Introductory Course, volume 63 of GTM. Springer-Verlag,
1979. Zbl 0411.05032.
2. Thomas L. Saaty and Paul C. Kainen. The Four-Color Problem: Assaults and Conquest. Dover,
1986. Zbl 0463.05041.
The number of incidences of a set of n points and a set of m lines in the real plane R² is

I = O(n + m + (nm)^{2/3}).
Proof. Consider the points as vertices of a graph, and connect two vertices by an edge
if they are adjacent on some line. Then the number of edges is e = I − m. If e < 4n then
we are done. If e ≥ 4n, then by the crossing lemma

m² ≥ cr(G) ≥ (1/64) · (I − m)³ / n²,

and the theorem follows.
Recently, Tóth[1] extended the theorem to the complex plane C2 . The proof is difficult.
REFERENCES
1. Csaba D. Tóth. The Szemerédi-Trotter theorem in the complex plane. arXiv:CO/0305283, May
2003.
The crossing number cr(G) of a graph G is the minimal number of crossings among all
drawings of G in the plane.
A graph (V, E) is identified by its vertices V = {v1, v2, . . .} and its edges E = {{vi, vj}, {vk, vl}, . . .}.
A graph also admits a natural topology, called the graph topology, by identifying every
edge {vi, vj} with the unit interval I = [0, 1] and gluing them together at coincident vertices.
Version: 3 Owner: igor Author(s): igor
A planar graph is a graph which can be drawn on a plane (flat 2-d surface) with no edge
crossings.
No complete graph above K4 is planar. K4, drawn without crossings, looks like:

[Figure: K4 drawn in the plane without edge crossings.]
Euler's formula implies only the linear lower bound cr(G) ≥ m − 3n + 6, and so it cannot be used
directly. What we need is to consider the subgraphs of our graph, apply Euler's formula to
them, and then combine the estimates. The probabilistic method provides a natural way to
do that.
cr(G) ≥ (1/64) · m³ / n².
Similarly, if m ≥ (9/2)n, then we can set p = 9n/(2m) to get

cr(G) ≥ (4/243) · m³ / n².
REFERENCES
1. Martin Aigner and Günter M. Ziegler. Proofs from THE BOOK. Springer, 1999.
Chapter 52
In comparing two bit patterns, the Hamming distance is the number of bits that differ between
the two patterns. More generally, if two ordered lists of items are compared, the Hamming distance
is the number of items that do not identically agree. This distance is applicable to encoded
information, and is a particularly simple metric of comparison, often more useful than the
city-block distance or Euclidean distance.
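A direct implementation of the count is a one-liner; this sketch works on bit strings and on arbitrary ordered lists alike.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

d_bits = hamming("10110", "10011")            # two bit patterns
d_items = hamming([1, "a", 3], [1, "b", 3])   # any ordered lists work
```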
Chapter 53
[Figure: a bipartite graph on vertices A, B, C, D, E, F, G, H.]
One way to think of a bipartite graph is by partitioning the vertices into two disjoint sets
where vertices in one set are adjacent only to vertices in the other set. In the above graph,
this may be more obvious with a different representation:
[Figure: the same graph redrawn with the two vertex subsets as two columns: A, F, H, C
in one column and E, B, D, G in the other.]
The two subsets are the two columns of vertices; the vertices in each column all share one colour.
A graph is bipartite if and only if all its cycles have even length. This is easy to see intuitively:
any path of odd length on a bipartite graph must end on a vertex of the opposite colour from the
beginning vertex, and hence cannot be a cycle.
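The even-cycle characterization suggests a simple test: 2-colour the graph by breadth-first search and report failure when an odd cycle forces two adjacent vertices into the same colour. A sketch, assuming an adjacency-dict representation:

```python
from collections import deque

def is_bipartite(adj):
    """2-colour a graph given as {vertex: list of neighbours}."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False  # an odd cycle was found
    return True

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}  # even cycle
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}           # odd cycle
```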
The chromatic number of a graph is the minimum number of colours required to colour
it.
[Figure: a graph on vertices A, B, C, D, E, F, properly coloured with 3 colours.]
This graph has been coloured using 3 colours, and it is clear that it cannot be
coloured with fewer: it contains a subgraph (BCD) that is isomorphic
to the complete graph of 3 vertices. As a result, the chromatic number of this graph is indeed
3.
This example was easy to solve by inspection. In general, however, finding the chromatic
number of a large graph (and, similarly, an optimal colouring) is a very difficult (NP-hard)
problem.
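For very small graphs the chromatic number can still be found by exhaustive search; the following sketch tries every assignment of k colours for increasing k, which is exponential and intended only as an illustration of the definition.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Smallest k admitting a proper colouring (brute force, small graphs only)."""
    for k in range(1, len(vertices) + 1):
        for colours in product(range(k), repeat=len(vertices)):
            c = dict(zip(vertices, colours))
            if all(c[u] != c[v] for u, v in edges):
                return k
    return 0  # only reached for an empty vertex set

triangle = chromatic_number("ABC", [("A", "B"), ("B", "C"), ("A", "C")])
```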
53.3 chromatic number and girth
Obviously, we can easily have graphs with high chromatic numbers. For instance, the
complete graph Kn trivially has χ(Kn ) = n; however girth(Kn ) = 3 (for n ≥ 3). And
the cycle graph Cn has girth(Cn) = n, but

χ(Cn) = 1 if n = 1; 2 if n is even; 3 otherwise.
It seems intuitively plausible that a high chromatic number occurs because of short, “local”
cycles in the graph; it is hard to envisage how a graph with no short cycles can still have
high chromatic number.
Instead of envisaging, Erdös’ proof shows that, in some appropriately chosen probability space
on graphs with n vertices, the probability of choosing a graph which does not have χ(G) ≥ k
and girth(G) ≥ g tends to zero as n grows. In particular, the desired graphs exist.
This seminal paper (the very readable P. Erdős, "Graph theory and probability", Canad. J.
Math. 11 (1959), 34–38) is probably the most famous application of the probabilistic method,
and is regarded by some as the foundation of the method. (With the benefit of hindsight one
can see that the probabilistic method had been used before, e.g. in various applications of
Sard's theorem; this does nothing to diminish the importance of the clear statement of the
tool.) Today the probabilistic method is a standard tool for combinatorics. More constructive
methods are often preferred, but are almost always much harder.
Let G be a graph (in the sense of graph theory) whose set V of vertices is finite and nonempty,
and which has no loops or multiple edges. For any natural number x, let χ(G, x), or just χ(x),
denote the number of x-colorations of G, i.e. the number of mappings f : V → {1, 2, . . . , x}
such that f(a) ≠ f(b) for any pair (a, b) of adjacent vertices. Let us prove that χ (which
is called the chromatic polynomial of the graph G) is a polynomial function in x with
coefficients in Z. Write E for the set of edges in G. If |E| = 0, then trivially χ(x) = x^|V|
(where | · | denotes the number of elements of a finite set). If not, then we choose an edge e
and construct two graphs having fewer edges than G: H is obtained from G by contracting
the edge e, and K is obtained from G by omitting the edge e. We have

χ(G, x) = χ(K, x) − χ(H, x) (53.4.1)

for all x ∈ N, because the polynomial χ(K, x) is the number of colorations of the vertices of
G which might or might not be valid for the edge e, while χ(H, x) is the number which are
not valid. By induction on |E|, (53.4.1) shows that χ(G, x) is a polynomial over Z.
Moreover,

χ(G, x) = x^|V| − |E| x^(|V|−1) + · · · ± s x^k

for some nonzero integer s, where k is the number of connected components of G, and the
coefficients alternate in sign.
With the help of the Möbius-Rota inversion formula (see Moebius inversion), or directly by
induction, one can prove

χ(x) = Σ_{F ⊆ E} (−1)^|F| x^(|V| − r(F))
where the sum is over all subsets F of E, and r(F ) denotes the rank of F in G, i.e. the
number of elements of any maximal cycle-free subset of F . (Alternatively, the sum may be
taken only over subsets F such that F is equal to the span of F ; all other summands cancel
out in pairs.)
The chromatic number of G is the smallest x > 0 such that χ(G, x) > 0 or, equivalently,
such that χ(G, x) ≠ 0.
The Tutte polynomial of a graph, or more generally of a matroid (E, r), is this function
of two variables:

t(x, y) = Σ_{F ⊆ E} (x − 1)^(r(E) − r(F)) (y − 1)^(|F| − r(F)).

Compared to the chromatic polynomial, the Tutte polynomial contains more information about the
matroid. Still, two or more nonisomorphic matroids may have the same Tutte polynomial.
The colouring problem is to assign a colour to every vertex of a graph such that no two
adjacent vertices have the same colour. These colours, of course, are not necessarily colours
in the optical sense.
[Figure: a graph on vertices A, B, C, D, E, F, shown before and after a proper 3-colouring.]
A and C have the same colour; B and E have a second colour; and D and F have another.
Graph colouring problems have many applications in such situations as scheduling and
matching problems.
The complete bipartite graph Kn,m is a graph with two sets of vertices, one with n
members and one with m, such that each vertex in one set is adjacent to every vertex in the
other set and to no vertex in its own set. As the name implies, Kn,m is bipartite.
K2,5: [Figure: the complete bipartite graph K2,5.]

K3,3: [Figure: the complete bipartite graph K3,3, with parts {A, B, C} and {D, E, F}.]
The complete k-partite graph K_{a1,a2,...,ak} is a k-partite graph with a1, a2, . . . , ak vertices
of each colour wherein every vertex is adjacent to every other vertex with a different colour and
to no vertices with the same colour.
The four-color conjecture was a long-standing problem posed by Guthrie while coloring a
map of England. The conjecture states that every map on a plane or a sphere can be colored
using only four colors such that no two adjacent countries are assigned the same color. This
is equivalent to the statement that the chromatic number of every planar graph is no more than
four. After many unsuccessful attempts, the conjecture was proven by Appel and Haken in
1976 with the aid of a computer.
Interestingly, the seemingly harder problem of determining the maximal number of colors
needed for all surfaces other than the sphere was solved long before the four-color conjecture
was settled. This number is now called the Heawood number of the surface.
REFERENCES
1. Thomas L. Saaty and Paul C. Kainen. The Four-Color Problem: Assaults and Conquest. Dover,
1986.
An alternate definition of a k-partite graph is a graph where the vertices are partitioned into
k subsets with the following conditions:

1. No two vertices in the same subset are adjacent.

2. There is no partition of the vertices with fewer than k subsets for which condition 1 holds.
These two definitions are equivalent. Informally, we see that a colour can be assigned to all
the vertices in each subset, since they are not adjacent to one another. Furthermore, this is
also an optimal colouring, since the second condition holds.
53.10 property B
A hypergraph G is said to possess property B if it is 2-colorable, i.e., its vertices can be colored
in two colors so that no edge of G is monochromatic.
Chapter 54
54.1 cut
On a digraph, define a sink to be a vertex with out-degree zero and a source to be a vertex
with in-degree zero. Let G be a digraph with non-negative weights and with exactly one
sink and exactly one source. A cut C on G is a subset of the edges such that every path
from the source to the sink passes through an edge in C. In other words, if we remove every
edge in C from the graph, there is no longer a path from the source to the sink.
Observe that we may achieve a trivial cut by removing all the edges of G. Typically, we are
more interested in minimal cuts, where the weight of the cut is minimized for a particular
graph.
The vertices of the de Bruijn digraph B(n, m) are all possible words of length m − 1 chosen
from an alphabet of size n.

B(n, m) has n^m edges, consisting of each possible word of length m from an alphabet of size
n. The edge a1 a2 . . . am connects the vertex a1 a2 . . . a(m−1) to the vertex a2 a3 . . . am.
[Figure: the de Bruijn digraph B(2, 4), whose vertices are the 3-bit words and whose edges
are the 4-bit words.]
Notice that an Euler cycle on B(n, m) represents a shortest sequence of characters from an
alphabet of size n that includes every possible subsequence of m characters. For example,
the sequence 0000100110101111000 includes all 4-bit subsequences. Any de Bruijn digraph
must have an Euler cycle, since it is connected and each vertex has in-degree and out-degree
of n.
If E is symmetric (i.e., (u, v) ∈ E if and only if (v, u) ∈ E), then the digraph is isomorphic
to an ordinary (that is, undirected) graph.
Digraphs are generally drawn in a similar manner to graphs with arrows on the edges to
indicate a sense of direction. For example, the digraph
({a, b, c, d}, {(a, b), (b, d), (b, c), (c, b), (c, c), (c, d)})
may be drawn as
[Figure: the digraph on vertices a, b, c, d with the edges listed above drawn as arrows.]
54.4 flow
On a digraph, define a sink to be a vertex with out-degree zero and a source to be a vertex
with in-degree zero. Let G be a digraph with non-negative weights and with exactly one
sink and exactly one source. A flow on G is an assignment f : E(G) → R of values to each
edge of G satisfying certain rules:
1. For any edge e, we must have 0 ≤ f(e) ≤ W(e) (where W(e) is the weight of e).
2. For any vertex v, excluding the source and the sink, let Ein be the set of edges incident
to v and let Eout be the set of edges incident from v. Then we must have
Σ_{e ∈ Ein} f(e) = Σ_{e ∈ Eout} f(e).
Let Esource be the edges incident from the source, and let Esink be the set of edges incident
to the sink. If f is a flow, then
Σ_{e ∈ Esink} f(e) = Σ_{e ∈ Esource} f(e).
Note that a flow given by f (e) = 0 trivially satisfies these conditions. We are typically more
interested in maximum flows, where the amount of flow is maximized for a particular
graph.
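The two rules can be checked mechanically. A sketch, with capacities and flows both represented as dicts mapping directed edges to numbers (representation mine):

```python
def is_valid_flow(capacity, flow, source, sink):
    """capacity and flow map directed edges (u, v) to numbers."""
    # rule 1: 0 <= f(e) <= W(e) for every edge
    if any(not (0 <= flow.get(e, 0) <= c) for e, c in capacity.items()):
        return False
    # rule 2: conservation at every vertex except source and sink
    vertices = {u for u, _ in capacity} | {v for _, v in capacity}
    for x in vertices - {source, sink}:
        into = sum(f for (u, v), f in flow.items() if v == x)
        out = sum(f for (u, v), f in flow.items() if u == x)
        if into != out:
            return False
    return True

caps = {("s", "a"): 2, ("a", "t"): 3}
good = is_valid_flow(caps, {("s", "a"): 2, ("a", "t"): 2}, "s", "t")
bad = is_valid_flow(caps, {("s", "a"): 2, ("a", "t"): 1}, "s", "t")
```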
One common interpretation is to think of each edge as a pipe whose weight is the amount of
water that can pass through it, where water enters the network at the source and leaves at
the sink. Under this interpretation, the maximum amount of flow corresponds to the maximum
amount of water we could pump through this network.
Instead of water in pipes, one may think of electric charge in a network of conductors. Rule
(2) above is one of Kirchhoff's two laws for such networks; the other says that the sum of the
voltage drops around any circuit is zero.
Let G be a finite digraph with nonnegative weights and with exactly one sink and exactly
one source. Then
I) For any flow f on G and any cut C of G, the amount of flow for f is less than or equal to
the weight of C.
II) There exists a flow f0 on G and a cut C0 of G such that the flow of f0 equals the weight
of C0 .
Proof:(I) is easy, so we prove only (II). Write R for the set of nonnegative real numbers.
Let V be the set of vertices of G. Define a matrix
κ:V ×V →R
where κ(x, y) is the sum of the weights (or capacities) of all the directed edges from x to y.
By hypothesis there is a unique v ∈ V (the source) such that
κ(x, v) = 0 ∀x ∈ V
and a unique w ∈ V (the sink) such that
κ(w, x) = 0 ∀x ∈ V .
We may also assume κ(x, x) = 0 for all x ∈ V . Any flow f will correspond uniquely (see
Remark below) to a matrix
ϕ:V ×V →R
such that
ϕ(x, y) ≤ κ(x, y) ∀x, y ∈ V
Σ_z ϕ(x, z) = Σ_z ϕ(z, x) ∀x ≠ v, w
Let λ be the matrix of any maximal flow, and let A be the set of x ∈ V such that there
exists a finite sequence x0 = v, x1 , . . . , xn = x such that for all m from 0 to n − 1, we have
either
λ(xm , xm+1 ) < κ(xm , xm+1 ) (54.5.1)
or
λ(xm+1 , xm ) > 0 . (54.5.2)
Write B = V − A.
Suppose w ∈ A, and fix such a sequence x0 = v, . . . , xn = w. Choose ε > 0 such that
ε ≤ κ(xm, xm+1) − λ(xm, xm+1) for all the m for which (54.5.1) holds, and ε ≤ λ(xm+1, xm)
for all the m for which (54.5.2) holds. But now we can define a matrix µ with a larger flow
than λ (larger by ε) by adding ε to λ(xm, xm+1) in the first case and subtracting ε from
λ(xm+1, xm) in the second. This contradicts the maximality of λ, so w ∈ B.
Now consider the set C of pairs (x, y) of vertices such that x ∈ A and y ∈ B. Since B is
nonempty, C is a cut. But also, for any (x, y) ∈ C we have

λ(x, y) = κ(x, y), (54.5.3)

for otherwise we would have y ∈ A. Summing (54.5.3) over C, we see that the amount of
the flow λ is the capacity of C, QED.
Remark: We expressed the proof rather informally, because the terminology of graph theory
is not very well standardized and cannot all be found yet here at PlanetMath. Please feel
free to suggest any revision you think worthwhile.
54.6 tournament
A tournament is a digraph obtained by assigning a direction to every edge of a complete
graph. For example:

[Figure: a tournament on vertices 1, 2, 3, 4.]
Any tournament on a finite number n of vertices contains a Hamiltonian path, i.e., a directed
path on all n vertices. This is easily shown by induction on n: suppose that the statement
holds for n, and consider any tournament T on n + 1 vertices. Choose a vertex v0 of T and
consider a directed path v1, v2, . . . , vn in T \ {v0}. Now let i ∈ {0, . . . , n} be maximal such
that vj → v0 for all j with 1 ≤ j ≤ i. Then

v1, . . . , vi, v0, vi+1, . . . , vn

is a directed path on all n + 1 vertices.
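The induction is effectively an insertion procedure; the sketch below (names mine) inserts each new vertex after the maximal prefix of the current path that beats it, exactly as in the argument above.

```python
def hamiltonian_path(n, beats):
    """beats(u, v) is True iff the edge points u -> v in the tournament."""
    path = [0]
    for v in range(1, n):
        i = 0
        # advance past the maximal prefix whose vertices all beat v
        while i < len(path) and beats(path[i], v):
            i += 1
        path.insert(i, v)
    return path

# the 3-cycle tournament: 0 -> 1 -> 2 -> 0
beats = lambda u, v: (u, v) in {(0, 1), (1, 2), (2, 0)}
p = hamiltonian_path(3, beats)
```

For the 3-cycle the result is a directed path through all three vertices even though no player beats everyone.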
The name “tournament” originates from such a graph’s interpretation as the outcome of
some sports competition in which every player encounters every other player exactly once,
and in which no draws occur; let us say that an arrow points from the winner to the loser.
A player who wins all games would naturally be the tournament's winner. However, as the
above example shows, there might not be such a player; a tournament with no such player
is called a 1-paradoxical tournament. More generally, a tournament T = (V, E) is called
k-paradoxical if for every k-subset V 0 of V there is a v0 ∈ V \ V 0 such that v0 → v for all
v ∈ V 0 . By means of the probabilistic method Erdös showed that if |V | is sufficiently large,
then almost every tournament on V is k-paradoxical.
Chapter 55
Let G = ⟨X | R⟩ be a presentation of the finitely generated group G with generators X and
relations R. We define the Cayley graph Γ = Γ(G, X) of G with generators X as

Γ = (G, E),

where

E = {{u, a · u} | u ∈ G, a ∈ X}.

That is, the vertices of the Cayley graph are precisely the elements of G, and two elements
of G are connected by an edge iff some generator in X carries the one to the other.
Examples
1. G = Zd , with generators X = {e1 , . . . , ed }, the standard basis vectors. Then Γ(G, X)
is the d-dimensional grid; confusingly, it too is often termed “Zd ”.
2. G = Fd , the free group with the d generators X = {g1 , ..., gd }. Then Γ(G, X) is the
2d-regular tree.
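The construction itself is mechanical; in the sketch below (names mine) the callable `op` stands in for the group operation, illustrated on the cyclic group Z6 with the single generator 1.

```python
from itertools import product

def cayley_graph(elements, generators, op):
    """Edge set {{u, a*u}} for u in the group and a in the generators."""
    edges = set()
    for u, a in product(elements, generators):
        edges.add(frozenset((u, op(a, u))))
    return edges

# Z_6 with generator 1: the Cayley graph is a 6-cycle
edges = cayley_graph(range(6), [1], lambda a, u: (a + u) % 6)
```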
Chapter 56
An Euler path on a connected graph with n vertices is a path connecting all n vertices
and traversing every edge of the graph exactly once. Note that a vertex with odd degree
allows one to traverse through it and return by another path at least once, while a vertex
with even degree only allows some number of traversals through it; one cannot end a
non-circuitous Euler path at a vertex with even degree. Thus, a connected graph has an Euler
path which is a circuit (an Euler circuit) if all of its vertices have even degree. A connected
graph has an Euler path which is non-circuitous if it has exactly two vertices with odd degree.
This graph has an Euler path which is a circuit. All of its vertices are of even degree.
This graph has an Euler path which is not a circuit. It has exactly two vertices of odd degree.
Note that a graph must be connected to have an Euler path or circuit. A graph is connected
if every pair of vertices u and z has a path uv, . . . , yz between them.
The edge set of a graph can be partitioned into cycles if and only if every vertex has even
degree.
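The degree conditions give an easy classification for connected multigraphs, sketched here with edges given as a list of unordered pairs.

```python
from collections import Counter

def euler_path_type(edges):
    """Classify a connected multigraph by its count of odd-degree vertices."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    odd = sum(1 for d in deg.values() if d % 2)
    if odd == 0:
        return "euler circuit"
    if odd == 2:
        return "euler path"
    return "neither"

square = [(1, 2), (2, 3), (3, 4), (4, 1)]
# the Königsberg multigraph: all four vertices have odd degree
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
```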
Version: 2 Owner: digitalis Author(s): digitalis
Any graph that contains no cycles is an acyclic graph. A directed acyclic graph is often
called a DAG for short.
[Figure: an acyclic graph and an acyclic digraph, each on vertices A, B, C.]
In contrast, the following graph and digraph are not acyclic, because each contains a cycle.
[Figure: a graph and a digraph, each on vertices A, B, C and each containing a cycle.]
The bridges of Königsberg is a famous problem inspired by an actual place and situation.
The solution of the problem, put forth by Leonhard Euler in 1736, is the first work of
graph theory and is responsible for the foundation of the discipline.
The following figure shows a portion of the Prussian city of Königsberg. A river passes
through the city, and there are two islands in the river. Seven bridges cross between the
islands and the mainland:

[Figure: the river, the two islands, and the seven bridges of Königsberg.]
The mathematical problem arose when citizens of Königsberg noticed that one could not
take a stroll across all seven bridges, returning to the starting point, without crossing at
least one bridge twice.
Answering the question of why this is the case required a mathematical theory that didn’t
exist yet: graph theory. This was provided by Euler, in a paper which is still available today.
At this point, we can apply what we know about Euler paths and Euler circuits. Since an
Euler circuit for a graph exists only if every vertex has an even degree, and all four vertices
of the Königsberg graph have odd degree, the Königsberg graph has no Euler circuit. Hence,
we have explained why one cannot take a walk around Königsberg and return to the starting
point without crossing at least one bridge more than once.
56.5 cycle
A cycle in a graph, digraph, or multigraph is a simple path from a vertex to itself (i.e., a
path where the first vertex is the same as the last vertex and no edge is repeated).
[Figure: a graph on vertices A, B, C, D with edges AB, BC, CD, DA, and BD.]
ABCDA and BDAB are two of the cycles in this graph. ABA is not a cycle, however, since
it uses the edge connecting A and B twice. ABCD is not a cycle because it begins on A but
ends on D.
An even cycle is one of even length; similarly, an odd cycle is one of odd length.
56.6 girth
For instance, the girth of any grid Z^d (where d ≥ 2) is 4, and the girth of the vertex graph
of the dodecahedron is 5.
56.7 path
A path in a graph is a finite sequence of alternating vertices and edges, beginning and ending
with a vertex, v1 e1 v2 e2 v3 . . . en−1 vn such that every consecutive pair of vertices vx and vx+1
are adjacent and ex is incident with vx and with vx+1 . Typically, the edges may be omitted
when writing a path (e.g., v1 v2 v3 . . . vn ) since only one edge of a graph may connect two
adjacent vertices. In a multigraph, however, the choice of edge may be significant.
[Figure: a graph on vertices A, B, C, D in which A–B, B–C, C–D, and D–A are edges but B and D are not adjacent.]
Paths include (but are certainly not limited to) ABCD (length 3), ABCDA (length 4), and
ABABABABADCBA (length 12). ABD is not a path since B is not adjacent to D.
In a digraph, each consecutive pair of vertices must be connected by an edge with the proper
orientation; if e = (u, v) is an edge, but (v, u) is not, then uev is a valid path but veu is not.
[Figure: a digraph on vertices G, H, I, J with arcs between G and H in both directions, and arcs G→J, H→I, and I→J.]
GHIJ, GJ, and GHGHGH are all valid paths. GHJ is not a valid path because H and
J are not connected. GJI is not a valid path because the edge connecting I to J has the
opposite orientation.
¹ There is no widespread agreement on the girth of a forest, which has no cycles. It is also extremely unimportant.
The proof is very easy by induction on the number of elements of the set E of edges. If E is
empty, then all the vertices have degree zero, which is even. Suppose E is nonempty. If the
graph contains no cycle, then some vertex has degree 1, which is odd. Finally, if the graph
does contain a cycle C, then every vertex has the same degree mod 2 with respect to E − C,
as it has with respect to E, and we can conclude by induction.
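The induction above also suggests a procedure: walk from any vertex with remaining edges until the walk closes a cycle, peel that cycle off, and repeat. A rough sketch (the edge-list encoding and example graph are assumptions for illustration):

```python
def cycle_partition(edges):
    """Partition the edge set of a graph whose every vertex has even
    degree into cycles, following the induction: find a cycle, remove
    its edges, and continue with what is left."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    cycles = []
    while any(adj[v] for v in adj):
        start = next(v for v in adj if adj[v])
        walk, v = [start], start
        while True:
            w = adj[v][0]
            adj[v].remove(w)
            adj[w].remove(v)
            walk.append(w)
            if w in walk[:-1]:   # the walk has closed a cycle at w
                break
            v = w
        i = walk.index(walk[-1])
        cycles.append(walk[i:])
        for a, b in zip(walk[:i], walk[1:i + 1]):  # restore prefix edges
            adj[a].append(b)
            adj[b].append(a)
    return cycles

# Two triangles sharing the vertex 1: every vertex has even degree.
cycles = cycle_partition([(1, 2), (2, 3), (3, 1), (1, 4), (4, 5), (5, 1)])
print(len(cycles))  # 2
```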
Chapter 57
05C40 – Connectivity
For k ∈ N, a graph G is k-connected iff G has more than k vertices and the graph left by
removing any fewer than k vertices is connected. The largest integer k such that G is k-connected
is called the connectivity of G and is denoted by κ(G).
Every 3-connected graph G with more than 4 vertices has an edge e such that G/e is also
3-connected.
Suppose such an edge doesn't exist. Then, for every edge e = xy, the graph G/e isn't
3-connected and can be made disconnected by removing 2 vertices. Since κ(G) ≥ 3, our
contracted vertex vxy has to be one of these two. So for every edge e, G has a vertex z ≠ x, y
such that {vxy , z} separates G/e. Any 2 vertices separated by {vxy , z} in G/e are separated
in G by S := {x, y, z}. Since the minimal size of a separating set is 3, every vertex in S has
an adjacent vertex in every component of G − S.
Now we choose the edge e, the vertex z and the component C such that |C| is minimal. We
also choose a vertex v adjacent to z in C.
to the same component and G − {z, v, w} isn’t connected. Any vertex adjacent to v in D
is also an element of C since v is an element of C. This means D is a proper subset of C
which contradicts our assumption that |C| was minimal.
Every 3-connected simple graph can be constructed starting from a wheel graph by repeat-
edly either adding an edge between two non-adjacent vertices or splitting a vertex.
A connected graph is a graph such that there exists a path between all pairs of vertices.
If the graph is a directed graph, and there exists a path from each vertex to every other
vertex, then it is a strongly connected graph.
A connected component is a maximal subset of vertices of any graph, together with any edges
between them, that forms a connected graph. Similarly, a strongly connected component is a maximal subset
of vertices of any digraph and any edges between them that forms a strongly connected
graph. Any graph or digraph is a union of connected or strongly connected components,
plus some edges to join the components together. Thus any graph can be decomposed
into its connected or strongly connected components. For instance, Tarjan’s algorithm can
decompose any digraph into its strongly connected components.
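For the undirected case a single depth-first search per component suffices; Tarjan's algorithm for strongly connected components is more involved. A minimal sketch (the example edges are an assumption for illustration):

```python
def connected_components(vertices, edges):
    """Group the vertices of an undirected graph into connected
    components using depth-first search."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v])
        seen |= comp
        components.append(comp)
    return components

comps = connected_components("ABCDEF",
                             [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")])
print(len(comps))  # two components: {A, B, C} and {D, E, F}
```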
For example, the following graph and digraph are connected and strongly connected, respec-
tively.
[Figure: a connected graph and a strongly connected digraph, each on vertices A, B, C, D, E, F.]
On the other hand, the following graph is not connected, and consists of the union of two
connected components.
[Figure: a disconnected graph on vertices A, B, C, D, E, F with two connected components.]
The following digraph is not strongly connected, because there is no way to reach F from
other vertices, and there is no vertex reachable from C.
[Figure: the digraph just described, on vertices A–F; a second diagram groups its vertices into strongly connected components.]
57.5 cutvertex
Chapter 58
Proof: Let G = (V, E) be a graph with |G| = n ≥ 3 and δ(G) ≥ n/2. Then G is connected:
otherwise, the degree of any vertex in the smallest component C of G would be less than
|C| ≤ n/2. Let P = x0 ...xk be a longest path in G. By the maximality of P, all the neighbours
of x0 and all the neighbours of xk lie on P. Hence at least n/2 of the vertices x0 , ..., xk−1 are
adjacent to xk , and at least n/2 of these same k < n vertices xi are such that x0 xi+1 ∈ E. By the
pigeonhole principle, there is a vertex xi that has both properties, so we have x0 xi+1 ∈ E
and xi xk ∈ E for some i < k. We claim that the cycle C := x0 xi+1 P xk xi P x0 is a Hamiltonian
cycle of G. Indeed, since G is connected, C would otherwise have a neighbour in G − C, which
could be combined with a spanning path of C into a path longer than P. □
58.3 Euler circuit
An Euler circuit in a connected graph is a closed walk that starts at a vertex a, traverses
every edge of the graph exactly once, and returns to vertex a. In other
words, an Euler circuit is an Euler path that is a circuit. Thus, using the properties of odd
and even degree vertices given in the definition of an Euler path, an Euler circuit exists iff
every vertex of the graph has an even degree.
1. Pick a starting vertex.
2. From that vertex pick an edge to traverse, observing the following rule: never cross a
bridge of the reduced graph unless there is no other choice.
3. Darken that edge, to mark it as used.
4. Travel that edge to its other endpoint.
5. Repeat 2-4 until all edges have been traversed, and you are back at the starting vertex.
By ”reduced graph” we mean the original graph minus the darkened (already used) edges.
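These steps are Fleury's algorithm. A minimal sketch, assuming the graph is given as an adjacency list and every vertex has even degree; the bridge test simply recomputes connectivity of the reduced graph, which is simple but slow:

```python
def reachable(adj, start):
    """Vertices reachable from `start` in the reduced graph."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v])
    return seen

def still_connected(adj):
    """True if all vertices that still have edges lie in one component."""
    live = [v for v in adj if adj[v]]
    return not live or all(v in reachable(adj, live[0]) for v in live)

def fleury(adj, start):
    """Fleury's algorithm: at each step avoid crossing a bridge of the
    reduced graph unless there is no other choice."""
    adj = {v: list(ns) for v, ns in adj.items()}
    circuit, v = [start], start
    while adj[v]:
        chosen = None
        for w in list(adj[v]):
            adj[v].remove(w); adj[w].remove(v)   # tentatively use v-w
            ok = still_connected(adj)
            adj[v].append(w); adj[w].append(v)   # put it back
            if ok:
                chosen = w
                break
        if chosen is None:        # every edge at v is a bridge: forced move
            chosen = adj[v][0]
        adj[v].remove(chosen); adj[chosen].remove(v)
        circuit.append(chosen)
        v = chosen
    return circuit

# Every vertex of a triangle has even degree, so an Euler circuit exists.
walk = fleury({"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}, "A")
print(walk)
```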
58.5 Hamiltonian cycle
Let G be a graph. If there is a cycle visiting all vertices exactly once, we say that the cycle is
a Hamiltonian cycle.
There is no useful necessary and sufficient condition for a graph to be Hamiltonian. However,
we can derive some necessary conditions from the definition; for example, a Hamiltonian graph is always
connected and has order at least 3. These and other observations lead to the following condition:
Let G = (V, E) be a graph of order at least 3. If G is hamiltonian, for every proper subset
U of V , the subgraph induced by V − U has at most |U| components.
For sufficient conditions, we have results such as Ore's theorem or the Bondy–Chvátal theorem.
Let G be a graph. A path on G that includes every vertex exactly once is called a hamil-
tonian path.
Let G be a graph of order n ≥ 3 such that, for every pair of distinct non-adjacent vertices u
and v, deg(u) + deg(v) ≥ n. Then G is a Hamiltonian graph.
58.9 Petersen graph
Petersen's graph: an example of a graph that is traceable but not Hamiltonian. That is, it
has a Hamiltonian path but does not have a Hamiltonian cycle.
58.10 hypohamiltonian
58.11 traceable
Chapter 59
An isomorphism between two graphs G and H is a bijection
f : V (G) → V (H)
with the property that any two vertices u and v of G are adjacent if and only if f (u) and
f (v) are adjacent in H.
If an isomorphism can be constructed between two graphs, then we say those graphs are
isomorphic.
[Figure: a graph on vertices a, b, c, d, g, h, i, j.]
[Figure: a second graph, on vertices 1–8, drawn quite differently.]
Although these graphs look very different at first, they are in fact isomorphic; one isomor-
phism between them is
f (a) = 1
f (b) = 6
f (c) = 8
f (d) = 3
f (g) = 5
f (h) = 2
f (i) = 4
f (j) = 7
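Checking that a proposed bijection is an isomorphism is mechanical: map every edge and compare edge sets. A sketch (the 4-cycles below are assumed stand-ins, since the figures from the original entry are not reproduced here):

```python
def is_isomorphism(f, edges_G, edges_H):
    """Check that the bijection f maps the edge set of G exactly onto
    the edge set of H, so adjacency is preserved in both directions."""
    EG = {frozenset(e) for e in edges_G}
    EH = {frozenset(e) for e in edges_H}
    return {frozenset((f[u], f[v])) for u, v in EG} == EH

# Hypothetical 4-cycles with different labelings.
G = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
H = [(1, 3), (3, 2), (2, 4), (4, 1)]
f = {"a": 1, "b": 3, "c": 2, "d": 4}
print(is_isomorphism(f, G, H))  # True
```

Note that the identity-like map {"a": 1, "b": 2, "c": 3, "d": 4} fails here, since adjacency is not preserved.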
Chapter 60
05C65 – Hypergraphs
A Steiner system S(t, k, n) is a k-uniform hypergraph on n vertices such that every set
of t vertices is contained in exactly one edge. Notice that the S(2, k, n) are merely k-uniform
linear spaces. The families of hypergraphs S(2, 3, n) are known as Steiner triple systems.
Let H = (V, E) be a linear space. A finite plane is an intersecting linear space. That is to
say, a linear space in which any two edges in E have a nonempty intersection.
Finite planes are rather restrictive hypergraphs, and the following holds.
Theorem 4. Let H = (V, E) be a finite plane. Then for some positive integer k, H is
(k + 1)-regular, (k + 1)-uniform, and |E| = |V| = k² + k + 1.
The above k is the order of the finite plane. It is not known in general whether finite planes exist
of any order other than a power of a prime. The terminology "finite plane" is suggestive, as we
can think of the edges as a finite collection of lines in Euclidean space in general position,
so that they all intersect pairwise in exactly one point. The added restriction that all pairs
of vertices determine an edge (i.e. "any two points determine a line"), however, makes it
impossible to depict a finite plane in the Euclidean plane by means of straight lines, except
for the trivial case k = 1. The finite plane of order 2 is known as the Fano plane. The
following is a diagrammatic representation:
[Figure: the Fano plane, drawn as a triangle whose vertices, edge midpoints, and center are the seven points; the inscribed circle forms the seventh line.]
An edge here is represented by a straight line, and the inscribed circle is also an edge. In
other words, for a vertex set {1, 2, 3, 4, 5, 6, 7}, the edges of the Fano plane are
{1, 2, 4}
{2, 3, 5}
{3, 4, 6}
{4, 5, 7}
{5, 6, 1}
{6, 7, 2}
{7, 1, 3}
Notice that the Fano plane is generated by the ordered triplet (1, 2, 4) and adding 1 to each
entry, modulo 7. The generating triplet has the property that the differences of any two
elements, in either order, are all pairwise different modulo 7. In general, if we can find a set
of k + 1 elements of Z_{k²+k+1} (the integers modulo k² + k + 1) with all pairwise differences
distinct, then this gives a cyclic representation of the finite plane of order k.
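The cyclic construction is easy to carry out by machine. A sketch working with residues 0–6, so the label 7 from the text corresponds to 0 here:

```python
def cyclic_plane(base, n):
    """Lines of a cyclic finite plane: translate the difference set
    `base` by every t modulo n = k^2 + k + 1."""
    return [tuple(sorted((b + t) % n for b in base)) for t in range(n)]

# Fano plane: order k = 2, n = 7, generated by the triple (1, 2, 4).
fano = cyclic_plane((1, 2, 4), 7)
print(fano)
```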
60.3 hypergraph
Sometimes it is desirable to restrict this definition more. The empty hypergraph is not very
interesting, so we usually accept that V ≠ ∅. Singleton edges are allowed in general, but
not the empty one. Most applications consider only finite hypergraphs, but occasionally it
is also useful to allow V to be infinite.
The transpose A^T of the incidence matrix also defines a hypergraph H∗ , the dual of H, in
an obvious manner. To be explicit, let H∗ = (V ∗ , E∗ ) where V ∗ is an m-element set and E∗
is an n-element set of subsets of V ∗ . For vj∗ ∈ V ∗ and e∗i ∈ E∗ , we have vj∗ ∈ e∗i if and only if aij = 1.
Notice, of course, that the dual of a uniform hypergraph is regular and vice-versa. It is not
rare to see fruitful results emerge by considering the dual of a hypergraph.
A hypergraph H = (V, E) is a linear space if any pair of vertices is found in exactly one
edge. This usage of the term has no relation to its occasional appearance in linear algebra
as a synonym for a vector space.
The following two observations are often useful for linear spaces. Let n = |V | and pick some
arbitrary v ∈ V .
1. ∑_{e∈E} C(|e|, 2) = C(n, 2), where C(a, b) denotes a binomial coefficient: every pair of vertices lies in exactly one edge.
2. ∑_{e∋v} (|e| − 1) = n − 1: every vertex other than v is joined to v by exactly one edge.
Chapter 61
Every graph of order n and size greater than ⌊n²/4⌋ contains a triangle (a cycle of length 3).
61.2 clique
A maximal complete subgraph of a graph is a clique, and the clique number ω(G) of a
graph G is the maximal order of a clique in G. Simply, ω(G) is the maximal order of a
complete subgraph of G. Some authors however define a clique as any complete subgraph of
G and refer to the other definition as maximum clique.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
Let us consider a graph G with n vertices and no triangles, and find a characterization of such
graphs which distinguishes them from graphs with n vertices and at least one triangle. Let
each vertex vi , i = 1, 2, ..., n, have a weight wi ≥ 0 with Σwi = 1, and let S = Σwi wj , the sum
taken over all edges (i, j) in E(G). Now take two vertices h and k that are not joined; let x be
the total weight of the neighbors of h and y the total weight of the neighbors of k, and assume
x ≥ y. If we take a little portion of weight from the vertex k and add it to the weight of h,
this shifting does not decrease S. It follows that S is maximal when all the weight sits on a K2 ,
that is, a complete subgraph made up of two vertices. Therefore, since S = vw and v + w = 1,
the maximum is attained at w = v = 1/2, so S ≤ 1/4. The theorem is proved by taking all
the weights equal to 1/n: in that case S = |E(G)|/n² , and hence |E(G)| ≤ n²/4.
Chapter 62
Let G = (V, E) be any finite graph. G has a complete matching iff ∀X ⊆ V (G) : cp (G − X) ≤
|X|, where cp (H) is the number of components of the graph H with an odd number of
vertices.
Matrix form Suppose we have a bipartite graph G and we partition the vertices into two
sets, V1 and V2 , of the same colour. We may then represent the graph with a simplified
adjacency matrix with |V1 | rows and |V2 | columns containing a 1 where an edge joins the
corresponding vertices and a 0 where there is no edge.
We say that two 1s in the matrix are line-independent if they are not in the same row or
column. Then a matching on the graph will be a subset of the 1s in the matrix that are all
line-independent.
For example, consider this bipartite graph (the thickened edges are a matching): [Figure: a bipartite graph with a matching indicated by thickened edges.]
A complete matching on a bipartite graph G(V1 , V2 , E) is one that saturates all of the
vertices in V1 .
Note that this is the same matching on the graph shown above.
Finding Bipartite Matchings One method for finding maximal bipartite matchings in-
volves using a network flow algorithm. Before using it, however, we must modify the graph.
Start with a bipartite graph G. As usual, we consider the two sets of vertices V1 and V2 .
Replace every edge in the graph with a directed arc from V1 to V2 of capacity L, where L is
some large integer.
Invent two new vertices: the source and the sink. Add a directed arc of capacity 1 from the
source to each vertex in V1 . Likewise, add a directed arc of capacity 1 from each vertex in
V2 to the sink.
Now find the maximum flow from the source to the sink. The total weight of this flow will
be the size of the maximum matching on G. Moreover, the set of original edges carrying
non-zero flow will constitute such a matching.
There also exist algorithms specifically for finding bipartite matchings that avoid the over-
head of setting up a weighted digraph suitable for network flow.
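One such approach is the augmenting-path method often attributed to Kuhn. The sketch below illustrates this general technique rather than any specific procedure from the original entries:

```python
def max_bipartite_matching(adj):
    """Maximum bipartite matching by repeated augmenting-path search
    (Kuhn's algorithm); adj maps each vertex of V1 to its neighbours
    in V2."""
    match = {}  # vertex in V2 -> its matched partner in V1

    def augment(u, seen):
        # Try to give u a partner, possibly re-matching earlier choices.
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in adj:
        augment(u, set())
    return match

# V1 = {a, b, c}, V2 = {x, y}: the maximum matching has size 2.
m = max_bipartite_matching({"a": ["x"], "b": ["x", "y"], "c": ["y"]})
print(len(m))  # 2
```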
Let G be a graph. An edge covering C on G is a subset of the vertices of G such that each
edge in G is incident with at least one vertex in C.
For any graph, the vertex set is a trivial edge covering. Generally, we are more interested
in minimal coverings. A minimal edge covering is simply an edge covering of the least
possible size.
62.5 matching
Let G be a graph. A matching M on G is a subset of the edges of G such that each vertex
in G is incident with no more than one edge in M.
It is easy to find a matching on a graph; for example, the empty set will always be a matching.
Typically, the most interesting matchings are maximal matchings. A maximal matching
on a graph G is simply a matching of the largest possible size.
Version: 3 Owner: vampyr Author(s): vampyr
The maximal bipartite matching algorithm is similar in some ways to the Ford–Fulkerson
algorithm for network flow. This is not a coincidence; network flows and matchings are closely
related. This algorithm, however, avoids some of the overhead associated with finding net-
work flow.
The algorithm employs a clever labeling trick to find augmenting paths and to ensure that the set
of edges chosen remains a valid matching.
The algorithm as described here uses the matrix form of a bipartite graph. Translating the
matching from a matrix to a graph is straightforward.
Labeling We begin with a matrix with R rows and C columns containing 0s, 1s, and 1∗s,
where a 1∗ indicates an edge in the matching and a 1 indicates an edge not in the matching.
Number the columns 1 . . . C and number the rows 1 . . . R.
Start by labeling each column that contains no 1∗s with the symbol #.
Now we scan the columns. Scan each column i that has been labelled but not scanned. Find
each 1 in column i that is in an unlabelled row; label this row i. Mark column i as scanned.
Next, we scan the rows. Scan each row j that has been labelled but not scanned. Find the
first 1∗ in row j. Label the column in which it appears j, and mark row j as scanned. If
there is no 1∗ in row j, proceed to the flipping phase.
Otherwise, go back to row scanning. Continue scanning and labelling until there are no
labelled, unscanned rows or columns; at that point, the set of 1∗s is a maximal matching.
Flipping We enter the flip phase when we scan some row j that contains no 1∗. This row
must have some label c, and in column c, row j of the matrix, there must be a 1; change
this to a 1∗.
Now consider column c; it has some label r. If r is #, then stop; go back to the labelling
phase. Otherwise, change the 1∗ at column c, row r to a 1.
Notes The algorithm must begin with some matching; we may begin with the empty set
(or a single edge), since that is always a matching. However, each iteration through the
process increases the size of the matching by exactly one. Therefore, we can make a simple
optimization by starting with a larger matching. A naïve greedy algorithm can quickly
choose a valid matching that is usually close to the size of the maximal matching; we may
initialize our matrix with that matching to give the procedure a head start.
Consider some edge e ∈ M. At least one of the vertices that e joins must be in C, because C
is an edge covering and hence every edge is incident with some vertex in C. Call this vertex
ve , and let f (e) = ve .
Now we will show that f is one-to-one. Suppose we have two edges e1 , e2 ∈ M where f (e1 ) =
f (e2 ) = v. By the definition of f , e1 and e2 must both be incident with v. Since M is a
matching, however, no more than one edge in M can be incident with any given vertex in
G. Therefore e1 = e2 , so f is one-to-one.
Proof Suppose M is not a maximal matching. Then, by definition, there exists another
matching M 0 where |M| < |M 0 |. But then |M 0 | > |C|, which violates the above theorem.
Likewise, suppose C is not a minimal edge covering. Then, by definition, there exists another
covering C 0 where |C 0 | < |C|. But then |C 0 | < |M|, which violates the above theorem.
Chapter 63
63.1 multigraph
A multigraph is a graph in which we allow more than one edge to join a pair of vertices.
Two or more edges that join a pair of vertices are called parallel edges. Every graph, then,
is a multigraph, but not all multigraphs are graphs.
63.2 pseudograph
Chapter 64
The first example is the existence of k-paradoxical tournaments. The proof hinges upon the
following basic probabilistic inequality: for any events A and B,

P(A ∪ B) ≤ P(A) + P(B).
Theorem 5. For every k, there exists a tournament T (usually very large) such that T is
k-paradoxical.
Let n = |T | be the number of vertices of T , where n > k. We will show that for n large
enough, a k-paradoxical tournament must exist. The probability space in question is all
possible directions of the arrows of T , where each arrow can point in either direction with
probability 1/2, independently of any other arrow.
We say that a set K of k vertices is arrowed by a vertex v0 outside the set if every arrow
between v0 and wi ∈ K points from v0 to wi , for i = 1, . . . , k. Consider a fixed set K of k
vertices and a fixed vertex v0 outside K. There are k arrows from v0 to K, and only
one arrangement of these arrows permits K to be arrowed by v0 ; thus

P(K is arrowed by v0 ) = 1/2^k.
The complementary event therefore has probability

P(K is not arrowed by v0 ) = 1 − 1/2^k.
By independence, and because there are n − k vertices outside of K,
P(K is not arrowed by any vertex) = (1 − 1/2^k)^{n−k}.   (64.1.1)
Lastly, since there are C(n, k) sets of cardinality k in T , we employ the inequality mentioned
above to obtain, for the union of all events of the form in equation (64.1.1),

P(some set of k vertices is not arrowed by any vertex) ≤ C(n, k) (1 − 1/2^k)^{n−k}.
If the probability of this last event is less than 1 for some n, then there must exist a k-
paradoxical tournament of n vertices. Indeed there is such an n, since
C(n, k) (1 − 1/2^k)^{n−k} = (1/k!) · n(n − 1) · · · (n − k + 1) · (1 − 1/2^k)^{n−k}
                          < (1/k!) · n^k · (1 − 1/2^k)^{n−k}.
Therefore, regarding k as fixed while n tends to infinity, the right-hand-side above tends to
zero. In particular, for some n it is less than 1, and the result follows.
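The last step can be made concrete by computing where the union bound actually drops below 1. A sketch; note this finds the n certified by the bound, which may be far larger than the smallest k-paradoxical tournament:

```python
from math import comb

def union_bound(k, n):
    """P(some k-set is not arrowed by any vertex) is at most this."""
    return comb(n, k) * (1 - 2.0 ** -k) ** (n - k)

def n_guaranteed(k):
    """Smallest n for which the bound is < 1, so a k-paradoxical
    tournament on n vertices must exist."""
    n = k + 1
    while union_bound(k, n) >= 1:
        n += 1
    return n

print(n_guaranteed(2))  # 21: the bound certifies a 2-paradoxical tournament
```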
The probabilistic method was pioneered by Erdős Pál (known to Westerners as Paul
Erdős) and initially used for solving problems in graph theory, but has acquired ever wider
applications. Broadly, the probabilistic method is somewhat the opposite of extremal graph
theory. Instead of considering how a graph can behave in the most extreme case, we
consider how a collection of graphs behave ”on average”, whereby we can formulate a
probability space. The fruits reaped by this method are often raw existence theorems, usu-
ally deduced from the fact that the nonexistence of whatever sort of graph would mean a zero
probability. For instance, by means of the probabilistic method, Erdős proved the existence
of a graph of arbitrarily high girth and chromatic number, a very counterintuitive result.
Graphs tend to get enormous as the chromatic number and girth increase, thereby severely
hindering necessary computations to explicitly construct them, so an existence theorem is
most welcome.
In all honesty, probabilistic proofs are nothing more than counting proofs in disguise, since
determining the probabilities of interest will invariably involve detailed counting arguments.
In fact, we could remove from any probabilistic proof any mention of a probability space,
although the result may be significantly less transparent. Also, the advantage of using
probability is that we can employ all the machinery of probability theory. Markov chains,
martingales, expectations, probabilistic inequalities, and many other results, all become the
tools of the trade in dealing with seemingly static objects of combinatorics and number
theory.
REFERENCES
1. Noga Alon and Joel H. Spencer. The probabilistic method. John Wiley & Sons, Inc., second
edition, 2000. Zbl 0996.05001.
2. Paul Erdős and Joel Spencer. Probabilistic methods in combinatorics. Academic Press, 1973.
Zbl 0308.05001.
Chapter 65
05C90 – Applications
If (A, R) is a finite poset, then it can be represented by a Hasse diagram, where a line is
drawn from x ∈ A up to y ∈ A if
• xRy
• There is no z ∈ A such that xRz and zRy. (There are no in-between elements)
Since we are always drawing from lower to higher elements, we do not need to direct any
edges.
Example: If A = P({1, 2, 3}), the power set of {1, 2, 3}, and R is the subset relation ⊆, then
we can draw the following:
[Figure: the Hasse diagram of the subsets of {1, 2, 3}, ordered by inclusion, with ∅ at the bottom and {1, 2, 3} at the top.]
Even though {3}R{1, 2, 3} (since {3} ⊆ {1, 2, 3}), we do not draw a line directly between
them because there are in-between elements: {2, 3} and {1, 3}. However, there still remains
an indirect path from {3} to {1, 2, 3}.
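The covering relation drawn by a Hasse diagram is easy to compute for this example: in the subset lattice, y covers x exactly when x ⊂ y and |y| = |x| + 1. A sketch:

```python
from itertools import combinations

def hasse_edges(elements):
    """Covering pairs (x, y) of the subset lattice of `elements`:
    x is a subset of y missing exactly one element, so no z fits
    strictly between them."""
    subsets = [frozenset(c) for r in range(len(elements) + 1)
               for c in combinations(elements, r)]
    return [(x, y) for x in subsets for y in subsets
            if x < y and len(y) == len(x) + 1]

edges = hasse_edges([1, 2, 3])
print(len(edges))  # 12 lines in the Hasse diagram of P({1, 2, 3})
```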
Version: 13 Owner: xriso Author(s): xriso
Chapter 66
05C99 – Miscellaneous
A graph having n vertices, which contains no p-clique with p ≥ 2, has at most

(1 − 1/(p − 1)) · n²/2

edges.
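Numerically, the bound reads as follows (a sketch; for p = 3 it gives the n²/4 triangle-free bound, attained by the complete bipartite graph on two equal parts):

```python
def turan_bound(n, p):
    """Maximum edge count of an n-vertex graph with no p-clique."""
    return (1 - 1 / (p - 1)) * n * n / 2

print(turan_bound(8, 3))   # 16.0: triangle-free graphs on 8 vertices
print(turan_bound(10, 4))  # about 33.3
```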
A graph is planar if and only if it contains neither K5 nor K3,3 as a minor, where K5 is
the complete graph of order 5 and K3,3 is the complete bipartite graph of order 6. This is
equivalent to Kuratowski’s theorem.
66.5 block
Any two blocks of a graph G have at most one vertex in common. Also, every vertex
belonging to at least two blocks is a cutvertex of G, and, conversely, every cutvertex belongs
to at least two blocks.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.6 bridge
The complete graph with n vertices, denoted Kn , contains all possible edges; that is, any
two vertices are adjacent.
The complete graph of 4 vertices, or K4 looks like this:
The number of edges in Kn is the (n − 1)-st triangular number. Every vertex in Kn has degree
n − 1; therefore Kn has an Euler circuit if and only if n is odd. A complete graph always
has a Hamiltonian path, and the chromatic number of Kn is always n.
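These counts are immediate to check; a small sketch:

```python
def complete_graph_size(n):
    """Edge count of K_n: the (n - 1)-st triangular number n(n-1)/2."""
    return n * (n - 1) // 2

def kn_has_euler_circuit(n):
    """Every vertex of K_n has degree n - 1, which is even iff n is odd."""
    return (n - 1) % 2 == 0

print(complete_graph_size(4))   # 6
print(kn_has_euler_circuit(5))  # True
```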
The degree of a vertex x is d(x) = |Γ(x)|, where Γ(x) is the neighborhood of x. If we want
to emphasize that the underlying graph is G, then we write ΓG (x) and dG (x); this notation
can be intuitively extended to many other graph theoretic functions.
The minimal degree of the vertices of a graph G is denoted by δ(G) and the maximal
degree by ∆(G). A vertex of degree 0 is said to be an isolated vertex. If δ(G) = ∆(G) = k,
that is, every vertex has degree k, then G is said to be k-regular or regular of degree k.
A graph is regular if it is k-regular for some k. A 3-regular graph is said to be cubic.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
Given vertices x and y, their distance d(x, y) is the minimal length of an x − y path. If
there is no x − y path then d(x, y) = ∞.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
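Since all edges count the same, d(x, y) can be computed by breadth-first search; a sketch (the example graph is an assumption for illustration):

```python
from collections import deque

def distance(adj, x, y):
    """Length of a shortest x-y path via breadth-first search;
    infinity if no such path exists."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        v = queue.popleft()
        if v == y:
            return dist[v]
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return float("inf")

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(distance(adj, "a", "c"))  # 2
print(distance(adj, "a", "d"))  # inf
```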
66.10 edge-contraction
Given an edge xy of a graph G, the graph G/xy is obtained from G by contracting the edge
xy; that is, to get G/xy we identify the vertices x and y and remove all loops and duplicate
edges. A graph G′ obtained by a sequence of edge-contractions is said to be a contraction
of G.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.11 graph
A graph G is an ordered pair of disjoint sets (V, E) such that E is a subset of the set V^(2)
of unordered pairs of V . V and E are always assumed to be finite, unless explicitly stated
otherwise. The set V is the set of vertices (sometimes called nodes) and E is the set of
edges. If G is a graph, then V = V (G) is the vertex set of G, and E = E(G) is the edge
set. Typically, V (G) is defined to be nonempty. If x is a vertex of G, we sometimes write
x ∈ G instead of x ∈ V (G).
An edge {x, y} is said to join the vertices x and y and is denoted by xy. Thus xy and yx are
equivalent; the vertices x and y are the endvertices of this edge. If xy ∈ E(G), then x and
y are adjacent, or neighboring, vertices of G, and the vertices x and y are incident with
the edge xy. Two edges are adjacent if they have exactly one common endvertex. Also,
x ∼ y means that the vertex x is adjacent to the vertex y.
Notice that the definition allows pairs of the form {x, x}, which would correspond to a node
joining to itself.
[Figure: some graphs.]
If, on a given graph, there is at most one edge joining each pair of nodes, we say that the
graph is simple.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
If (Gk )k∈N is an infinite sequence of finite graphs, then there exist two numbers m < n such
that Gm is isomorphic to a minor of Gn .
This theorem (proven by Robertson and Seymour) is often referred to as the deepest result
in graph theory. It resolves Wagner’s conjecture in the affirmative and leads to an important
generalization of Kuratowski’s theorem.
Specifically, to every set G of finite graphs which is closed under taking minors (meaning if
G ∈ G and H is isomorphic to a minor of G, then H ∈ G) there exist finitely many graphs
G1 , . . . , Gn such that G consists precisely of those finite graphs that do not have a minor
isomorphic to one of the Gi . The graphs G1 , . . . , Gn are often referred to as the forbidden
minors for the class G.
Graph theory is the branch of mathematics that concerns itself with graphs.
The concept of graphs is extraordinarily simple, which explains their wide applicability. It
is usually agreed upon that graph theory proper was born in 1736, when Euler formalized
the now-famous “bridges of Königsberg” problem. Graph theory has now grown to touch
almost every mathematical discipline, in one form or another, and it likewise borrows from
elsewhere tools for its own problems. Anyone who delves into the topic will quickly see that
the lifeblood of graph theory is the abundance of tricky questions and clever answers. There
are, of course, general results that systematize the subject, but we also find an emphasis on
the solutions of substantial problems over building machinery for its own sake.
For quite a long time graph theory was regarded as a branch of topology concerned with 1-dimensional
simplices; however, this view has faded away. The only remainder of the topological past is
topological graph theory, a branch of graph theory that primarily deals with drawings of
graphs on surfaces. The most famous achievement of topological graph theory is the proof
of the four-color conjecture (every political map on the surface of a plane or a sphere can be
colored with four colors, given that each country consists of only one piece).
Now, a (finite) graph is usually thought of as a subset of pairs of elements of a finite set
(called vertices), or more generally as a family of arbitrary sets in the case of hypergraphs. For
instance, Ramsey theory as applied to graph theory deals with determining how disordered
graphs can be. The central result here is Ramsey's theorem, which states that one can
always find many vertices that are either all connected or all disconnected from each other,
given that the graph is sufficiently large. Another result of this kind is the Szemerédi regularity lemma.
The four-color conjecture mentioned above is one of the problems in graph coloring. There
are many ways one can color a graph, but the most common are vertex coloring and edge
coloring. In these types of colorings, one colors the vertices (resp. edges) of a graph so that no two
vertices of the same color are adjacent (resp. no two edges of the same color share a common
vertex). The most common problem is to find a coloring using the fewest colors
possible. Such problems often arise in scheduling.
Graph theory benefits greatly from interaction with other fields of mathematics. For example,
probabilistic methods have become the standard tool in the arsenal of graph theorists, and
random graph theory has grown into a full-fledged branch of its own.
Version: 22 Owner: karteef Author(s): yark, bbukh, iddo, mathwizard, NeuRet, akrowne
66.14 homeomorphism
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.15 loop
In graph theory, a loop is an edge which joins a vertex to itself, rather than to some other
vertex. By definition, a graph cannot contain a loop; a pseudograph, however, may contain
both multiple edges and multiple loops. Note that by some definitions, a multigraph may
contain multiple edges and no loops, while other texts define a multigraph as a graph allowing
multiple edges and multiple loops.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.17 neighborhood (of a vertex)
Adapted with permission of the author from Modern graph theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
A graph with zero vertices and edges. One possible representation of the null graph is shown
above.
Further Reading
• "Is the Null Graph a Pointless Concept?", by Frank Harary and Ronald Read
The order of a graph G is the number of vertices in G; it is denoted by |G|. The same
notation is used for the number of elements (cardinality) of a set. Thus, |G| = |V (G)|. We
write Gn for an arbitrary graph of order n. Similarly, G(n, m) denotes an arbitrary
graph of order n and size m.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.20 proof of Euler’s polyhedron theorem
This proof is not one of the standard proofs of Euler’s formula. I found the idea
presented in one of Coxeter’s books. It presents a different approach to the formula, which may
be more familiar to modern students who have been exposed to a “Discrete Mathematics”
course. It falls into the category of “informal” proofs: proofs which assume without proof
certain properties of planar graphs usually proved with algebraic topology. This one makes
deep (but somewhat hidden) use of the Jordan curve theorem.
Let G = (V, E) be a planar graph; we consider some particular planar embedding of G. Let
F be the set of faces of this embedding. Also let G′ = (F, E′) be the dual graph (E′ contains
an edge between any two adjacent faces of G). The planar embeddings of G and G′ determine
a correspondence between E and E′: each edge e ∈ E separates two faces, which are joined
in G′ by a dual edge; denote this correspondence by φ : E → E′.

Choose a spanning tree T ⊆ E of G, and let T′ = φ(E \ T) ⊆ E′ be the set of dual edges
corresponding to the edges not in T. (In the illustrations accompanying the original, the
edges of T are drawn in red and those of T′ in blue.)

T′ contains no loop.
Given any loop of edges in T′, we may draw a closed curve through the faces of G which
participate in the loop. This curve must partition the vertices of G into two non-empty sets,
and it only crosses edges of E \ T. Thus (V, T) would have more than a single connected
component, so T would not be spanning. [The proof of this utilizes the Jordan curve theorem.]

T′ spans G′.
For suppose T′ does not connect all the faces F. Let f1 , f2 ∈ F be two faces with no path
between them in T′. Then T must contain a cycle separating f1 from f2 , and so cannot
be a tree. [The proof of this utilizes the Jordan curve theorem.]

We thus have a partition $E = T \cup \varphi^{-1}(T')$ of the edges of G into two sets. Recall that in
any tree, the number of edges is one less than the number of vertices. Since (V, T) and (F, T′)
are both trees, $|E| = |T| + |T'| = (|V| - 1) + (|F| - 1)$, so
$$|V| - |E| + |F| = 2,$$
as required.
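The formula can be spot-checked numerically. The sketch below (function and data names are ours) verifies |V| − |E| + |F| = 2 for the five Platonic solids, whose vertex/edge/face counts are standard:

```python
# Spot-check of Euler's polyhedron formula |V| - |E| + |F| = 2
# on the Platonic solids (vertex/edge/face counts taken as given).
SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}

def euler_characteristic(v, e, f):
    """Return |V| - |E| + |F|."""
    return v - e + f

for name, (v, e, f) in SOLIDS.items():
    assert euler_characteristic(v, e, f) == 2, name
```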
66.21 proof of Turán’s theorem

If the graph G has n ≤ p − 1 vertices it cannot contain any p-clique and thus has at most
$\binom{n}{2}$ edges. So in this case we only have to prove that
$$\frac{n(n-1)}{2} \le \left(1 - \frac{1}{p-1}\right)\frac{n^2}{2}.$$
This of course is easy; we get
$$\frac{n-1}{n} = 1 - \frac{1}{n} \le 1 - \frac{1}{p-1},$$
which is true since n ≤ p − 1.

So now we assume that n ≥ p and the set of vertices of G is denoted by V. If G has the
maximum number of edges possible without containing a p-clique, it contains a (p − 1)-clique,
since otherwise we might add edges to get one. So we denote one such clique by A and define
B := G \ A.

So A has $\binom{p-1}{2}$ edges. We are now interested in the number of edges in B, which we will call
$e_B$, and in the number of edges connecting A and B, which will be called $e_{A,B}$. By induction
we get:
$$e_B \le \frac{1}{2}\left(1 - \frac{1}{p-1}\right)(n - p + 1)^2.$$
Since G does not contain any p-clique, every vertex of B is connected to at most p − 2
vertices in A and thus we get:
$$e_{A,B} \le (p-2)(n-p+1).$$
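However the edge count is completed, the bound of Turán’s theorem itself is easy to probe numerically. The sketch below (names are ours) computes the edge count of the complete (p − 1)-partite Turán graph, which contains no p-clique, and checks it against the bound (1 − 1/(p − 1)) n²/2:

```python
def turan_graph_edges(n, r):
    """Edge count of the complete r-partite Turán graph T(n, r) with
    part sizes as equal as possible: (n^2 - sum of squared sizes) / 2."""
    q, rem = divmod(n, r)
    sizes = [q + 1] * rem + [q] * (r - rem)
    return (n * n - sum(s * s for s in sizes)) // 2

def turan_bound(n, p):
    """The upper bound (1 - 1/(p-1)) * n^2 / 2 of Turán's theorem."""
    return (1 - 1 / (p - 1)) * n * n / 2

# T(n, p-1) has no p-clique and never exceeds the bound;
# equality holds when (p - 1) divides n.
for n in range(2, 30):
    for p in range(3, 8):
        assert turan_graph_edges(n, p - 1) <= turan_bound(n, p) + 1e-9
assert abs(turan_graph_edges(12, 3) - turan_bound(12, 4)) < 1e-6
```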
66.22 realization
Let p1 , p2 , . . . be distinct points in R3 , the 3-dimensional Euclidean space, such that every
plane in R3 contains at most 3 of these points. Write (pi , pj ) for the straight line segment
with endpoints pi and pj (open or closed, as you like). Given a graph G = (V, E), V =
{x1 , x2 , . . . , xn }, the topological space
$$R(G) = \bigcup\,\{(p_i, p_j) : x_i x_j \in E\} \,\cup\, \bigcup_{i=1}^{n} \{p_i\} \subset \mathbb{R}^3$$
is said to be a realization of G.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.23 size

The size of a graph G is the number of edges in G; it is denoted by e(G). G(n, m) denotes
an arbitrary graph of order n and size m.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.24 subdivision

A subdivision of a graph G, denoted T G, is any graph obtained from G by replacing some of
its edges with internally disjoint paths. Thus, T G denotes any member of a large family of
graphs; for example, T C4 is an arbitrary cycle of length at least 4. For any graph G, the
spaces R(G) (denoting the realization of G) and R(T G) are homeomorphic.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
66.25 subgraph
If G′ contains all edges of G that join two vertices in V′ then G′ is said to be the subgraph
induced or spanned by V′ and is denoted by G[V′]. Thus, a subgraph G′ of G is an
induced subgraph if G′ = G[V(G′)]. If V′ = V, then G′ is said to be a spanning subgraph
of G.

Often, new graphs are constructed from old ones by deleting or adding some vertices and
edges. If W ⊂ V(G), then G − W = G[V \ W] is the subgraph of G obtained by deleting
the vertices in W and all edges incident with them. Similarly, if E′ ⊆ E(G), then
G − E′ = (V(G), E(G) \ E′). If W = {w} and E′ = {xy}, then this notation is simplified to
G − w and G − xy. Similarly, if x and y are nonadjacent vertices of G, then G + xy is
obtained from G by joining x to y.
Adapted with permission of the author from Modern Graph Theory by Béla Bollobás,
published by Springer-Verlag New York, Inc., 1998.
The wheel graph on n vertices, Wn , is a graph that contains a cycle of length n − 1 plus a
vertex v (sometimes called the hub) not in the cycle such that v is connected to every other
vertex. The edges connecting v to the rest of the graph are sometimes called spokes.
[Figures: the wheel graphs W4 (a hub joined to a triangle) and W6 (a hub joined to a cycle
of length 5).]
Chapter 67
67.1 LYM inequality

Let F be a Sperner family, that is, a collection of subsets of {1, 2, . . . , n} such that no member
contains another. Then
$$\sum_{X \in \mathcal{F}} \binom{n}{|X|}^{-1} \le 1.$$
This inequality is known as the LYM inequality after the initials of the three people who indepen-
dently discovered it: Lubell [2], Yamamoto [4], and Meshalkin [3].

Since $\binom{n}{k} \le \binom{n}{\lfloor n/2 \rfloor}$ for every integer k, the LYM inequality tells us that $|\mathcal{F}| \big/ \binom{n}{\lfloor n/2 \rfloor} \le 1$, which is
Sperner’s theorem.
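Both inequalities can be verified by brute force over a small ground set. The sketch below (names are ours) enumerates every family of subsets of a 3-element set, encoded as bitmasks, and checks the LYM sum for every antichain:

```python
from itertools import combinations
from math import comb

n = 3
subsets = list(range(1 << n))  # bitmask encoding of the subsets of {0, 1, 2}

def is_antichain(family):
    """True if no member contains another (a ⊆ b iff a & b == a)."""
    return all(a & b != a and a & b != b for a, b in combinations(family, 2))

best = 0
for mask in range(1 << len(subsets)):  # every family of subsets
    family = [s for s in subsets if (mask >> s) & 1]
    if is_antichain(family):
        best = max(best, len(family))
        # LYM: the sum of 1/C(n, |X|) over an antichain is at most 1
        lym = sum(1 / comb(n, bin(s).count("1")) for s in family)
        assert lym <= 1 + 1e-9
assert best == comb(n, n // 2)  # Sperner: the largest antichain has size C(3, 1) = 3
```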
REFERENCES
1. Konrad Engel. Sperner theory, volume 65 of Encyclopedia of Mathematics and Its Applications.
Cambridge University Press. Zbl 0868.05001.
2. David Lubell. A short proof of Sperner’s lemma. J. Comb. Theory, 1:299, 1966. Zbl 0151.01503.
3. Lev D. Meshalkin. Generalization of Sperner’s theorem on the number of subsets of a finite set.
Teor. Veroyatn. Primen., 8:219–220, 1963. Zbl 0123.36303.
4. Koichi Yamamoto. Logarithmic order of free distributive lattice. J. Math. Soc. Japan, 6:343–
353, 1954. Zbl 0056.26301.
67.2 Sperner’s theorem
What is the size of the largest family F of subsets of an n-element set such that no A ∈ F
is a subset of B ∈ F? Sperner [3] gave an answer in the following elegant theorem:
Theorem 7. For every family $\mathcal{F}$ of incomparable subsets of an n-set, $|\mathcal{F}| \le \binom{n}{\lfloor n/2 \rfloor}$.
A family satisfying the conditions of Sperner’s theorem is usually called a Sperner family or
an antichain. The latter terminology stems from the fact that the subsets of a finite set, ordered
by inclusion, form a Boolean lattice, in which such a family is an antichain.
There are many generalizations of Sperner’s theorem. On one hand, there are refinements
like LYM inequality that strengthen the theorem in various ways. On the other hand, there
are generalizations to posets other than the Boolean lattice. For a comprehensive exposition
of the topic one should consult a well-written monograph by Engel[2].
REFERENCES
1. Béla Bollobás. Combinatorics: Set systems, hypergraphs, families of vectors, and combinatorial
probability. 1986. Zbl 0595.05001.
2. Konrad Engel. Sperner theory, volume 65 of Encyclopedia of Mathematics and Its Applications.
Cambridge University Press. Zbl 0868.05001.
3. Emanuel Sperner. Ein Satz über Untermengen einer endlichen Menge. Math. Z., 27:544–548,
1928. Available online at JFM.
Chapter 68
Repeated exponentiation for cardinals is denoted expi (κ), where i < ω. It is defined by
$$\exp_0(\kappa) = \kappa$$
and
$$\exp_{i+1}(\kappa) = 2^{\exp_i(\kappa)}.$$
The Erdős–Rado theorem can be stated as $\exp_i(\kappa)^+ \to (\kappa^+)^{i+1}_{\kappa}$.
That is, if $f : [\exp_i(\kappa)^+]^{i+1} \to \kappa$ then there is a homogeneous set of size κ+ .
$$\omega \to (\omega)^n_k$$
In words, if f is a function on sets of integers of size n whose range is finite, then there is
some infinite X ⊆ ω such that f is constant on the subsets of X of size n.

For example, let
$$f(\{x, y\}) = \begin{cases} 1 & \text{if } x = y^2 \text{ or } y = x^2 \\ 0 & \text{otherwise.} \end{cases}$$
Then let X ⊆ ω be the set of integers which are not perfect squares. This is clearly infinite,
and obviously if x, y ∈ X then neither x = y² nor y = x², so f is homogeneous on X (indeed
constantly 0).
The original version of Ramsey’s theorem states that for every positive integers k1 and
k2 there is n such that if edges of a complete graph on n vertices are colored in two colors,
then there is either a k1 -clique of the first color or k2 -clique of the second color.
A similar argument shows that for any positive integers k1 , k2 , . . . , kt , if we color the edges of a
sufficiently large complete graph in t colors, then we can find either a k1 -clique of color 1,
or a k2 -clique of color 2, . . . , or a kt -clique of color t.
The minimal n whose existence is stated in Ramsey’s theorem is called a Ramsey number
and is denoted by R(k1 , k2 ) (and R(k1 , k2 , . . . , kt ) for multicolored graphs). The standard
inductive proof shows that R(k1 , k2 ) ≤ R(k1 , k2 − 1) + R(k1 − 1, k2 ). From that it is not hard
to deduce by induction that $R(k_1, k_2) \le \binom{k_1 + k_2 - 2}{k_1 - 1}$. In the most interesting case k1 = k2 = k this yields
approximately R(k, k) ≤ (4 + o(1))k . The lower bounds can be established by means of a
probabilistic construction as follows.
Take a complete graph of size n and color its edges at random, choosing the color of each
edge uniformly and independently of all other edges. The probability that any given set of k
vertices is a monochromatic clique is $2^{1 - \binom{k}{2}}$. Let $I_r$ be the random variable which is 1 if
the r-th set of k elements is a monochromatic clique and 0 otherwise. The sum of the $I_r$'s over all
k-element sets is simply the number of monochromatic k-cliques. Therefore, by linearity of
expectation, $E\bigl(\sum I_r\bigr) = \sum E(I_r) = 2^{1 - \binom{k}{2}} \binom{n}{k}$. If the expectation is less than 1, then there
exists a coloring which has no monochromatic k-clique. A little exercise in calculus shows
that if we choose n to be at most $(\sqrt{2} + o(1))^k$ then the expectation is indeed less than 1.
Hence, $R(k, k) > (\sqrt{2} + o(1))^k$.
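The expectation argument is easy to reproduce numerically. The sketch below (function names are ours) computes $E(\sum I_r) = 2^{1-\binom{k}{2}}\binom{n}{k}$ and finds the largest n for which it stays below 1, giving a concrete lower bound on R(k, k):

```python
from math import comb

def expected_mono_cliques(n, k):
    """Expected number of monochromatic k-cliques in a uniformly
    random 2-colouring of the edges of K_n."""
    return comb(n, k) * 2 ** (1 - comb(k, 2))

def ramsey_lower_bound(k):
    """Largest n for which the expectation is < 1; then some colouring
    of K_n has no monochromatic K_k, so R(k, k) > n."""
    n = k
    while expected_mono_cliques(n + 1, k) < 1:
        n += 1
    return n

# For k = 5 the expectation argument already gives R(5, 5) > 11.
assert ramsey_lower_bound(5) == 11
```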
The gap between the lower and upper bounds has been open for several decades. There
have been a number of improvements in the o(1) terms, but nothing better than $\sqrt{2} + o(1) \le
R(k, k)^{1/k} \le 4 + o(1)$ is known. It is not even known whether $\lim_{k \to \infty} R(k, k)^{1/k}$ exists.
The behavior of R(k, x) for fixed k and large x is equally mysterious. For this case Ajtai,
Komlós and Szemerédi [1] proved that $R(k, x) \le c_k\, x^{k-1}/(\ln x)^{k-2}$. The matching lower bound
has only recently been established for k = 3 by Kim [4]. Even in this case the precise asymptotics
are unknown. The combination of the results of Kim and of the improvement of Ajtai, Komlós and
Szemerédi’s result by Shearer [6] yields $\tfrac{1}{162}(1 + o(1))\, k^2/\log k \le R(3, k) \le (1 + o(1))\, k^2/\log k$.
A lot of machine and human time has been spent trying to determine Ramsey numbers for
small k1 and k2 . An up-to-date summary of our knowledge about small Ramsey numbers
can be found in [5].
REFERENCES
1. Miklós Ajtai, János Komlós, and Endre Szemerédi. A note on Ramsey numbers. J. Combin.
Theory Ser. A, 29(3):354–360, 1980. Zbl 0455.05045.
2. Noga Alon and Joel H. Spencer. The probabilistic method. John Wiley & Sons, Inc., second
edition, 2000. Zbl 0996.05001.
3. Ronald L. Graham, Bruce L. Rothschild, and Joel H. Spencer. Ramsey Theory. Wiley-
Interscience series in discrete mathematics. 1980. Zbl 0455.05002.
4. Jeong Han Kim. The Ramsey number R(3, t) has order of magnitude t2 / log t. Random Struc-
tures & Algorithms, 7(3):173–207, 1995. Zbl 0832.05084. Preprint is available online.
5. Stanislaw Radziszowski. Small Ramsey numbers. Electronic Journal of Combinatorics, Dy-
namical Survey, 2002. Available online.
6. James B. Shearer. A note on the independence number of triangle-free graphs. Discrete Math-
ematics, 46:83–87, 1983. Zbl 0516.05053. Abstract is available online
68.4 arrows
Let $[X]^{\alpha} = \{Y \subseteq X : |Y| = \alpha\}$, that is, the set of subsets of X of size α. Then given
cardinals κ, λ, α and β,
$$\kappa \to (\lambda)^{\alpha}_{\beta}$$
states that for any set X of size κ and any function f : [X]α → β, there is some Y ⊆ X and
some γ ∈ β such that |Y| = λ and for any y ∈ [Y]α , f(y) = γ.
As an example, the pigeonhole principle is the statement that if n is finite and k < n then
$$n \to (2)^1_k.$$
That is, if you try to partition n into fewer than n pieces, then one piece has more than one
element.
Observe that the property $\kappa \to (\lambda)^{\alpha}_{\beta}$ is preserved when κ is increased or when λ, α or β is
decreased. The negation of $\kappa \to (\lambda)^{\alpha}_{\beta}$ is written
$$\kappa \nrightarrow (\lambda)^{\alpha}_{\beta}.$$
68.5 coloring

A coloring of a set X is a function f : X → Y. Any coloring provides a partition of X: for each
y ∈ Y, f −1 (y), the set of elements x such that f(x) = y, is one element of the partition. Since f
is a function, the sets in the partition are disjoint, and since it is a total function, their union is X.
$$\omega \to (\omega)^n_k$$
is proven by induction on n.

If n = 1 then this just states that any partition of an infinite set into a finite number of
subsets must include an infinite set; equivalently, the union of a finite number of finite sets is
finite. This is simple enough to prove: since there are a finite number of sets, there is a
largest set, of size x. Let the number of sets be y. Then the size of the union is no more than
xy.

If
$$\omega \to (\omega)^n_k$$
then
$$\omega \to (\omega)^{n+1}_k.$$
To see this, given $f : [\omega]^{n+1} \to k$, we define a sequence of integers $\langle n_i \rangle_{i \in \omega}$ and a sequence of
infinite subsets of ω, $\langle S_i \rangle_{i \in \omega}$, by induction. Let n0 = 0 and let S0 = ω. Given ni and Si for
i < j, we can define Sj as an infinite homogeneous set for the function $f_{n_{j-1}} : [S_{j-1}]^n \to k$
given by $f_{n_{j-1}}(A) = f(A \cup \{n_{j-1}\})$, and nj as the least element of Sj .

Obviously N = {ni } is infinite, and it is also homogeneous, since each ni is contained in Sj
for each j ≤ i.
Chapter 69
69.1 Hall’s marriage theorem

Let S = {S1 , S2 , . . . , Sn } be a finite collection of finite sets. There exists a system of distinct
representatives of S if and only if the following condition holds for any T ⊆ S:
$$\Bigl|\bigcup T\Bigr| \ge |T|.$$
This is known as Hall’s marriage theorem. The name arises from a particular application of
this theorem. Suppose we have a finite set of single men/women and, for each man/woman,
a finite collection of women/men to whom this person is attracted. An SDR for this collection
would be a way each man/woman could be (theoretically) married happily. Hence, Hall’s
marriage theorem can be used to determine whether this is possible.

An application of this theorem to graph theory gives that if G(V1 , V2 , E) is a bipartite graph,
then G has a complete matching that saturates every vertex of V1 if and only if |S| ≤ |N(S)|
for every subset S ⊆ V1 .
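For small families, both Hall’s condition and the existence of an SDR can be checked by brute force; a sketch (function names are ours):

```python
from itertools import combinations, permutations

def hall_condition(sets):
    """Hall's condition: every subfamily T satisfies |union(T)| >= |T|."""
    idx = range(len(sets))
    return all(
        len(set().union(*(sets[i] for i in T))) >= r
        for r in range(1, len(sets) + 1)
        for T in combinations(idx, r)
    )

def find_sdr(sets):
    """Brute-force search for a system of distinct representatives."""
    universe = set().union(*sets)
    for choice in permutations(universe, len(sets)):
        if all(x in s for x, s in zip(choice, sets)):
            return list(choice)
    return None

family = [{1, 2}, {2, 3}, {1, 3}]
assert hall_condition(family) and find_sdr(family) is not None
bad = [{1}, {1}, {1, 2}]  # two sets share their only element
assert not hall_condition(bad) and find_sdr(bad) is None
```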
69.2 proof of Hall’s marriage theorem

Assuming the theorem true for all |S| < n, we prove it for |S| = n.

By the already-proven case of the theorem for S′ we see that we can indeed pick an SDR for
S′.

But
$$\Bigl|\bigcup T' \setminus \bigcup T\Bigr| = \Bigl|\bigcup (T \cup T')\Bigr| - \Bigl|\bigcup T\Bigr| \ge |T \cup T'| - |T| = |T| + |T'| - |T| = |T'|, \tag{69.2.1}$$
using the disjointness of T and T′. So by an already-proven case of the theorem, T′ does
indeed have an SDR which avoids all elements of $\bigcup T$.

QED.
69.3 saturate
x1 ∈ S1 , x2 ∈ S2 , . . . xn ∈ Sn
such that
xi 6= xj whenever i 6= j
(i.e., each choice must be unique).
Chapter 70
In the variables t1 , . . . , tn , the elementary symmetric polynomials are:

• n=1
– t1
• n=2
– t1 + t2
– t1 t2
• n=3
– t1 + t2 + t3
– t1 t2 + t2 t3 + t1 t3
– t1 t2 t3
We give here an algorithm for reducing a symmetric polynomial into a polynomial in the
elementary symmetric polynomials.
We define the height of a monomial $x_1^{e_1} \cdots x_n^{e_n}$ in R[x1 , . . . , xn ] to be e1 + 2e2 + · · · + nen . The
height of a polynomial is defined to be the maximum height of any of its monomial terms,
or 0 if it is the zero polynomial.
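A small numerical illustration of reduction to elementary symmetric polynomials; the identity used, $x_1^2 + x_2^2 + x_3^2 = e_1^2 - 2e_2$, is the standard Newton-type identity (function names are ours):

```python
from itertools import combinations
from math import prod

def elementary_symmetric(k, values):
    """e_k evaluated at the given values: the sum of all k-fold products."""
    return sum(prod(c) for c in combinations(values, k))

# Verify the reduction x1^2 + x2^2 + x3^2 = e1^2 - 2*e2 at sample values.
vals = (2, 3, 5)
e1 = elementary_symmetric(1, vals)  # 10
e2 = elementary_symmetric(2, vals)  # 31
assert sum(x * x for x in vals) == e1 ** 2 - 2 * e2
```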
Chapter 71
If a and b are related this way we say that they are equivalent under ∼. If a ∈ S, then the
set of all elements of S that are equivalent to a is called the equivalence class of a.
An equivalence relation on a set induces a partition on it, and also any partition induces
an equivalence relation. Equivalence relations are important, because often the set S can
be ’transformed’ into another set (quotient space) by considering each equivalence class as
a single unit.
1. Consider the set of integers Z and take a positive integer m. Then m induces an
equivalence relation by a ∼ b when m divides b − a (that is, a and b leave the same
remainder when divided by m).
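The induced partition can be computed directly; a minimal sketch (function names are ours):

```python
def residue_classes(m, universe):
    """Partition `universe` into the classes of a ~ b iff m | (b - a)."""
    classes = {}
    for a in universe:
        classes.setdefault(a % m, []).append(a)
    return list(classes.values())

parts = residue_classes(3, range(10))
# The classes are pairwise disjoint and cover the whole set.
assert sorted(x for part in parts for x in part) == list(range(10))
assert len(parts) == 3
```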
Version: 7 Owner: drini Author(s): drini
Chapter 72
72.1 join
Certain posets X have a binary operator join, denoted ∨, such that x ∨ y is the least upper bound
of x and y. Further, if j and j′ are both joins of x and y, then j ≤ j′ and j′ ≤ j, and so
j = j′; thus a join, if it exists, is unique. The join is also known as the or operator.
72.2 meet
Certain posets X have a binary operator meet, denoted ∧, such that x ∧ y is the greatest lower bound
of x and y. Further, if m and m′ are both meets of x and y, then m ≤ m′ and m′ ≤ m, and
so m = m′; thus a meet, if it exists, is unique.
Chapter 73
A directed set is a partially ordered set (A, ≤) such that whenever a, b ∈ A there is c ∈ A
such that a ≤ c and b ≤ c.

Note: Many authors do not require ≤ to be antisymmetric, so that (A, ≤) is only a preorder
with the above property.
73.2 infimum
The infimum of a set S is the greatest lower bound of S and is denoted inf(S).

Let A be a set with a partial order ≤, and let S ⊆ A. For any x ∈ A, x is a lower bound
of S if x ≤ y for any y ∈ S. The infimum of S, denoted inf(S), is the greatest such lower
bound; that is, if b is a lower bound of S, then b ≤ inf(S).
Note that it is not necessarily the case that inf(S) ∈ S. Suppose S = (0, 1); then inf(S) = 0,
but 0 6∈ S.
Also note that a set does not necessarily have an infimum. See the attachments to this entry
for examples.
73.3 sets that do not have an infimum
• The set M1 := Q (as a subset of Q) does not have an infimum (nor a supremum).
Intuitively this is clear, as the set is unbounded. The (easy) formal proof is left as an
exercise for the reader.
Intuitively speaking, this example exploits the fact that Q does not have ”enough
elements”. More formally, Q as a metric space is not complete. Consider the set
$M_2' := (\sqrt{2}, \infty) \cap \mathbb{Q}$, i.e. the real interval $(\sqrt{2}, \infty) \subset \mathbb{R}$ intersected with Q. As a subset of R,
M2′ does have an infimum (namely $\sqrt{2}$), but as $\sqrt{2}$ is not an element of Q, M2′ does not have
an infimum as a subset of Q.

This example also makes it clear that it is important to clearly state the superset one
is working in when using the notion of infimum or supremum.

It also illustrates that the infimum is a natural generalization of the minimum of a
set, as a set that does not have a minimum may still have an infimum (such as M2′ ).

Of course all the ideas expressed here apply equally to the supremum, as the two
notions are completely analogous (just reverse all inequalities).
73.4 supremum
The supremum of a set S is the least upper bound of S and is denoted sup(S).
Let A be a set with a partial order ≤, and let S ⊆ A. For any x ∈ A, x is an upper bound
of S if y ≤ x for any y ∈ S. The supremum of S is the least such upper bound; that is, if b
is an upper bound of S, then sup(S) ≤ b.
Note that it is not necessarily the case that sup(S) ∈ S. Suppose S = (0, 1); then sup(S) = 1,
but 1 6∈ S.
Note also that a set may not have an upper bound at all.
Let S be a set with an ordering relation ≤, and let T be a subset of S. An upper bound for
T is an element z ∈ S such that x ≤ z for all x ∈ T. We say that T is bounded from above
if there exists an upper bound for T.
Lower bound, and bounded from below are defined in a similar manner.
Chapter 74
06A99 – Miscellaneous
If (P, ≤) is a poset then a subset Q ⊆ P is dense if for any p ∈ P there is some q ∈ Q such
that q ≤ p.
A total order is a partial order that satisfies a fourth property known as comparability: any
two elements are comparable.
A set and a partial order on that set define a poset.
74.3 poset
A poset is a partially ordered set, that is, a poset is a pair (P, ≤) where ≤ is a partial order
relation on P .
A few examples:
• (Z, ≤) where ≤ is the common “less than or equal” relation on the integers.
• (P(X), ⊆). P(X) is the power set of X and the relation ⊆ given by the common
inclusion of sets.
In a partial order, not every two elements need be comparable. As an example, consider
X = {a, b, c} and the poset on its power set given by inclusion. Here, {a} ⊆ {a, c},
but the two subsets {a, b} and {a, c} are not comparable (neither {a, b} ⊆ {a, c} nor
{a, c} ⊆ {a, b}).
74.4 quasi-order

A quasi-order (or preorder) on a set is a binary relation ≲ that is reflexive and transitive but
not necessarily antisymmetric. Writing s ∼ t whenever both s ≲ t and t ≲ s defines an
equivalence relation, and ≲ induces a partial order ≤ on the equivalence classes by
$$[s]_{\sim} \le [t]_{\sim} \;:\Leftrightarrow\; s \lesssim t,$$
where [s]∼ and [t]∼ denote the equivalence classes of s and t. In particular, ≤ does satisfy
antisymmetry, whereas ≲ may not.
A sequence a1 , a2 , . . . in Q is referred to as bad if for all i < j, $a_i \not\lesssim a_j$ holds; otherwise it is
called good. Note that an antichain is obviously a bad sequence.

Proposition 3. Given a set Q and a binary relation ≲ over Q, the following conditions are
equivalent:

• (Q, ≲) is a well-quasi-ordering;
• (Q, ≲) has no infinite (ω-) strictly decreasing chains and no infinite antichains;
• every linear extension of Q/≈ is a well-order, where ≈ is the equivalence relation and
Q/≈ is the set of equivalence classes induced by ≈.

The equivalence of being a WQO to the second and third conditions is proved using the
infinite version of Ramsey’s theorem.
Chapter 75
Examples:
1. The ring of integers in a number field is an order, known as the maximal order.
2. Let K be an imaginary quadratic field and O its ring of integers. Then for each integer
n, the ring Z + nO is an order of K (in fact it can be shown that every order of K is
of this form).
Chapter 76
A finite lattice L is modular if and only if it is graded and its rank function ρ satisfies
ρ(x) + ρ(y) = ρ(x ∧ y) + ρ(x ∨ y) for all x, y ∈ L.
Chapter 77
06D99 – Miscellaneous
77.1 distributive

Let S be a set with binary operations · and +. We say that · is right distributive over + if

(a + b) · c = (a · c) + (b · c) for all a, b, c ∈ S,

and left distributive over + if

a · (b + c) = (a · b) + (a · c) for all a, b, c ∈ S.

If · is both left and right distributive over +, then it is said to be distributive over +.

A lattice is said to be distributive if it satisfies either (and therefore both) of the distributive laws:

• x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
• x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)

Examples of distributive lattices include Boolean lattices and totally ordered sets.
Chapter 78
06E99 – Miscellaneous
A Boolean ring is a ring R that has a unit element and in which every element is idempotent;
in other words,
$$x^2 = x \quad \forall x \in R.$$
For example, in the ring $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ all four elements that form the ring are idempotent,
so this ring is Boolean.
Chapter 79
Chapter 80
08A99 – Miscellaneous
Rather than using function notation, it is usual to write binary operations with an opera-
tion symbol between the elements, or even with no symbol at all, it being understood that
juxtaposed elements are to be combined using an operation that should be clear from the
context:
$$(x, y) \mapsto xy.$$
Definition 2. A filtered algebra over the field k is an algebra (A, ·) over k which is endowed
with a filtration F = {Fi }i∈N compatible with the multiplication in the following sense
∀m, n ∈ N, Fm · Fn ⊂ Fn+m .
A special case of filtered algebra is a graded algebra. In general there is the following
construction that produces a graded algebra out of a filtered algebra.
Definition 3. Let (A, ·, F) be a filtered algebra; then the associated graded algebra G(A)
is defined as follows:

• As a vector space,
$$G(A) = \bigoplus_{n \in \mathbb{N}} G_n,$$
where
$$G_0 = F_0, \qquad \text{and} \quad \forall n > 0,\; G_n = F_n / F_{n-1}.$$
• The multiplication is defined on representatives by
$$(x + F_{n-1})(y + F_{m-1}) = x \cdot y + F_{n+m-1}.$$

Theorem 6. The multiplication is well defined and endows G(A) with the structure of a
graded algebra, with gradation {Gn }n∈N . Furthermore, if A is associative then so is G(A).
As algebras A and G(A) are distinct (with the exception of the trivial case that A is graded)
but as vector spaces they are isomorphic.
Chapter 81
For any positive integer n, φ(n) is the number of positive integers less than or equal to n
which are coprime to n. This is known as the Euler φ-function. Among its useful properties
are the facts that φ is multiplicative, meaning that if gcd(a, b) = 1, then φ(ab) = φ(a)φ(b), and
that $\varphi(p^k) = p^{k-1}(p - 1)$ if p is prime. These two facts combined give a numeric computation
of φ for all integers:
$$\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right).$$
For example,
$$\varphi(2000) = \varphi(2^4 \cdot 5^3) = 2000\left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{5}\right) = 2000 \cdot \frac{1}{2} \cdot \frac{4}{5} = \frac{8000}{10} = 800.$$
In addition,
$$\sum_{d \mid n} \varphi(d) = n,$$
where the sum extends over all positive divisors of n. Also, φ(n) is the number of units in
the ring Z/nZ of integers modulo n.
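The product formula and the two properties above can be checked directly; a sketch using trial-division factorization (function names are ours):

```python
def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    ps, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            ps.add(d)
            n //= d
        d += 1
    if n > 1:
        ps.add(n)
    return ps

def phi(n):
    """Euler totient via phi(n) = n * prod over p | n of (1 - 1/p)."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)
    return result

assert phi(2000) == 800
# phi is multiplicative on coprime arguments, and sum over d | n of phi(d) is n.
assert phi(8 * 9) == phi(8) * phi(9)
assert sum(phi(d) for d in range(1, 13) if 12 % d == 0) == 12
```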
Version: 6 Owner: KimJ Author(s): KimJ
Given a, n ∈ Z with gcd(a, n) = 1, Euler's theorem states that $a^{\varphi(n)} \equiv 1 \pmod{n}$, where φ is
the Euler totient function.

Now, since all these numbers are different, the set {a, 2a, 3a, . . . , (p − 1)a} covers all p − 1
possible nonzero congruence classes modulo p (although not necessarily in the same order),
and therefore
$$a^{p-1} \equiv 1 \pmod{p}.$$
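A brute-force check of Euler's theorem for small moduli (function names are ours):

```python
from math import gcd

def phi(n):
    """Euler totient by direct count (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Euler's theorem: a^phi(n) == 1 (mod n) whenever gcd(a, n) == 1.
# Fermat's little theorem is the special case where n is prime.
for n in range(2, 40):
    for a in range(1, n):
        if gcd(a, n) == 1:
            assert pow(a, phi(n), n) == 1
```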
The conjecture states that every even integer n > 2 is expressible as the sum of two primes.
In 1966 Chen proved that every sufficiently large even number can be expressed as the sum
of a prime and a number with at most two prime divisors.
Vinogradov proved that every sufficiently large odd number is a sum of three primes. In
1997 it was shown by J.-M. Deshouillers, G. Effinger, H. te Riele, and D. Zinoviev that,
assuming the generalized Riemann hypothesis, every odd number n > 5 can be represented
as a sum of three primes.
The conjecture was first proposed in a 1742 letter from Christian Goldbach to Euler and
still remains unproved.
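The conjecture is easy to verify for small even numbers; a sketch (function names are ours):

```python
def is_prime(n):
    """Primality by trial division (fine for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_pair(n):
    """Return (p, q) with p + q == n and both prime, or None."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None

# Verify the conjecture for all even numbers up to 1000.
assert all(goldbach_pair(n) is not None for n in range(4, 1001, 2))
```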
$$J_k(n) = n^k \prod_{p \mid n} \left(1 - p^{-k}\right)$$
Legendre Symbol.
Let p be an odd prime. The symbol $\left(\frac{a}{p}\right)$, also written (a | p), has the value 1 if a is a quadratic
residue modulo p, −1 if a is not a quadratic residue, and 0 if p divides a. The symbol defined
this way is called the Legendre symbol.
The Legendre symbol can be computed by means of Euler’s criterion or Gauss’ lemma.
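Euler's criterion gives a one-line computation of the symbol; a sketch (function names are ours):

```python
def legendre(a, p):
    """Legendre symbol (a | p) for odd prime p, via Euler's criterion:
    a^((p-1)/2) mod p equals 1, p-1, or 0."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# The quadratic residues mod 7 are {1, 2, 4}.
assert [legendre(a, 7) for a in range(1, 7)] == [1, 1, -1, 1, -1, -1]
assert legendre(14, 7) == 0  # 7 divides 14
```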
81.8 Pythagorean triplet
a2 + b2 = c2 .
That is, {a, b, c} is a Pythagorean triplet if there exists a right triangle whose sides have
lengths a, b, c. If {a, b, c} is a Pythagorean triplet, so is {ka, kb, kc}. If a, b, c are coprime,
then we say that the triplet is primitive. Every primitive triplet is given (up to swapping
a and b) by
$$a = 2mn, \qquad b = m^2 - n^2, \qquad c = m^2 + n^2,$$
where m, n are any two coprime integers, one odd and the other even, with m > n.
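Euclid's formula above can be exercised directly; a sketch (function names are ours):

```python
from math import gcd

def primitive_triple(m, n):
    """Euclid's formula: coprime m > n > 0 of opposite parity give a
    primitive Pythagorean triple (2mn, m^2 - n^2, m^2 + n^2)."""
    assert m > n > 0 and gcd(m, n) == 1 and (m - n) % 2 == 1
    return (2 * m * n, m * m - n * n, m * m + n * n)

a, b, c = primitive_triple(2, 1)
assert (a, b, c) == (4, 3, 5) and a * a + b * b == c * c

# Every admissible (m, n) yields a triple that is Pythagorean and coprime.
for m in range(2, 10):
    for n in range(1, m):
        if gcd(m, n) == 1 and (m - n) % 2 == 1:
            a, b, c = primitive_triple(m, n)
            assert a * a + b * b == c * c and gcd(a, gcd(b, c)) == 1
```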
Arithmetic Mean.
If a1 , a2 , . . . , an are real numbers, we define their arithmetic mean as
$$A.M. = \frac{a_1 + a_2 + \cdots + a_n}{n}.$$
The arithmetic mean is what is commonly called the average of the numbers.
81.11 ceiling
The ceiling function gives the smallest integer greater than or equal to its argument. It is
usually denoted ⌈x⌉.
Some examples: ⌈6.2⌉ = 7, ⌈0.4⌉ = 1, ⌈7⌉ = 7, ⌈−5.1⌉ = −5, ⌈π⌉ = 4, ⌈−4⌉ = −4.
Note that this function is NOT the integer part ([x]), since ⌈3.5⌉ = 4 while [3.5] = 3.
This can be used to make the computation of large powers easier. It also allows one to find
an easy-to-compute inverse to the map $x \mapsto x^b \pmod{n}$ whenever $\gcd(b, \varphi(n)) = 1$. In fact, this
inverse is just $x \mapsto x^{b^{-1}}$, where $b^{-1}$ is an inverse of b modulo φ(n). This forms the basis of the
RSA cryptosystem: a message x is encrypted by raising it to the b-th power, giving $x^b$, and
is decrypted by raising the result to the $b^{-1}$-th power, giving
$$\left(x^b\right)^{b^{-1}} \equiv x^{b b^{-1}} \equiv x^{b b^{-1} \bmod \varphi(n)} \equiv x \pmod{n},$$
by the above argument.
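A toy illustration of this encrypt/decrypt round trip. The primes and exponent below are tiny, hypothetical textbook values; a real RSA deployment uses large random primes:

```python
# Toy RSA round trip with tiny, hypothetical parameters.
p, q = 61, 53
n = p * q                  # modulus, 3233
phi_n = (p - 1) * (q - 1)  # phi(n) = 3120 for n = p*q with p, q prime
b = 17                     # public exponent, coprime to phi_n
b_inv = pow(b, -1, phi_n)  # private exponent: b * b_inv == 1 (mod phi_n)

message = 65
cipher = pow(message, b, n)              # encrypt: x -> x^b mod n
assert pow(cipher, b_inv, n) == message  # decrypt recovers x
```

(`pow(b, -1, m)` for modular inverses requires Python 3.8 or later.)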
81.13 congruences
If a and b are congruent modulo m, it means that both numbers leave the same remainder
when divided by m.
Congruence with a fixed modulus is an equivalence relation on Z. The set of equivalence classes
is a cyclic group of order m with respect to addition, and a ring if we also consider multiplication
modulo m. This ring is usually denoted
$$\mathbb{Z}/m\mathbb{Z}.$$
It is also commonly denoted Zm , although that notation is also used to represent
the m-adic integers.
81.14 coprime
Two integers a, b are coprime if their greatest common divisor is 1. It is also said that a, b
are relatively prime.
The cube root operation is distributive over multiplication and division, but not over addition
and subtraction. That is,
$$\sqrt[3]{xy} = \sqrt[3]{x}\,\sqrt[3]{y}, \qquad \sqrt[3]{\frac{x}{y}} = \frac{\sqrt[3]{x}}{\sqrt[3]{y}}.$$
However, in general,
$$\sqrt[3]{x + y} \ne \sqrt[3]{x} + \sqrt[3]{y}, \qquad \sqrt[3]{x - y} \ne \sqrt[3]{x} - \sqrt[3]{y}.$$
Example: $\sqrt[3]{x^3 y^3} = xy$ because $(xy)^3 = xy \times xy \times xy = x^3 \times y^3 = x^3 y^3$.
Example: $\sqrt[3]{\frac{8}{125}} = \frac{2}{5}$ because $\left(\frac{2}{5}\right)^3 = \frac{2^3}{5^3} = \frac{8}{125}$.
The cube root notation is actually an alternative to exponentiation: $\sqrt[3]{x} = x^{1/3}$. As
such, the cube root operation commutes with raising to a power: $\sqrt[3]{x^2} = x^{2/3} = \left(\sqrt[3]{x}\right)^2$.
81.16 floor
The floor function gives the greatest integer less than or equal to its argument. It is usually
denoted ⌊x⌋.
Some examples: ⌊6.2⌋ = 6, ⌊0.4⌋ = 0, ⌊7⌋ = 7, ⌊−5.1⌋ = −6, ⌊π⌋ = 3, ⌊−4⌋ = −4.
Note that this function is NOT the integer part ([x]), since ⌊−3.5⌋ = −4 while [−3.5] = −3.
However, the two functions agree for non-negative numbers.
In some texts the bracket notation [x] is used to denote the floor function (although those
texts actually work with the integer part), so the floor function is sometimes also called the
bracket function.
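The sample values above, and the disagreement between floor and truncation at negative arguments, can be checked with Python's math module:

```python
import math

# floor and ceiling on sample values from the text:
assert math.floor(6.2) == 6 and math.ceil(6.2) == 7
assert math.floor(-5.1) == -6 and math.ceil(-5.1) == -5
assert math.floor(math.pi) == 3 and math.ceil(math.pi) == 4
# int() truncates toward zero (the "integer part"), so it agrees with
# floor only for non-negative arguments:
assert int(-3.5) == -3 and math.floor(-3.5) == -4
```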
Geometric Mean.
If a1 , a2 , . . . , an are real numbers, we define their geometric mean as
$$G.M. = \sqrt[n]{a_1 a_2 \cdots a_n}.$$
(We usually require the numbers to be non-negative so that the mean always exists.)
81.18 googol
A googol equals the number 10100 , that is, a one followed by one hundred zeros. A
googolplex is ten raised to the power of a googol, i.e., $10^{(10^{100})}$.

Although these numbers do not have much use in traditional mathematics, they are useful
for illustrating what “big” can mean in mathematics. Written out in decimal, a googol is

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

This is already a huge number. For instance, it is more than the number of atoms in the
known universe. A googolplex is even larger. In fact, since a googolplex has a googol
zeros in its decimal representation, a googolplex has more digits than there are atoms
in our universe. Thus, even if all matter in the universe were at our disposal, it would not be
possible to write down the decimal representation of a googolplex [1].

Properties

1. A googol is approximately the factorial of 70. The only prime factors of a googol and
a googolplex are 2 and 5 [1].

2. Using Stirling’s formula we can approximate the factorial of a googol to obtain
$$\left(10^{100}\right)! \approx 10^{9.95 \cdot 10^{101}}.$$
The googol was created by the American mathematician Edward Kasner (1878-1955) [4] in
[2] to illustrate the difference between an unimaginably large number and infinity. The name
’googol’ was coined by Kasner’s nine-year-old nephew Milton Sirotta in 1938 when asked to
give a name for a huge number. The name googol was perhaps influenced by the comic strip
character Barney Google [1, 3].
REFERENCES
1. Wikipedia’s entry on googol.
2. Kasner, Edward & Newman, James Roy, Mathematics and the Imagination (New York,
NY, USA: Simon and Schuster, 1967; Dover Pubns, April 2001; London: Penguin, 1940, ISBN
0486417034).
3. Douglas Harper’s Etymology online dictionary, Googol 12/2003.
4. Wikipedia’s entry on Edward Kasner.
81.19 googolplex
A googolplex is $10^{10^{100}}$; that is, 10 raised to the googol-th power. This can also be viewed as
a one followed by a googol zeros.
Let a and b be given integers, with at least one of them different from zero. The greatest
common divisor of a and b, denoted by gcd(a, b), is the positive integer d satisfying: d divides
both a and b, and every integer dividing both a and b also divides d.
More intuitively, the greatest common divisor is the largest integer dividing both a and b.
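The gcd can be computed with the Euclidean algorithm, which repeatedly replaces (a, b) by (b, a mod b); a sketch (the function name is ours):

```python
def gcd(a, b):
    """Euclidean algorithm: gcd(a, b) = gcd(b, a mod b) until b == 0."""
    while b:
        a, b = b, a % b
    return abs(a)

assert gcd(12, 18) == 6
assert gcd(17, 5) == 1
assert gcd(0, 7) == 7
```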
Suppose you travel from city A to city B at x miles per hour, and then you travel back at y
miles per hour. What was the average velocity for the whole trip?
The harmonic mean of x and y! That is, the average velocity is
$$\frac{2}{\frac{1}{x} + \frac{1}{y}} = \frac{2xy}{x + y}.$$
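The three Pythagorean means and the round-trip example can be checked directly; a sketch (function names and the sample speeds 60 and 30 mph are ours):

```python
from math import prod

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

# Out at 60 mph, back at 30 mph: the average speed is the harmonic
# mean, 40 mph, not the arithmetic mean 45.
assert abs(harmonic_mean([60, 30]) - 40) < 1e-9
# AM >= GM >= HM for positive inputs:
xs = [2, 8]
assert arithmetic_mean(xs) >= geometric_mean(xs) >= harmonic_mean(xs)
```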
81.23 mean
A mean is a homogeneous function f whose domain is the collection of all finite multisets of
R and whose codomain is R, such that for any set S = {x1 , x2 , x3 , . . . , xt } of real numbers,
min{x1 , x2 , x3 , . . . , xt } 6 f (S) 6 max{x1 , x2 , x3 , . . . , xt }.
Pythagoras identified three types of means: the arithmetic mean, the geometric mean,
and the harmonic mean. Other well-known means include the median, the mode, the
arithmetic-geometric mean, the arithmetic-harmonic mean, the harmonic-geometric mean,
the root-mean-square (sometimes called the quadratic mean), the identric mean, the Heronian
mean, and the Cesàro mean. Even the minimum and maximum functions are means
(though vacuously so).
It should be noted that the arithmetic mean is sometimes simply referred to as the ”mean.”
A field which is a finite extension of Q, the rational numbers, is called a number field.
81.25 pi
The number π is the ratio between the perimeter and the diameter of any given circle.
That is, in any circle, dividing the perimeter by the diameter always gives the same answer:
3.14159265358...
Over human history there were many attempts to calculate this number precisely. One of the
oldest approximations appears in the Rhind Papyrus (circa 1650 B.C.), where a geometrical
construction is given in which (16/9)² = 3.1604 . . . is used as an approximation to π, although
this was not explicitly mentioned.
It wasn't until the Greeks that there were systematic attempts to calculate π. Archimedes
[1], in the third century B.C., used regular polygons inscribed and circumscribed about a circle
to approximate π: the more sides a polygon has, the closer to the circle it becomes, and
therefore the ratio between the polygon's area and the square of the radius yields ap-
proximations to π. Using this method he showed that 223/71 < π < 22/7 (3.140845 . . . <
π < 3.142857 . . .).
Around the world there were also attempts to calculate π. Brahmagupta [1] gave the value
√10 = 3.16227 . . . using a method similar to Archimedes'. Chinese mathematician Tsu
Chung-Chih (ca. 500 A.D.) gave the approximation 355/113 = 3.141592920 . . ..
Later, during the Renaissance, Leonardo de Pisa (Fibonacci) [1] used 96-sided regular poly-
gons to find the approximation 864/275 = 3.141818 . . .
For centuries, variations on Archimedes’ method were the only tool known, but Viète [1]
gave in 1593 the formula
    2/π = √(1/2) · √(1/2 + (1/2)√(1/2)) · √(1/2 + (1/2)√(1/2 + (1/2)√(1/2))) · · ·

which was the first analytical expression for π involving infinite summations or products.
Later, with the advent of calculus, many formulas of this kind were discovered. Some examples
are Wallis' [1] formula:
    π/2 = (2/1) · (2/3) · (4/3) · (4/5) · (6/5) · · ·
and Leibniz’s formula,
    π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − 1/11 + · · · ,

obtained by expanding arctan(1) = π/4 using the power series of the arctangent, and, with some more
advanced techniques,

    π = √(6 ζ(2)),
found by determining the value of the Riemann Zeta function at s = 2.
The Leibniz expression provides an alternate way to define π (namely 4 times the limit of
the series) and it is one of the formal ways to define π when studying analysis in order to
avoid the geometrical definition.
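These series are easy to experiment with. The following Python sketch (ours, not part of the original entry; function names are assumptions) compares partial sums of the Leibniz series and of the ζ(2) formula; both converge to π, the Leibniz series quite slowly:

```python
import math

def leibniz_pi(terms):
    """4 * (1 - 1/3 + 1/5 - 1/7 + ...), truncated after the given number of terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

def zeta2_pi(terms):
    """sqrt(6 * sum(1/n^2)), using zeta(2) = pi^2 / 6, truncated after `terms` terms."""
    return math.sqrt(6 * sum(1 / n ** 2 for n in range(1, terms + 1)))
```

With 100000 terms both agree with π to better than four decimal places, while ten terms of the Leibniz series are still off in the second decimal.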
It is known that π is not a rational number (quotient of two integers). Moreover, π is not
algebraic over the rationals (that is, it is a transcendental number). This means that no
polynomial with rational coefficients can have π as a root. Its irrationality implies that its
expansion in base 10 (or any other integer base, for that matter) is neither finite nor periodic.
469
REFERENCES
1. The MacTutor History of Mathematics archive, The MacTutor History of Mathematics Archive
2. Dario Castellanos, "The ubiquitous π". Mathematics Magazine, Vol. 61, No. 2, April 1988.
Mathematical Association of America.
Since we are working (mod p), all the numbers 2, . . . , p − 2 have inverses (mod p), and
each can be paired with its inverse within 2, . . . , p − 2. This leaves 1, which is its own inverse,
and p − 1, also its own inverse. Hence we can write

    (p − 1)! ≡ 1 · (p − 1) (mod p)

but then

    (p − 1)! ≡ (p − 1) ≡ −1 (mod p)

Hence

    (p − 1)! ≡ −1 (mod p)
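Wilson's theorem is easy to check numerically. A short Python sketch (ours; the helper name is an assumption), accumulating the factorial modulo p so the numbers stay small:

```python
def wilson_holds(p):
    """Check (p-1)! ≡ -1 (mod p) by accumulating the factorial modulo p."""
    fact = 1
    for m in range(2, p):
        fact = fact * m % p
    return fact == p - 1
```

The converse also holds: for composite n > 4, (n − 1)! ≡ 0 (mod n), so the test fails.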
470
81.27 proof of fundamental theorem of arithmetic
Suppose a natural number n has two factorizations into primes, say

    n = p_1 p_2 · · · p_r = q_1 q_2 · · · q_s .
Assume without loss of generality that r ≤ s, and that our primes are written in increasing
magnitude, so p_1 ≤ · · · ≤ p_r , q_1 ≤ · · · ≤ q_s . Since p_1 | q_1 q_2 · · · q_s and the q_i are prime, we must
have p_1 = q_k for some k, but then p_1 ≥ q_1 . The same reasoning gives q_1 ≥ p_1 , so p_1 = q_1 .
Cancel this common factor to get

    p_2 p_3 · · · p_r = q_2 q_3 · · · q_s .

Repeating this argument, we eventually cancel every p_i and obtain

    1 = q_{r+1} q_{r+2} · · · q_s .

But the q_i are prime, hence greater than 1, so there can be no factors left on the right. Then r = s and p_i = q_i for
all i, making the two factorizations identical.
The fundamental theorem of algebra assures us that the polynomial x^n − 1 has n roots
in C. That is, there exist n complex numbers z such that z^n = 1. These numbers are called
roots of unity.
If ζ = e^{2πi/n} = cos(2π/n) + i sin(2π/n), then the n-th roots of unity are ζ^k = e^{2πki/n} =
cos(2πk/n) + i sin(2πk/n) for k = 1, 2, . . . , n.
If drawn on the complex plane, the n-th roots of unity are the vertices of a regular n-gon.
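The formula ζ^k = e^{2πki/n} translates directly into Python's complex arithmetic (a sketch of ours; the function name is an assumption):

```python
import cmath

def roots_of_unity(n):
    """The n-th roots of unity: exp(2*pi*i*k/n) for k = 1, ..., n."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(1, n + 1)]
```

Each returned number has modulus 1, so plotting them indeed gives the vertices of a regular n-gon on the unit circle.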
471
Chapter 82
82.1 base
Most¹ written number systems are built upon the concept of base for their functioning and
conveying of quantitative meaning. In these systems, meaning is derived from two things:
symbols and places. The representation of a value then follows the schema

    s_2 s_1 s_0 . s_{−1} s_{−2} s_{−3}

where each s_i is some symbol that has a quantitative value. Places to the left of the point
(.) are worth whole units, and places to the right are worth fractional units. It is the base
that tells us how much of a fraction or how many whole units. Once a base b is chosen, the
value of a number s_2 s_1 s_0 . s_{−1} s_{−2} s_{−3} would be calculated as

    s_2 · b^2 + s_1 · b^1 + s_0 · b^0 + s_{−1} · b^{−1} + s_{−2} · b^{−2} + s_{−3} · b^{−3} .
In our now-standard, Arabic-derived decimal system, the base b is equal to 10. Other very
common and useful systems are binary, hexadecimal, and octal, having b = 2, b = 16, and
b = 8 respectively².
Each si is a member of an alphabet of symbols which must have b members. Intuitively this
¹ but not all; see Roman numerals for an example of a baseless number system.
² These are generic systems which are capable of representing any number. By contrast, our system of
written time is a curious hybrid of bases (60, 60, and then 10 from there on) and has a fixed number of whole
places and a different number of symbols (24) in the highest place, making it capable only of representing
the same discrete, finite set of values over and over again.
472
makes sense: when we try to represent the number which follows “9” in the decimal system,
we know it must be “10”, since there is no symbol after “9.” Hence, place as well as symbol
conveys the meaning, and base tells us how much a unit in each place is worth.
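The place-and-symbol scheme for whole numbers can be sketched in a few lines of Python (ours, not part of the original entry; function names are assumptions):

```python
def to_digits(n, b):
    """Whole-number digits of n in base b, most significant first."""
    digits = []
    while n > 0:
        digits.append(n % b)   # next symbol: remainder in this place
        n //= b
    return digits[::-1] or [0]

def from_digits(digits, b):
    """Value of a digit sequence in base b: the sum of s_i * b**i."""
    value = 0
    for d in digits:
        value = value * b + d  # Horner evaluation of the schema above
    return value
```

For example, 255 in base 16 is the digit pair (15, 15), i.e. "FF", and converting back recovers the original value regardless of the base chosen.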
Curiously, though one would think that the choice of base leads to merely a different way
of rendering the same information, there are instances where things are provable
in some bases, but not others. For instance, there exists a non-recursive formula
for the nth binary digit of π, but none is known for decimal: one still must calculate all of the n − 1
preceding decimal digits of π to get the nth.
473
Chapter 83
    f(x) = x^r + a_1 x^{r−1} + . . . + a_r

where the a's are integers, such that the absolute value of the product of those roots of f
which lie outside the unit circle lies between 1 and 1 + m.
"This problem, of interest in itself, is especially important for our purposes. Whether or not
the problem has a solution for m < 0.176 we do not know." – Derrick Henry Lehmer, 1933.
We define Mahler's measure of a polynomial f to be the absolute value of the product of those
roots of f which lie outside the unit circle, multiplied by the absolute value of the coefficient
of the leading term of f. We shall denote it M(f).
Lehmer's conjecture states that there exists a constant C > 1 such that every polynomial f
with integer coefficients and M(f) > 1 has M(f) ≥ C.
Theorem: There exist infinitely many odd integers k such that k · 2^n + 1 is composite for
every n ≥ 1.
474
A multiplier k with this property is called a Sierpinski number. The Sierpinski problem
consists in determining the smallest Sierpinski number. In 1962, John Selfridge discovered
the Sierpinski number k = 78557, which is now believed to be in fact the smallest such
number.
Conjecture: The integer k = 78557 is the smallest Sierpinski number. To prove the
conjecture, it would be sufficient to exhibit a prime of the form k · 2^n + 1 for each odd k < 78557.
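A candidate k is eliminated as soon as any prime k · 2^n + 1 is found. A toy Python sketch of that search (ours; the real distributed searches use large-scale probable-prime tests, and the function names here are assumptions):

```python
def is_prime(n):
    """Trial-division primality test; adequate for the small values used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def eliminating_exponent(k, max_n=20):
    """Smallest n >= 1 with k * 2**n + 1 prime, or None if none is found up
    to max_n.  A Sierpinski number is a k for which no n works at all."""
    for n in range(1, max_n + 1):
        if is_prime(k * 2 ** n + 1):
            return n
    return None
```

Most small odd k are eliminated almost immediately; it is the handful of stubborn candidates below 78557 that make the Sierpinski problem hard.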
475
Chapter 84
Let a, b be integers, not both zero. Then there exist two integers x, y such that:
ax + by = gcd(a, b).
This works not only in Z, but in every integral domain in which a Euclidean valuation
has been defined.
Euclid’s algorithm describes a procedure for finding the greatest common divisor of two
integers.
Suppose a, b ∈ Z, and without loss of generality, b > 0, because if b < 0, then gcd(a, b) =
gcd(a, |b|), and if b = 0, then gcd(a, b) = |a|. Put d := gcd(a, b).
By the division algorithm for integers, we may find integers q_0 and r_0 such that a = q_0 b + r_0
where 0 ≤ r_0 < b.
476
So we may repeat the division, this time with b and r0 . Proceeding recursively, we obtain
    a = q_0 b + r_0 with 0 ≤ r_0 < b
    b = q_1 r_0 + r_1 with 0 ≤ r_1 < r_0
    r_0 = q_2 r_1 + r_2 with 0 ≤ r_2 < r_1
    r_1 = q_3 r_2 + r_3 with 0 ≤ r_3 < r_2
..
.
Thus we obtain a decreasing sequence of nonnegative integers b > r_0 > r_1 > r_2 > . . . ,
which must eventually reach zero, that is to say, r_n = 0 for some n, and the algorithm
terminates. We may easily generalize the previous argument to show that d = gcd(r_{k−1} , r_k ) =
gcd(r_k , r_{k+1} ) for k = 0, 1, 2, . . . , where r_{−1} = b. Therefore,

    d = gcd(r_{n−1} , r_n ) = gcd(r_{n−1} , 0) = r_{n−1} .

More colloquially, the greatest common divisor is the last nonzero remainder in the algorithm.
The algorithm provides a bit more than this. It also yields a way to express d as a
linear combination of a and b, a fact obscurely known as Bezout's lemma. For we have that
a − q0 b = r0
b − q1 r0 = r1
r0 − q2 r1 = r2
r1 − q3 r2 = r3
..
.
rn−3 − qn−1 rn−2 = rn−1
rn−2 = qn rn−1
b − q1 (a − q0 b) = k1 a + l1 b = r1
(a − q0 b) − q2 (k1 a + l1 b) = k2 a + l2 b = r2
(k1 a + l1 b) − q3 (k2 a + l2 b) = k3 a + l3 b = r3
.. ..
. .
(kn−3 a + ln−3 b) − qn (kn−2 a + ln−2 b) = kn−1 a + ln−1 b = rn−1
Sometimes, especially for manual computations, it is preferable to carry out the whole algorithm
in a tabular format. As an example, let us apply the algorithm to a = 756 and b = 595.
The following table details the procedure. The variables at the top of each column (without
subscripts) have the same meaning as above. That is to say, r is used for the sequence of
remainders and q for the corresponding sequence of quotients. The entries in the k and l
477
r q k l
756 1 0
595 1 0 1
161 3 1 −1
112 1 −3 4
49 2 4 −5
14 3 −11 14
7 2 37 −47
0
columns are obtained by multiplying the current values for k and l by the q in this row, and
subtracting the results from the k and l in the previous row.
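The tabular computation above translates directly into the extended Euclidean algorithm. A Python sketch (ours; the function name is an assumption) that tracks the r, k, and l columns:

```python
def extended_gcd(a, b):
    """Return (d, x, y) with a*x + b*y == d == gcd(a, b), updating the
    coefficient columns exactly as in the table: new = previous - q * current."""
    r0, r1 = a, b
    k0, k1 = 1, 0   # coefficients of a for r0 and r1
    l0, l1 = 0, 1   # coefficients of b for r0 and r1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        k0, k1 = k1, k0 - q * k1
        l0, l1 = l1, l0 - q * l1
    return r0, k0, l0
```

Running it on the worked example a = 756, b = 595 reproduces the last row of the table: gcd 7 with coefficients 37 and −47, since 37 · 756 − 47 · 595 = 7.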
Euclid’s algorithm was first described in his classic work Elements, which also contained
procedures for geometrical constructions. These are the first known formally described algo-
rithms. Prior to this, informally defined algorithms were in common use to perform various
computations, but Elements contained the first attempt to rigorously describe a procedure
and explain why its results are admissible. Euclid's algorithm for greatest common divisor
is still commonly used today; since Elements was published in the fourth century BC, this
algorithm has been in use for nearly 2400 years!
    bc/a = n.

But gcd(a, b) = 1 implies that b/a is an integer only if a = 1. So

    bc/a = b · (c/a) = n,
478
which means a must divide c.
Each natural number n > 1 can be decomposed uniquely, up to the order of the factors, as
a product of prime numbers. This allows us to write n in the unique representation
    n = p_1^{a_1} p_2^{a_2} p_3^{a_3} · · · p_k^{a_k}

for some nonnegative integer k with p_i prime and p_i ≠ p_j for i ≠ j. For some results it is
also useful to assume that p_i < p_j for i < j.
An integer n is called perfect if it is the sum of all divisors of n less than n itself. It is
not known if there are any odd perfect numbers, but all even perfect numbers have been
classified as follows:
If 2^k − 1 is prime for some k > 1, then 2^{k−1} (2^k − 1) is perfect, and every even perfect
number is of this form.
Proof: (⇒) Let p = 2^k − 1 be prime, let n = 2^{k−1} p, and define σ(a) as the sum of all
positive divisors of the integer a. Since σ is multiplicative (meaning σ(ab) = σ(a)σ(b) when
gcd(a, b) = 1), we have:

    σ(n) = σ(2^{k−1} p)
         = σ(2^{k−1} ) σ(p)
         = (2^k − 1)(p + 1)
         = (2^k − 1) 2^k = 2n,

so n is perfect.
(⇐) Assume n is an even perfect number. Write n = 2^{k−1} m for some odd m and k ≥ 2.
Then gcd(2^{k−1} , m) = 1, so

    σ(n) = σ(2^{k−1} ) σ(m) = (2^k − 1) σ(m).
479
But if n is perfect, then by definition σ(n) = 2n, which in our case means σ(n) = 2n = 2^k m.
Piecing together our two formulae for σ(n), we get

    2^k m = (2^k − 1) σ(m).

So (2^k − 1) | 2^k m, which forces (2^k − 1) | m. Write m = (2^k − 1)M. So from above we have:

    2^k m = (2^k − 1) σ(m)
    2^k (2^k − 1) M = (2^k − 1) σ(m)
    2^k M = σ(m)

Since m and M are both divisors of m, and M < m, we get

    2^k M = σ(m) ≥ m + M = (2^k − 1)M + M = 2^k M,

which forces σ(m) = m + M. Thus m has only the two divisors m and M. Hence M = 1, m must be
prime, and m = (2^k − 1)M = 2^k − 1, from which the result follows.
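The classification makes even perfect numbers easy to generate: one per Mersenne prime 2^k − 1. A Python sketch (ours; helper names are assumptions):

```python
def is_prime(n):
    """Trial-division primality test, sufficient for small Mersenne candidates."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def even_perfect_numbers(k_max):
    """Even perfect numbers 2^(k-1) * (2^k - 1) for k <= k_max with 2^k - 1 prime."""
    return [2 ** (k - 1) * (2 ** k - 1)
            for k in range(2, k_max + 1) if is_prime(2 ** k - 1)]

def sigma(n):
    """Sum of all positive divisors of n (naive)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)
```

Each generated number n indeed satisfies σ(n) = 2n, the defining property of a perfect number.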
480
Chapter 85
For every n ∈ N, (n!)_p stands for the product of the numbers between 1 and n which are not
divisible by a given prime p, and we set (0!)_p = 1.
The corollary below generalizes a result first found by Anton, Stickelberger, and Hensel:
Let N_0 be the least non-negative residue of n (mod p^s ), where p is a prime number and
n ∈ N. Then

    (n!)_p ≡ (±1)^{⌊n/p^s⌋} · (N_0 !)_p (mod p^s ),

where ±1 is −1 unless p = 2 and s ≥ 3.
481
Version: 1 Owner: Thomas Heye Author(s): Thomas Heye
We must show

    a^{p−1} ≡ 1 (mod p)

with p prime and p ∤ a.
When a = 1, we have

    1^{p−1} ≡ 1 (mod p)
Now assume the theorem holds for some a. We have as a direct consequence that

    a^p ≡ a (mod p)

By the binomial theorem, writing C(p, i) for the binomial coefficients,

    (a + 1)^p ≡ a^p + C(p, 1) a^{p−1} + · · · + C(p, p−1) a + 1
             ≡ (a + 1) + [ C(p, 1) a^{p−1} + C(p, 2) a^{p−2} + · · · + C(p, p−1) a ]

However, note that the entire bracketed term is divisible by p, since each binomial coefficient
C(p, i) with 0 < i < p is divisible by p. Hence

    (a + 1)^p ≡ (a + 1) (mod p)

    (a + 1)^{p−1} ≡ 1 (mod p)
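Fermat's little theorem is immediate to test with Python's built-in three-argument pow, which does fast modular exponentiation (the wrapper name is ours):

```python
def fermat_holds(a, p):
    """Check a^(p-1) ≡ 1 (mod p)."""
    return pow(a, p - 1, p) == 1
```

For prime p the check succeeds for every a not divisible by p; for composite moduli it generally fails, which is the basis of the Fermat primality test.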
482
85.3 Jacobi symbol
Jacobi Symbol.
Let n be an odd positive integer with prime factorization p_1^{e_1} · · · p_k^{e_k} , and let a > 0 be an
integer. The Jacobi symbol (a/n) is defined to be

    (a/n) = ∏_{i=1}^{k} (a/p_i)^{e_i}

where (a/p_i) is the Legendre symbol of a and p_i .
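In practice the Jacobi symbol is computed without factoring n, using the reciprocity and (2/n) rules, which both extend from the Legendre symbol. A Python sketch (ours; the function name is an assumption):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n, computed by the standard
    binary/reciprocity algorithm rather than by factoring n."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):   # (2/n) = -1 exactly when n ≡ ±3 (mod 8)
                result = -result
        a, n = n, a               # reciprocity for odd coprime arguments
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0
```

For prime n the result agrees with the Legendre symbol, which can be computed independently by Euler's criterion.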
First find positive integers Q and S such that p − 1 = 2^S Q, where Q is odd. Then find a
quadratic nonresidue W of p and compute V ≡ W^Q (mod p). Then find an integer n′ that
is the multiplicative inverse of n (mod p) (i.e., nn′ ≡ 1 (mod p)).
Compute

    R ≡ n^{(Q+1)/2} (mod p)

and find the smallest integer i ≥ 0 that satisfies

    (R^2 n′)^{2^i} ≡ 1 (mod p)

If i = 0, then x = R, and the algorithm stops. Otherwise, compute

    R′ ≡ R V^{2^{S−i−1}} (mod p)

and repeat the procedure for R = R′ .
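The procedure above (Shanks-Tonelli) can be sketched in Python as follows (ours, a sketch assuming p is an odd prime and n a quadratic residue mod p; the function name is an assumption):

```python
def sqrt_mod_p(n, p):
    """Square root of n modulo an odd prime p, assuming n is a quadratic
    residue mod p, following the procedure described above."""
    q, s = p - 1, 0
    while q % 2 == 0:                          # p - 1 = 2^S * Q with Q odd
        q //= 2
        s += 1
    w = 2
    while pow(w, (p - 1) // 2, p) != p - 1:    # search for a nonresidue W
        w += 1
    v = pow(w, q, p)
    n_inv = pow(n, p - 2, p)                   # inverse of n by Fermat
    r = pow(n, (q + 1) // 2, p)
    while True:
        t = r * r % p * n_inv % p              # t = R^2 * n' mod p
        if t == 1:
            return r                           # i = 0: done
        i = 0
        while t != 1:                          # smallest i with t^(2^i) = 1
            t = t * t % p
            i += 1
        r = r * pow(v, 1 << (s - i - 1), p) % p
```

Each round strictly decreases the order of R² n′ in the 2-part of the group, so the loop terminates.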
By Fermat's little theorem the relation p | 2^{p−1} − 1 holds for any odd prime p. An odd
prime p such that p^2 | 2^{p−1} − 1 is called a Wieferich prime. It is currently unknown whether or
not there are infinitely many Wieferich primes, or whether or not there are infinitely many
primes that are not Wieferich, though the ABC conjecture implies the latter.
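The Wieferich condition, p² | 2^{p−1} − 1, is cheap to test with modular exponentiation. A Python sketch (ours; the function name is an assumption):

```python
def is_wieferich(p):
    """p is a Wieferich prime when 2^(p-1) ≡ 1 (mod p^2)."""
    return pow(2, p - 1, p * p) == 1
```

The only Wieferich primes known are 1093 and 3511, and no others exist below a very large search bound.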
483
REFERENCES
1. Ireland, Kenneth and Rosen, Michael. A Classical Introduction to Modern Number Theory.
Springer, 1998.
2. Nathanson, Melvyn B. Elementary Methods in Number Theory. Springer, 2000.
For every natural number n, let (n!)_p denote the product of the numbers 1 ≤ m ≤ n with
gcd(m, p) = 1.
Proof: We pair up all factors of the product (p^s !)_p into those numbers m where m ≢ m^{−1}
(mod p^s ) and those where this is not the case. So (p^s !)_p is congruent (modulo p^s ) to the
product of those numbers m where m ≡ m^{−1} (mod p^s ) ↔ m^2 ≡ 1 (mod p^s ).
Let p be an odd prime and s ∈ N. Since 2 ∤ p^s , p^s | (m^2 − 1) implies either p^s | (m + 1) or
p^s | (m − 1); the only such m are 1 and p^s − 1. This leads to

    (p^s !)_p ≡ −1 (mod p^s )

for odd prime p and any s ∈ N.
For p = 2 the solutions of m^2 ≡ 1 (mod 2^s ) with s ≥ 3 are 1, 2^s − 1, and 2^{s−1} ± 1. Since

    (2^{s−1} + 1)(2^{s−1} − 1) ≡ −1 (mod 2^s ),

we have

    (2^s !)_2 ≡ (−1) · (−1) = 1 (mod 2^s )

for p = 2, s ≥ 3, but −1 for s = 1, 2.
484
85.7 factorial modulo prime powers
For n ∈ N and a prime number p, (n!)_p is the product of the numbers 1 ≤ m ≤ n with p ∤ m.
Here N_i is the least non-negative residue of ⌊n/p^i⌋ (mod p^s ), and d + 1 denotes the number of
digits in the p-adic representation of n. More precisely, ±1 is −1 unless p = 2, s ≥ 3.
Proof: Let i ≥ 0. Then the set of numbers between 1 and ⌊n/p^i⌋ which are divisible by p is

    { kp | k ≥ 1, k ≤ ⌊n/p^{i+1}⌋ }.

Multiplying all terms with 0 ≤ i ≤ d, where p^d is the largest power of p not greater than n,
the statement follows from the generalization of Anton's congruence.
Let a_1 , a_2 , . . . , a_{φ(n)} be all positive integers less than n which are coprime to n. Since
gcd(a, n) = 1, the numbers aa_1 , aa_2 , . . . , aa_{φ(n)} are each congruent to one of the integers
a_1 , a_2 , . . . , a_{φ(n)} in some order. Taking the product of these congruences, we get

    (aa_1 )(aa_2 ) · · · (aa_{φ(n)} ) ≡ a_1 a_2 · · · a_{φ(n)} (mod n),

hence

    a^{φ(n)} (a_1 a_2 · · · a_{φ(n)} ) ≡ a_1 a_2 · · · a_{φ(n)} (mod n).

Since gcd(a_1 a_2 · · · a_{φ(n)} , n) = 1, we can divide both sides by a_1 a_2 · · · a_{φ(n)} , and the desired
result follows.
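Euler's theorem is easy to verify numerically; here is a Python sketch (ours; the totient is computed naively by counting coprimes, and the function names are assumptions):

```python
from math import gcd

def phi(n):
    """Euler totient: count of 1 <= a <= n with gcd(a, n) == 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def euler_holds(a, n):
    """Check a^phi(n) ≡ 1 (mod n) for gcd(a, n) == 1."""
    return pow(a, phi(n), n) == 1
```

With n prime, φ(n) = n − 1 and this reduces to Fermat's little theorem.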
485
85.9 proof of Lucas’s theorem
    (n!)_p ≡ (−1)^{⌊n/p⌋} · a_0 ! (mod p),    (85.9.1)

where a_0 is as defined above, and (n!)_p is the product of the numbers ≤ n not divisible by p. So
we have

    n! / ( ⌊n/p⌋! · p^{⌊n/p⌋} ) = (n!)_p ≡ (−1)^{⌊n/p⌋} · a_0 ! (mod p)
When dividing by the left-hand terms of the congruences for m and r, we see that the power
of p is

    ⌊n/p⌋ − ⌊m/p⌋ − ⌊r/p⌋ = c_0 .

So we get the congruence (writing C(x, y) for the binomial coefficient)

    C(n, m) ≡ (−1)^{c_0} · C(a_0 , b_0 ) · p^{c_0} · C(⌊n/p⌋, ⌊m/p⌋),

or equivalently

    (−1)^{c_0} p^{−c_0} · C(n, m) ≡ C(a_0 , b_0 ) · C(⌊n/p⌋, ⌊m/p⌋) (mod p).    (85.9.2)
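Applied recursively to every base-p digit, this yields the usual digit-by-digit form of Lucas's theorem: C(n, m) ≡ ∏ C(n_i , m_i ) (mod p), where n_i and m_i are the base-p digits of n and m. A Python sketch (ours; the function name is an assumption):

```python
from math import comb

def binom_mod_p_lucas(n, m, p):
    """C(n, m) mod p computed from the base-p digits of n and m.
    math.comb returns 0 when the lower digit exceeds the upper one."""
    result = 1
    while n > 0 or m > 0:
        result = result * comb(n % p, m % p) % p
        n //= p
        m //= p
    return result
```

The result agrees with reducing the full binomial coefficient mod p, but only ever multiplies numbers smaller than p.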
486
Chapter 86
Let p be an odd prime and n an integer such that (n, p) = 1 (that is, n and p are
relatively prime).
Proposition 1: Let p be an odd prime and let n be an integer which is not a multiple of p.
Let u be the number of elements of the set

    { n, 2n, 3n, . . . , ((p − 1)/2) · n }

whose least positive residues, modulo p, are greater than p/2. Then

    (n/p) = (−1)^u

where (n/p) is the Legendre symbol.
That is, n is a quadratic residue modulo p when u is even, and it is a quadratic nonresidue
when u is odd.
487
GL is the special case

    S = { 1, 2, . . . , (p − 1)/2 }

of the slightly more general statement below. Write F_p for the field of p elements, and
identify F_p with the set {0, 1, . . . , p − 1}, with its addition and multiplication mod p.
Proposition 2: Let S be a subset of F_p^* such that x ∈ S or −x ∈ S, but not both, for any
x ∈ F_p^* . For n ∈ F_p^* let u(n) be the number of elements k ∈ S such that kn ∉ S. Then

    (n/p) = (−1)^{u(n)} .
Proof: If a and b are distinct elements of S, we cannot have an = ±bn, in view of the
hypothesis on S. Therefore

    ∏_{a∈S} an = (−1)^{u(n)} ∏_{a∈S} a.

On the other hand, ∏_{a∈S} an = n^{(p−1)/2} ∏_{a∈S} a ≡ (n/p) ∏_{a∈S} a by Euler's criterion. So

    (n/p) ∏_{a∈S} a = (−1)^{u(n)} ∏_{a∈S} a

and cancelling the nonzero factor ∏_{a∈S} a gives the result.
Remarks: Using GL, it is straightforward to prove that for any odd prime p:

    (−1/p) = 1 if p ≡ 1 (mod 4), and −1 if p ≡ −1 (mod 4)
    (2/p) = 1 if p ≡ ±1 (mod 8), and −1 if p ≡ ±3 (mod 8)
The condition on S can also be stated like this: for any square x^2 ∈ F_p^* , there is a unique
y ∈ S such that x^2 = y^2 . Apart from the usual choice

    S = {1, 2, . . . , (p − 1)/2} ,

the set

    {2, 4, . . . , p − 1}

has also been used, notably by Eisenstein. I think it was also Eisenstein who gave us this
trigonometric identity, which is closely related to GL:

    (n/p) = ∏_{a∈S} sin(2πan/p) / sin(2πa/p)
488
It is possible to prove GL or Proposition 2 “from scratch”, without leaning on Euler’s cri-
terion, the existence of a primitive root, or the fact that a polynomial over Fp has no more
zeros than its degree.
We will identify the ring Z_n of integers modulo n with the set {0, 1, . . . , n − 1}.
Lemma 1: (Zolotarev) For any prime number p and any m ∈ Z_p^* , the Legendre symbol (m/p)
is equal to the signature of the permutation τ_m : x ↦ mx of Z_p^* .
Proof: Let us write ε(σ) for the signature of any permutation σ. If σ is a circular permu-
tation on a set of k elements, then ε(σ) = (−1)^{k−1} .
Let i be the order of m in Z_p^* . Then the permutation τ_m consists of (p − 1)/i orbits, each of
size i, whence

    ε(τ_m ) = (−1)^{(i−1)(p−1)/i}
If i is even, then

    m^{(p−1)/2} = m^{(i/2)·((p−1)/i)} = (−1)^{(p−1)/i} = ε(τ_m )

And if i is odd, then 2i divides p − 1, so

    m^{(p−1)/2} = m^{i·((p−1)/(2i))} = 1 = ε(τ_m ).

In both cases, the lemma follows from Euler's criterion.
Lemma 1 extends easily from the Legendre symbol to the Jacobi symbol (m/n)
for odd n.
The following is Zolotarev’s penetrating proof of the quadratic reciprocity law, using Lemma
1.
489
and if m and n are both odd,

    ε(λ) = (−1)^{(m−1)(n−1)/4} .

Proof: We will use the fact that the signature of a permutation of a finite totally ordered set
is determined by the parity of the number of inversions of that permutation. The sequence (0, 0), (0, 1), . . .
defines on A_{mn} a total order ≤ in which the relation (i, j) < (i′, j′) means
The only pairs ((i, j), (i′, j′)) that get inverted are, therefore, the ones with i < i′ and j > j′ .
There are C(m, 2) · C(n, 2) such pairs, proving the first formula, and the second follows easily.
Now let p and q be distinct odd primes. Denote by π the canonical ring isomorphism
Z_{pq} → Z_p × Z_q . Define two permutations α and β of Z_p × Z_q , and a permutation λ of Z_{pq} , by

    α(x, y) = (qx + y, y)
    β(x, y) = (x, x + py)
    λ(x + qy) = px + y

We have

    π(qx + y) = (qx + y, y)
    π(x + py) = (x, x + py)

and therefore

    π ◦ λ ◦ π^{−1} ◦ α = β.
Let us compare the signatures of the two sides. For each fixed y, the permutation x ↦ qx + y of Z_p is the
composition of x ↦ qx and x ↦ x + y. The latter has signature 1, whence by Lemma 1,

    ε(α) = (q/p)^q = (q/p)

and similarly

    ε(β) = (p/q)^p = (p/q) .

By Lemma 2,

    ε(π ◦ λ ◦ π^{−1} ) = (−1)^{(p−1)(q−1)/4} .
490
Thus

    (−1)^{(p−1)(q−1)/4} · (q/p) = (p/q)

which is the quadratic reciprocity law.
In a ring Z/nZ, a cubic residue is just a value of the function x^3 for some invertible element
x of the ring. Cubic residues display a reciprocity phenomenon similar to that seen with
quadratic residues. But we need some preparation in order to state the cubic reciprocity
law.
ω will denote (−1 + i√3)/2, which is one of the complex cube roots of 1. K will denote the ring
K = Z[ω]. The elements of K are the complex numbers a + bω where a and b are integers.
We define the norm N : K → Z by

    N(a + bω) = a^2 − ab + b^2

or equivalently

    N(z) = z z̄ .
Whereas Z has only two units (meaning invertible elements), namely ±1, K has six, namely
all the sixth roots of 1:
    ±1, ±ω, ±ω^2

and we know ω^2 = −1 − ω. Two nonzero elements α and β of K are said to be associates
if α = βµ for some unit µ. This is an equivalence relation, and any nonzero element has six
associates.
– positive integers q ≡ 2 (mod 3) which are prime in Z; such integers are called
rational primes in K.
491
For example, 3 + 2ω is a prime in K because its norm, 7, is prime in Z and is 1 mod 3; but
7 is not a prime in K.
Now we need some convention whereby at most one of any six associates is called a prime.
By convention, the following numbers are nominated:
– the number π;
– the rational primes;
– the complex primes a + bω for which

    a ≡ 2 (mod 3)
    b ≡ 0 (mod 3) .

If ρ is a prime in K and α ∈ K is not divisible by ρ, then

    α^{N(ρ)−1} ≡ 1 (mod ρ) .

We can therefore define a function

    χ_ρ : K → {1, ω, ω^2}

by letting χ_ρ (α) be the unique cube root of unity satisfying χ_ρ (α) ≡ α^{(N(ρ)−1)/3} (mod ρ).
χ_ρ is a character, called the cubic residue character mod ρ. We have χ_ρ (α) = 1 if and only
if α is a nonzero cube mod ρ. (Compare Euler's criterion.)
Theorem (Cubic Reciprocity Law): If ρ and σ are any two distinct primes in K, neither
of them π, then
χρ (σ) = χσ (ρ) .
The quadratic reciprocity law has two "supplements" which describe (−1/p) and (2/p). Like-
wise the cubic law has this supplement, due to Eisenstein:
    χ_ρ (π) = ω^{2m}
492
where

    m = (ρ + 1)/3 if ρ is a rational prime,
    m = (a + 1)/3 if ρ = a + bω is a complex prime.
The quadratic reciprocity law would take a simpler form if we were to make a different
convention on what is a prime in Z, a convention similar to the one in K: a prime in Z is
either 2 or an irreducible element x of Z such that x ≡ 1 (mod 4). The primes would then
be 2, −3, 5, −7, −11, 13, . . . and the QRL would say simply

    (p/q) (q/p) = 1
for any two distinct odd primes p and q.
(All congruences are modulo p for the proof; the modulus is omitted for clarity.)
Let

    x = a^{(p−1)/2}

Proof (a): Partition the set { 1, . . . , p − 1 } into pairs { c, d } such that cd ≡ a. Then
c and d must always be distinct, since a is a non-residue. Hence the product of
the union of the pairs is:

    (p − 1)! ≡ a^{(p−1)/2} ≡ −1

and the result follows by Wilson's theorem.
493
Proof (b): The equation

    z^{(p−1)/2} ≡ 1

has at most (p − 1)/2 roots. But we already know of (p − 1)/2 distinct roots of
the above equation, these being the quadratic residues modulo p. So a can't be a
root, yet a^{(p−1)/2} ≡ ±1. Thus we must have:

    a^{(p−1)/2} ≡ −1
QED.
Theorem: (Gauss) Let p and q be distinct odd primes, and write p = 2a + 1 and q = 2b + 1.
Then

    (p/q) · (q/p) = (−1)^{ab} .

((v/w) is the Legendre symbol.)
P has ab + b elements in the region y > 0, namely f (m) for all m of the form k + lq with
1 ≤ k ≤ b and 0 ≤ l ≤ a. Thus
N0 + N1 = ab + b − (b − v) + u
494
i.e.
N0 + N1 = ab + u + v. (86.6.1)
N0 + N2 = ab + u + v. (86.6.2)
Furthermore, for any s ∈ S, if f (s) = (x, y) then f (−s) = (−x, −y). It follows that for any
(x, y) ∈ R other than (0, 0), either (x, y) or (−x, −y) is in P , but not both. Therefore
N1 + N2 = ab + u + v. (86.6.3)
Adding the three equations, the left-hand side 2(N_0 + N_1 + N_2 ) is even, hence

    0 ≡ ab + u + v (mod 2)
so
(−1)ab = (−1)u (−1)v
which, in view of Gauss’s lemma, is the desired conclusion.
For a bibliography of the more than 200 known proofs of the QRL, see Lemmermeyer.
But there is another way, which goes back to Euler, and is worth seeing, inasmuch as it is
the prototype of certain more general arguments about character sums.
Let σ be a primitive eighth root of unity in an algebraic closure of Z/pZ, and write τ =
σ + σ^{−1} . We have σ^4 = −1, whence σ^2 + σ^{−2} = 0, whence

    τ^2 = 2 .

By the binomial formula in characteristic p,

    τ^p = σ^p + σ^{−p} .

If p ≡ ±1 (mod 8), this implies τ^p = τ . If p ≡ ±3 (mod 8), we get instead τ^p = σ^5 + σ^{−5} =
−σ^{−1} − σ = −τ . In both cases, we get τ^{p−1} = 2^{(p−1)/2} = (2/p), proving (1) and (2).
    σ = exp(2πi/8)
    τ = σ + σ^{−1}

Both are algebraic integers. Arguing much as above, we end up with

    τ^{p−1} ≡ (2/p) (mod p)

which is enough.
Let F be a finite field of characteristic p, and let f and g be distinct monic irreducible
(non-constant) polynomials in the polynomial ring F [X]. Define the Legendre symbol (f/g)
by

    (f/g) := 1 if f is a square in the quotient ring F [X]/(g), and −1 otherwise.

The quadratic reciprocity theorem for polynomials over a finite field states that

    (f/g) (g/f) = (−1)^{((p−1)/2) deg(f) deg(g)} .
REFERENCES
1. Feng, Ke Qin and Ying, Linsheng, An elementary proof of the law of quadratic reciprocity in
Fq (T ). Sichuan Daxue Xuebao 26 (1989), Special Issue, 36–40.
2. Merrill, Kathy D. and Walling, Lynne H., On quadratic reciprocity over function fields. Pacific
J. Math. 173 (1996), no. 1, 147–150.
496
86.9 quadratic reciprocity rule
    (q/p) (p/q) = (−1)^{(p−1)(q−1)/4}

where (·/·) is the Legendre symbol, p and q are odd and prime, and at least one of p or q is
positive.
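The law is easy to confirm numerically for ordinary positive odd primes, computing each Legendre symbol by Euler's criterion (a Python sketch of ours; function names are assumptions):

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion: a^((p-1)/2) mod p."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def reciprocity_holds(p, q):
    """Check (p/q)(q/p) == (-1)^((p-1)(q-1)/4) for distinct odd primes."""
    return legendre(p, q) * legendre(q, p) == (-1) ** ((p - 1) * (q - 1) // 4)
```

For instance the pair (3, 7) gives (3/7) = −1 and (7/3) = 1, whose product −1 matches (−1)^{(2·6)/4}.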
An integer a is called a quadratic residue modulo n if gcd(a, n) = 1 and the congruence

    x^2 ≡ a (mod n)

has a solution.
497
Chapter 87
for any proper divisor m′ | m, where the first map is the natural mapping and the second map
is a character mod m′ . If χ is non-primitive, the gcd of all such m′ is called the conductor
of χ.
The Liouville function is defined by λ(1) = 1 and λ(n) = (−1)^{k_1 + k_2 + · · · + k_r} if the prime
factorization of n > 1 is n = p_1^{k_1} p_2^{k_2} · · · p_r^{k_r} . This function is multiplicative and satisfies the
498
identity

    Σ_{d|n} λ(d) = 1 if n = m^2 for some integer m, and 0 otherwise.
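Both the function and the divisor-sum identity are simple to check in Python (a sketch of ours; function names are assumptions):

```python
def liouville(n):
    """lambda(n) = (-1)^Omega(n), where Omega counts prime factors with multiplicity."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:          # leftover prime factor
        count += 1
    return (-1) ** count

def liouville_divisor_sum(n):
    """Sum of lambda(d) over the positive divisors d of n."""
    return sum(liouville(d) for d in range(1, n + 1) if n % d == 0)
```

The divisor sum detects perfect squares: it is 1 for n = 1, 4, 9, 16, . . . and 0 otherwise.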
The Moebius inversion formula leads to the identity Λ(n) = Σ_{d|n} µ(n/d) ln d = − Σ_{d|n} µ(d) ln d.
Moreover, the term O(1) arising in this formula lies in the open interval (−1 − ln 4, ln 4).
499
In other words, µ(n) = 0 if n is not a square-free integer, while µ(n) = (−1)^r if n is square-free
with r prime factors. The function µ is a multiplicative function, and obeys the identity

    Σ_{d|n} µ(d) = 1 if n = 1, and 0 if n > 1.
For any integer n ≥ 1, let µ(n) be 0 if n is divisible by the square of a prime number, and if
not, let µ(n) = (−1)^{k(n)} where k(n) is the number of primes which divide n. The resulting
function µ : N^* → {−1, 0, 1} is called the Möbius µ function, or just the Möbius function.
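A direct Python implementation of this definition, by trial factorization (a sketch of ours; the function name is an assumption):

```python
def mobius(n):
    """mu(n): 0 if a squared prime divides n, else (-1)^(number of prime factors)."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:   # d^2 divides the original n
                return 0
            result = -result
        d += 1
    if n > 1:                # leftover prime factor
        result = -result
    return result
```

The characteristic identity Σ_{d|n} µ(d) = [n = 1] can then be checked divisor by divisor.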
Proof: By induction, there can only be one function with these properties. µ clearly satisfies
(1), so take some n > 1. Let p be some prime factor of n, and let m be the product of all
the distinct prime factors of n.
    Σ_{d|n} µ(d) = Σ_{d|m} µ(d)
               = Σ_{d|m, p∤d} µ(d) + Σ_{d|m, p|d} µ(d)
               = Σ_{d|(m/p)} µ(d) − Σ_{d|(m/p)} µ(d)
               = 0
Proposition 2: Let f and g be two mappings of N∗ into some given additive group. The
conditions
    f(n) = Σ_{d|n} g(d) for all n ∈ N^*    (87.6.3)

    g(n) = Σ_{d|n} µ(d) f(n/d) for all n ∈ N^*    (87.6.4)
500
are equivalent.
= g(n) by Proposition 1
= f (n) by Proposition 1
as claimed.
Möbius-Rota inversion
G.-C. Rota has described a generalization of the Möbius formalism. In it, the set N∗ , ordered
by the relation x|y between elements x and y, is replaced by a more general ordered set, and
µ is replaced by a function of two variables.
Let (S, ≤) be a locally finite ordered set, i.e. an ordered set such that {z ∈ S|x ≤ z ≤ y} is
a finite set for all x, y ∈ S. Let A be the set of functions α : S × S → Z such that
    α(x, x) = 1 for all x ∈ S    (87.6.5)
    α(x, y) ≠ 0 implies x ≤ y    (87.6.6)
A becomes a monoid if we define the product of any two of its elements, say α and β, by

    (αβ)(x, y) = Σ_{t∈S} α(x, t) β(t, y).
The sum makes sense because α(x, t)β(t, y) is nonzero for only finitely many values of t.
(Clearly this definition is akin to the definition of the product of two square matrices.)
501
Consider the element ι of A defined simply by

    ι(x, y) = 1 if x ≤ y, and 0 otherwise.

The function ι, regarded as a matrix over Z, has an inverse matrix, say ν. That means

    Σ_{t∈S} ι(x, t) ν(t, y) = 1 if x = y, and 0 otherwise.
For functions f and g from S into an additive group, the conditions

    f = ιg    (87.6.7)
    g = νf    (87.6.8)

are equivalent.
Now let’s sketch out how the traditional Möbius inversion is a special case of Rota’s notion.
Let S be the set N∗ , ordered by the relation x|y between elements x and y. In this case, ν
is essentially a function of only one variable:
Proposition 3: With the above notation, ν(x, y) = µ(y/x) for all x, y ∈ N∗ such that x|y.
The proof is fairly straightforward, by induction on the number of elements of the interval
{z ∈ S|x ≤ z ≤ y}.
Now let g be a function from N∗ to some additive group, and write g(x, y) = g(y/x) for all
pairs (x, y) such that x|y. The equivalence of (7) and (8), for g and its transform ιg, is just
Proposition 2.
Example: Let E be a set, and let S be the set of all finite subsets of E, ordered by inclusion.
The ordered set S is locally finite, and for any x, y ∈ S such that x ⊂ y, we have ν(x, y) =
(−1)^{|y−x|} , where |z| denotes the cardinality of the finite set z.
A slightly more sophisticated example comes up in connection with the chromatic polynomial
of a graph or matroid.
502
If f and g are two arithmetic functions, the sum of f and g, denoted f + g, is given by
(f + g)(n) = f (n) + g(n), and the product is the Dirichlet convolution defined below.
The set of arithmetic functions, equipped with these two binary operations, forms a commutative ring.
The 0 of the ring is the function f such that f (n) = 0 for any positive integer n. The 1 of
the ring is the function f with f (1) = 1 and f (n) = 0 for any n > 1.
In number theory, a multiplicative function is an arithmetic function f (n) of the positive
integer n with the property that f (1) = 1 and, whenever a and b are coprime,

    f (ab) = f (a) f (b).

Outside number theory, the term multiplicative is usually used for all functions with the
property f (ab) = f (a)f (b) for all arguments a and b. This article discusses number-theoretic
multiplicative functions.
• φ(n): the Euler totient function, counting the positive integers coprime to n,
• µ(n): the Moebius function, related to the number of prime factors of square-free numbers,
503
• σ_k (n): the sum of the k-th powers of all the positive divisors of n (where k may be any
complex number),
• Id_k (n): the power functions, defined by Id_k (n) = n^k for any natural number (or even
complex number) k,
Consider r_2 (n), the number of representations of n as a sum of two squares of integers, where
representations differing in order or sign are counted as distinct. Then

    1 = 1^2 + 0^2 = (−1)^2 + 0^2 = 0^2 + 1^2 = 0^2 + (−1)^2

and therefore r_2 (1) = 4 ≠ 1. This shows that the function is not multiplicative. However,
r_2 (n)/4 is multiplicative.
Similarly, we have:
504
Convolution If f and g are two arithmetic functions, one defines a new arithmetic function
f ∗ g, the convolution of f and g, by:
    (f ∗ g)(n) = Σ_{d|n} f (d) g(n/d),
where the sum extends over all positive divisors d of n. Some general properties of this
operation include (here the argument n is omitted in all functions):
• f ∗ g = g ∗ f,
• (f ∗ g) ∗ h = f ∗ (g ∗ h),
• f ∗ ε = ε ∗ f = f, where ε is the function with ε(1) = 1 and ε(n) = 0 for n > 1.
This shows that the multiplicative functions with the convolution form a commutative monoid
with the identity element ε. Relations among the multiplicative functions discussed above
include:
• 1 ∗ 1 = d, the number-of-divisors function,
• Id ∗ 1 = σ,
• Id_k ∗ 1 = σ_k ,
• φ ∗ 1 = Id.
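These convolution identities can be verified numerically. A Python sketch (ours; all helper names are assumptions, and the arithmetic functions are computed naively):

```python
from math import gcd

def dirichlet(f, g, n):
    """(f * g)(n) = sum of f(d) * g(n/d) over the positive divisors d of n."""
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

def one(n): return 1                                                  # constant 1
def ident(n): return n                                                # Id(n) = n
def phi(n): return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)  # totient
def num_div(n): return sum(1 for d in range(1, n + 1) if n % d == 0)  # d(n)
def sigma(n): return sum(d for d in range(1, n + 1) if n % d == 0)    # sum of divisors
```

Checking 1 ∗ 1 = d, Id ∗ 1 = σ, and φ ∗ 1 = Id for a range of n gives a concrete feel for why the convolution is the natural product on arithmetic functions.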
505
Examples Some examples of non-multiplicative functions are the arithmetic functions:
• c4 (n) - the number of ways that n can be expressed as the sum of four squares of
nonnegative integers, where we distinguish between different orders of the summands.
For example:
    1 = 1^2 + 0^2 + 0^2 + 0^2 = 0^2 + 1^2 + 0^2 + 0^2 = 0^2 + 0^2 + 1^2 + 0^2 = 0^2 + 0^2 + 0^2 + 1^2 ,

hence c_4 (1) = 4 ≠ 1 .
• The partition function P (n), the number of ways of writing n as a sum of positive integers:
here P (1) = 1, but

    P (2 · 5) = P (10) = 42 and
    P (2)P (5) = 2 · 7 = 14 ≠ 42 .

• The prime counting function π(n). Here we first have π(1) = 0 ≠ 1, and then, for example:

    π(2 · 5) = π(10) = 4 and
    π(2)π(5) = 1 · 3 = 3 ≠ 4 .

• The von Mangoldt function Λ(n): Λ(10) = 0, but

    Λ(2)Λ(5) = ln 2 · ln 5 ≠ 0 .

One might think that multiplicativity of Λ(n) holds for some n, as in

    Λ(2)Λ(6) = ln 2 · 0 = 0 = Λ(12) ,

but we have to use the coprime factorization 12 = 2^2 · 3, and

    Λ(2^2 )Λ(3) = ln 2 · ln 3 ≠ 0 = Λ(12) .
87.10 totient
A totient is an arithmetic function f expressible as
$$g * f = h$$
for some two completely multiplicative sequences g and h, where ∗ denotes the convolution product (or Dirichlet product; see multiplicative function).
The term 'totient' was introduced by Sylvester in the 1880s, but is seldom used nowadays except in two cases. The Euler totient φ satisfies
$$\iota_0 * \varphi = \iota_1,$$
where ι_k denotes the function n ↦ n^k (which is completely multiplicative). The more general Jordan totient J_k is defined by
$$\iota_0 * J_k = \iota_k.$$
87.11 unit
Let R be a ring with multiplicative identity 1. We say that u ∈ R is a unit (or unital) if u divides 1 (denoted u | 1). That is, there exists an r ∈ R such that 1 = ur = ru.
Chapter 88
11A41 – Primes
There are two different functions which are collectively known as the Chebyshev functions:
$$\vartheta(x) = \sum_{p \le x} \log p,$$
where the notation used indicates summation over all positive primes p less than or equal to x, and
$$\psi(x) = \sum_{p \le x} k \log p,$$
where the same summation notation is used and k denotes the unique integer such that p^k ≤ x but p^{k+1} > x. Heuristically, the first of these two functions measures the number of primes less than x, and the second does the same, but weighting each prime in accordance with its logarithmic relationship to x.
Many innocuous results in number theory owe their proof to a relatively simple analysis of the asymptotics of one or both of these functions. For example, the fact that for any n we have
$$\prod_{p \le n} p < 4^n.$$
A somewhat less innocuous result is that the prime number theorem (i.e. that π(x) ∼ x/log x) is equivalent to the statement that ϑ(x) ∼ x, which in turn is equivalent to the statement that ψ(x) ∼ x.
REFERENCES
1. Ireland, Kenneth and Rosen, Michael. A Classical Introduction to Modern Number Theory.
Springer, 1998.
2. Nathanson, Melvyn B. Elementary Methods in Number Theory. Springer, 2000.
If there were only finitely many primes, then there would be some largest prime p. However, p! + 1 is not divisible by any number n with 1 < n ≤ p, so p! + 1 cannot be factored by the primes we already know; but every integer greater than one is divisible by at least one prime, so there must be some prime greater than p by which p! + 1 is divisible.
A number theoretic function used in the study of prime numbers; specifically it was used in
the proof of the prime number theorem.
It is defined thus:
$$\psi(x) = \sum_{r \le x} \Lambda(r).$$
Note that we do not have to worry that the inequality above is ambiguous, because Λ(x) is only non-zero for natural x. So no matter whether we take r to range over the reals, the integers, or the naturals, the result is the same, because we just get a lot of zeros added to our answer.
The statement that
$$\pi(x) \sim \frac{x}{\ln(x)},$$
where π(x) is the prime counting function, is equivalent to the statement that
$$\psi(x) \sim x.$$
We can also define a "smoothing function" for the summatory function by
$$\psi_1(x) = \int_0^x \psi(t)\, dt;$$
the prime number theorem is then equivalent to the asymptotic statement
$$\psi_1(x) \sim \frac{1}{2} x^2,$$
which turns out to be easier to work with than the original form.
The currently known Mersenne primes 2^n − 1 are those with n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377, 6972593, and 13466917.
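Whether 2^p − 1 is prime for a given odd prime exponent p can be checked with the Lucas-Lehmer test, which this entry does not describe; the following Python sketch is the standard formulation of that test, not something taken from the text above.

```python
def is_mersenne_prime(p):
    """Lucas-Lehmer test for an odd prime exponent p:
    2^p - 1 is prime iff s_{p-2} == 0, where s_0 = 4 and
    s_{k+1} = s_k^2 - 2, all computed modulo 2^p - 1."""
    M = (1 << p) - 1       # the Mersenne number 2^p - 1
    s = 4 % M
    for _ in range(p - 2):
        s = (s * s - 2) % M
    return s == 0
```

For example, is_mersenne_prime(13) confirms that 8191 is prime, while is_mersenne_prime(11) correctly rejects 2047 = 23 · 89.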
It is conjectured that the density of Mersenne primes with exponent p < x is of order
$$\frac{e^\gamma}{\log 2} \log\log x,$$
where γ is Euler's constant.
Let p be a prime number of the form 4k + 1. Then there are two unique integers a and b with 0 < a < b such that p = a² + b². Additionally, if a number n can be written as the sum of two squares in two different ways, then n is composite.
A composite number is a natural number which is not prime and not equal to 1. That is,
n is composite if n = ab, with a and b natural numbers both not equal to 1.
Examples.
• 15 is composite, since 15 = 3 · 5.
88.7 prime
An integer p is prime if it has exactly two positive divisors. The first few positive prime
numbers are 2, 3, 5, 7, 11, . . . .
The prime counting function, denoted π(x) for any positive real number x, is a non-multiplicative function giving the number of primes not exceeding x. It usually takes a positive integer n as argument. The first few values of π(n) for n = 1, 2, 3, . . . are 0, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, . . . (Sloane's sequence A000720).
The asymptotic behavior π(x) ∼ x/ln x is given by the prime number theorem. This function is closely related to the Chebyshev functions ϑ(x) and ψ(x).
The prime difference function is an arithmetic function of a positive integer n, denoted d_n, giving the difference between the consecutive primes p_n and p_{n+1}:
$$d_n = p_{n+1} - p_n.$$
For example:
• d1 = p2 − p1 = 3 − 2 = 1,
Define π(x) as the number of primes less than or equal to x. The prime number theorem asserts that
$$\pi(x) \sim \frac{x}{\log x}$$
as x → ∞, that is, π(x)/(x/log x) tends to 1 as x increases. Here log x is the natural logarithm.
There is a sharper statement that is also known as the prime number theorem:
There is a sharper statement that is also known as the prime number theorem:
$$\pi(x) = \operatorname{li} x + R(x),$$
where li is the logarithmic integral, defined as
$$\operatorname{li} x = \int_2^x \frac{dt}{\log t} = \frac{x}{\log x} + \frac{1!\,x}{(\log x)^2} + \cdots + \frac{(k-1)!\,x}{(\log x)^k} + O\!\left(\frac{x}{(\log x)^{k+1}}\right),$$
and R(x) is the error term, whose behavior is still not fully known. From the work of Korobov and Vinogradov on the zeros of the Riemann zeta function it is known that
$$R(x) = O\!\left(x \exp\!\left(-c(\theta)(\log x)^{\theta}\right)\right)$$
for every θ < 3/5. The unproven Riemann hypothesis is equivalent to the statement that
$$R(x) = O(x^{1/2} \log x).$$
There exist a number of proofs of the prime number theorem. The original proofs by Hadamard [4] and de la Vallée Poussin [7] called on analysis of the behavior of the Riemann zeta function ζ(s) near the line Re s = 1 to deduce the estimates for R(x). For a long time it was an open problem to find an elementary proof of the prime number theorem ("elementary" meaning "not involving complex analysis"). Finally Erdös and Selberg [3, 6] found such a proof. Nowadays there are some very short proofs of the prime number theorem (for example, see [5]).
REFERENCES
1. Tom M. Apostol. Introduction to Analytic Number Theory. Narosa Publishing House, second
edition, 1990. Zbl 0335.10001.
2. Harold Davenport. Multiplicative Number Theory. Markham Pub. Co., 1967. Zbl 0159.06303.
3. Paul Erdös. On a new method in elementary number theory. Proc. Nat. Acad. Sci. U.S.A.,
35:374–384, 1949. Zbl 0034.31403.
4. Jacques Hadamard. Sur la distribution des zéros de la fonction ζ(s) et ses conséquences
arithmétiques. Bull. Soc. Math. France, 24:199–220. JFM 27.0154.01.
5. Donald J. Newman. Simple analytic proof of the prime number theorem. Amer. Math. Monthly,
87:693–696, 1980. Available online at JSTOR.
6. Atle Selberg. An elementary proof of the prime number theorem. Ann. Math. (2), 50:305–311,
1949. Zbl 0036.30604.
7. Charles de la Vallée Poussin. Recherches analytiques sur la théorie des nombres premiers. Ann. Soc. Sci. Bruxelles, 1897.
$$\operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t}.$$
Let p be a prime congruent to 1 mod 4. By Euler's criterion (or by Gauss' lemma), the congruence
$$x^2 \equiv -1 \pmod{p} \tag{88.12.1}$$
has a solution. By Dirichlet's approximation theorem, there exist integers a and b such that
$$\left| a\,\frac{x}{p} - b \right| \le \frac{1}{[\sqrt{p}\,] + 1} < \frac{1}{\sqrt{p}}, \qquad 1 \le a \le [\sqrt{p}\,] < \sqrt{p}. \tag{88.12.2}$$
Then (88.12.2) tells us
$$|ax - bp| < \sqrt{p}.$$
Write u = |ax − bp|. We get
$$u^2 + a^2 \equiv a^2 x^2 + a^2 \equiv 0 \pmod{p}$$
and
$$0 < u^2 + a^2 < 2p,$$
whence u² + a² = p, as desired.
To prove Thue's lemma in another way, we will imitate a part of the proof of Lagrange's four-square theorem.
From (88.12.1), we know that the equation
$$x^2 + y^2 = mp \tag{88.12.3}$$
has a solution (x, y, m) with, we may assume, 1 ≤ m < p. It is enough to show that if m > 1, then there exists (u, v, n) such that 1 ≤ n < m and
$$u^2 + v^2 = np.$$
If m is even, then x and y are both even or both odd; therefore, in the identity
$$\left(\frac{x+y}{2}\right)^2 + \left(\frac{x-y}{2}\right)^2 = \frac{x^2 + y^2}{2},$$
both summands are integers, and we can just take n = m/2 and conclude.
If m is odd, write a ≡ x (mod m) and b ≡ y (mod m) with |a| < m/2 and |b| < m/2. We get
$$a^2 + b^2 = nm$$
for some n < m. But consider the identity
$$(ax + by)^2 + (ay - bx)^2 = (a^2 + b^2)(x^2 + y^2),$$
together with the congruences
$$ax + by \equiv x^2 + y^2 \equiv 0 \pmod{m}, \qquad ay - bx \equiv xy - yx \equiv 0 \pmod{m}.$$
Since both squares on the left are divisible by m², we may divide through by m², getting an expression for np as a sum of two squares. The proof is complete.
88.13 semiprime
A composite number which is the product of two (possibly equal) primes is called semiprime.
Such numbers are sometimes also called 2-almost primes. For example:
• 4 is a semiprime, since 4 = 2 · 2,
The first few semiprimes are 4, 6, 9, 10, 14, 15, 21, 22, 25, 26, 33, 34, 35, 38, 39, 46, 49, 51, 55, 57, 58, 62, . . .
(Sloane’s sequence A001358 ). The Moebius function µ(n) for semiprimes can be only equal
to 0 or 1. If we form an integer sequence of values of µ(n) for semiprimes we get a binary
sequence: 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, . . .. (Sloane’s sequence A072165
).
All the squares of primes are also semiprimes. The first few squares of primes are then
4, 9, 25, 49, 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849, 2209, 2809, 3481, 3721, 4489, 5041, . . ..
(Sloane's sequence A001248). The Moebius function µ(n) for the squares of primes is always equal to 0, as it is for every number that is not square-free.
The sieve of Eratosthenes is a simple algorithm for generating the prime numbers between 1 and some arbitrary integer N.
Let p = 2, which is of course known to be prime. Mark all multiples of p greater than p itself (4, 6, 8, . . .) as composite. Now let p be the smallest number greater than the current p that is not marked as composite (in this case, 3); it must be the next prime. Again, mark all multiples of p greater than p as composite. Continue this process while p ≤ √N. When done, all numbers between 2 and N that have not been marked as composite are prime.
For many years, the sieve of Eratosthenes was the fastest known algorithm for generating
primes. Today, there are faster methods, such as a quadratic sieve.
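The procedure above translates directly into a short program; the following Python sketch (the array representation and function name are our own choices) implements it:

```python
def sieve(N):
    """Return all primes <= N via the sieve of Eratosthenes."""
    is_composite = [False] * (N + 1)
    p = 2
    while p * p <= N:                    # continue while p <= sqrt(N)
        if not is_composite[p]:          # p is the next prime
            for multiple in range(p * p, N + 1, p):
                is_composite[multiple] = True
        p += 1
    return [n for n in range(2, N + 1) if not is_composite[n]]
```

Marking starts at p² rather than 2p, since every smaller multiple of p has a smaller prime factor and has already been marked.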
REFERENCES
1. Donald E. Knuth. The Art of Computer Programming, volume 2. Addison-Wesley, 1969.
Chapter 89
Fermat Numbers.
The n-th Fermat number is defined as:
$$F_n = 2^{2^n} + 1.$$
Fermat incorrectly conjectured that all these numbers were primes, although he had no proof. The first 5 Fermat numbers, 3, 5, 17, 257, 65537 (corresponding to n = 0, 1, 2, 3, 4), are all prime (so-called Fermat primes). Euler was the first to point out the falsity of Fermat's conjecture by proving that 641 is a divisor of F_5. (In fact, F_5 = 641 × 6700417.) Moreover, no other Fermat number is known to be prime for n > 4, so it is now conjectured that those are all the prime Fermat numbers. It is also unknown whether there are infinitely many composite Fermat numbers or not.
One of the famous achievements of Gauss was to prove that the regular polygon of m sides can be constructed with ruler and compass if and only if m can be written as
$$m = 2^k F_{n_1} F_{n_2} \cdots F_{n_r},$$
where k ≥ 0 and the other factors are distinct primes of the form F_n.
The Fermat compositeness test states that for any odd integer n > 0, if there exists an integer b between 1 and n − 1 such that b^{n−1} ≢ 1 (mod n), then n is composite.
If b^{n−1} ≢ 1 (mod n), then b is a witness to n's compositeness.
The Fermat compositeness test is a fast way to prove the compositeness of most numbers, but unfortunately there are composite numbers that are pseudoprime to every base. An example of such a number is 561. These numbers are called Carmichael numbers (see EIS sequence A002997 for a list of the first few Carmichael numbers).
[Proof of the Fermat compositeness test] Suppose n is prime. Then the Euler phi-function of n is given by φ(n) = n − 1, and by Fermat's little theorem b^{n−1} ≡ 1 (mod n) for all integers b not divisible by n. We can conclude that if this is not the case, then n is not prime, so n must be composite.
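As a sketch of how the test is applied in practice (the function name is ours; Python's built-in three-argument pow performs the modular exponentiation):

```python
def fermat_witness(b, n):
    # b witnesses the compositeness of odd n if b^(n-1) is not 1 mod n
    return pow(b, n - 1, n) != 1

# 221 = 13 * 17: base 2 exposes it as composite.
# 561 = 3 * 11 * 17 is a Carmichael number: base 2 is *not* a witness,
# even though 561 is composite -- illustrating the limitation noted above.
```

Finding no witness therefore never proves primality; it only fails to prove compositeness.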
For all positive integers q > 1 and n > 1, there exists a prime p which divides q^n − 1 but does not divide q^m − 1 for 0 < m < n, except when q = 2^k − 1 and n = 2, or q = 2 and n = 6.
89.4 divisibility
Given integers a and b, we say a divides b if and only if there is some q ∈ Z such that b = qa. Equivalent ways of saying this are:
• b is divisible by a
• a is a factor of b
• a is a divisor of b
89.5 division algorithm for integers
Given any two integers a, b where b > 0, there exists a unique pair of integers q, r such that a = qb + r and 0 ≤ r < b. q is called the quotient of a by b, and r is the remainder.
The division algorithm is not an algorithm at all but rather a theorem. Its name probably
derives from the fact that it was first proved by showing that an algorithm to calculate the
quotient of two integers yields this result.
There are similar forms of the division algorithm that apply to other rings (for example,
polynomials).
Let a, b be integers (b > 0). We want to express a = bq + r for some integers q, r with 0 ≤ r < b, and to show that such an expression is unique.
Consider the numbers of the form a − kb for k ∈ Z. Among these numbers there has to be a smallest non-negative one. Let it be r. Since r = a − qb for some q, we have a = bq + r. And if r ≥ b, then r wasn't the smallest non-negative number on the list, since the previous one (equal to r − b) would also be non-negative. Thus 0 ≤ r < b.
So far, we have proved that we can express a as bq + r for some pair of integers q, r such that 0 ≤ r < b. Now we will prove the uniqueness of such an expression.
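The existence argument above is effectively an algorithm. A direct Python transcription (a sketch that mirrors the proof by stepping toward the smallest non-negative remainder, rather than using built-in division):

```python
def divide(a, b):
    """Return the unique pair (q, r) with a = q*b + r and 0 <= r < b, for b > 0.
    r is found as the smallest non-negative number of the form a - q*b."""
    q, r = 0, a
    while r < 0:        # step upward until the remainder is non-negative
        r += b
        q -= 1
    while r >= b:       # step downward while staying non-negative
        r -= b
        q += 1
    return q, r
```

Note that the result agrees with Python's floor division for positive b, including for negative a, e.g. divide(-7, 3) gives (-3, 2).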
89.7 square-free number
A square-free number is a natural number that contains no powers greater than 1 in its prime factorization. In other words, if x is our number, and
$$x = \prod_{i=1}^{r} p_i^{a_i}$$
is the prime factorization of x into r distinct primes, then a_i ≥ 2 is always false for square-free x.
The name derives from the fact that if any a_i were greater than or equal to two, we could be sure that at least one square divides x (namely, p_i²).
The asymptotic density of square-free numbers is 6/π², which can be proved by application of a square-free variation of the sieve of Eratosthenes as follows:
$$\begin{aligned}
A(n) &= \sum_{k \le n} [k \text{ is squarefree}] \\
&= \sum_{k \le n} \sum_{d^2 \mid k} \mu(d) \\
&= \sum_{d \le \sqrt{n}} \mu(d) \sum_{\substack{k \le n \\ d^2 \mid k}} 1 \\
&= \sum_{d \le \sqrt{n}} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor \\
&= n \sum_{d \le \sqrt{n}} \frac{\mu(d)}{d^2} + O(\sqrt{n}) \\
&= n \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} + O(\sqrt{n}) \\
&= \frac{n}{\zeta(2)} + O(\sqrt{n}) = \frac{6}{\pi^2}\, n + O(\sqrt{n}).
\end{aligned}$$
It has been shown that the Riemann hypothesis implies the error term O(n^{7/22+ε}) in the above.
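The density 6/π² ≈ 0.6079 is easy to observe numerically. A brute-force Python sketch (trial division; function names are ours):

```python
def is_squarefree(n):
    # n is squarefree iff no d^2 with d >= 2 divides it
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

def squarefree_count(N):
    """A(N): the number of squarefree integers in [1, N]."""
    return sum(1 for k in range(1, N + 1) if is_squarefree(k))
```

For N = 100 this gives 61, already close to 0.6079 · 100 predicted by the density.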
A natural number n is called squarefull (or powerful) if for every prime p|n we have p2 |n. In
1978 Erdös conjectured that we cannot have three consecutive squarefull natural numbers.
If we assume the ABC conjecture, there are only finitely many such consecutive triples.
If p > n, then p doesn't divide n!, its power is 0, and the sum above is empty. So let the prime p ≤ n.
For each 1 ≤ i ≤ K, there are ⌊n/pⁱ⌋ − ⌊n/p^{i+1}⌋ numbers between 1 and n for which i is the greatest power of p dividing each of them. So the power of p dividing n! is
$$\sum_{i=1}^{K} i \cdot \left( \left\lfloor \frac{n}{p^i} \right\rfloor - \left\lfloor \frac{n}{p^{i+1}} \right\rfloor \right).$$
But each ⌊n/pⁱ⌋ with i ≥ 2 in the sum appears with the factors i and i − 1, so the above sum equals
$$\sum_{i=1}^{K} \left\lfloor \frac{n}{p^i} \right\rfloor.$$
Corollary 1.
$$\sum_{k=1}^{K} \left\lfloor \frac{n}{p^k} \right\rfloor = \frac{n - \delta_p(n)}{p - 1}.$$
If n < p, then δ_p(n) = n and ⌊n/p⌋ is 0 = (n − δ_p(n))/(p − 1). So we assume p ≤ n.
Let n_K n_{K−1} · · · n_0 be the p-adic representation of n. Then
$$\frac{n - \delta_p(n)}{p - 1} = \sum_{k=1}^{K} n_k \sum_{j=0}^{k-1} p^j = \sum_{0 \le j < k \le K} p^j n_k$$
$$= (n_1 + n_2 p + \cdots + n_K p^{K-1}) + (n_2 + n_3 p + \cdots + n_K p^{K-2}) + \cdots + n_K = \sum_{k=1}^{K} \left\lfloor \frac{n}{p^k} \right\rfloor.$$
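Both the telescoped sum and Corollary 1 are easy to check computationally. A Python sketch (function names are ours):

```python
def prime_power_in_factorial(n, p):
    """Exponent of the prime p in n!, computed as the sum of floor(n / p^i)."""
    total, q = 0, p
    while q <= n:
        total += n // q
        q *= p
    return total

def digit_sum(n, p):
    """delta_p(n): the sum of the digits of n written in base p."""
    s = 0
    while n:
        s += n % p
        n //= p
    return s
```

For n = 100 and p = 3, both sides of Corollary 1 give 48; for p = 2 the corollary reduces to n − δ₂(n), since p − 1 = 1.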
Chapter 90
$$\frac{0}{1},\ \frac{1}{1},\ \frac{1}{0}.$$
Repeating the process, we get
$$\frac{0}{1},\ \frac{1}{2},\ \frac{1}{1},\ \frac{2}{1},\ \frac{1}{0},$$
and then
$$\frac{0}{1},\ \frac{1}{3},\ \frac{1}{2},\ \frac{2}{3},\ \frac{1}{1},\ \frac{3}{2},\ \frac{2}{1},\ \frac{3}{1},\ \frac{1}{0},$$
and so forth. It can be proven that every irreducible fraction appears at some iteration [1]. The process can be represented graphically by means of the so-called Stern-Brocot tree, named after its discoverers, Moritz Stern and Achille Brocot.
0/1   1/1   1/0
0/1   1/2   1/1   2/1   1/0
0/1   1/3   1/2   2/3   1/1   3/2   2/1   3/1   1/0
0/1   1/4   1/3   2/5   1/2   3/5   2/3   3/4   1/1   4/3   3/2   5/3   2/1   5/2   3/1   4/1   1/0
If we specify the position of a fraction in the tree as a path consisting of L(eft) and R(ight) moves along the tree starting from the top (fraction 1/1), and also define matrices
$$L = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},$$
then the product of the matrices corresponding to the path is the matrix $\begin{pmatrix} n & n' \\ m & m' \end{pmatrix}$ whose entries are the numerators and denominators of the parent fractions. For example, the path leading to the fraction 3/5 is LRL. The corresponding matrix product is
$$LRL = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix},$$
and the parents of 3/5 are 1/2 and 2/3.
REFERENCES
1. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. Addison-
Wesley, 1998. Zbl 0836.00001.
Let (a_n)_{n≥1} be a sequence of positive real numbers, and let a_0 be any real number. Consider the sequence
$$c_1 = a_0 + \frac{1}{a_1},\qquad
c_2 = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}},\qquad
c_3 = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3}}},\qquad
c_4 = \ldots$$
The limit c of this sequence, if it exists, is called the value or limit of the infinite continued fraction with convergents (c_n), and is denoted by
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$
or by
$$a_0 + \frac{1}{a_1+}\,\frac{1}{a_2+}\,\frac{1}{a_3+}\cdots$$
If the denominators a_n are all (positive) integers, we speak of a simple continued fraction. We then use the notation q = ⟨a_0; a_1, a_2, a_3, . . .⟩ or, in the finite case, q = ⟨a_0; a_1, a_2, a_3, . . . , a_n⟩.
It is not hard to prove that any irrational number c is the value of a unique infinite simple
continued fraction. Moreover, if cn denotes its nth convergent, then c − cn is an alternating
sequence and |c − cn | is decreasing (as well as convergent to zero). Also, the value of an
infinite simple continued fraction is perforce irrational.
Any rational number is the value of two and only two finite simple continued fractions; in one of them, the last denominator is 1. E.g.
$$\frac{43}{30} = \langle 1; 2, 3, 4 \rangle = \langle 1; 2, 3, 3, 1 \rangle.$$
An infinite simple continued fraction is called periodic if
$$c = \langle a_0; a_1, a_2, \ldots \rangle$$
and, for some integer m and some integer k > 0, we have a_n = a_{n+k} for all n ≥ m.
For example, consider the quadratic equation for the golden ratio:
$$x^2 = x + 1,$$
or equivalently
$$x = 1 + \frac{1}{x}.$$
We get
$$x = 1 + \cfrac{1}{1 + \cfrac{1}{x}} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{x}}} = \cdots$$
Owing to a kinship with the Euclidean division algorithm, continued fractions arise naturally
in number theory. An interesting example is the Pell Diophantine equation
$$x^2 - Dy^2 = 1,$$
where D is a nonsquare integer > 0. It turns out that if (x, y) is any solution of the Pell equation other than (±1, 0), then |x/y| is a convergent to √D.
22/7 and 355/113 are well-known rational approximations to π, and indeed both are convergents to π:
$$\pi = 3.14159265\ldots = \langle 3; 7, 15, 1, 292, \ldots \rangle$$
$$\frac{22}{7} = 3.14285714\ldots = \langle 3; 7 \rangle$$
$$\frac{355}{113} = 3.14159292\ldots = \langle 3; 7, 15, 1 \rangle = \langle 3; 7, 16 \rangle$$
For one more example, the distribution of leap years in the 4800-month cycle of the Gregorian
calendar can be interpreted (loosely speaking) in terms of the continued fraction expansion
of the number of days in a solar year.
Chapter 91
Proof:
For the proof we can allow base p representations of numbers with leading zeros. So let
$$n_d n_{d-1} \cdots n_0 := n, \qquad m_d m_{d-1} \cdots m_0 := m,$$
all in base p. We set r = n − m and denote the p-adic representation of r by r_d r_{d−1} · · · r_0. Finally, we introduce, as in the corollary in the entry on the prime power dividing a given factorial, δ_p(n) as the sum of digits in the p-adic representation of n. Then it follows that the power of p dividing $\binom{n}{m}$ is
$$\frac{\delta_p(m) + \delta_p(r) - \delta_p(n)}{p - 1}.$$
For each j ≥ 0 we have
$$n_j = m_j + r_j + c_{j-1} - p \cdot c_j,$$
where c_j ∈ {0, 1} is the carry produced in the j-th digit when adding m and r (and c_{−1} = 0). Then
$$\delta_p(m) + \delta_p(r) - \delta_p(n) = \sum_{k=0}^{d}(m_k + r_k - n_k) = \sum_{k=0}^{d}\bigl((p-1)c_k + (c_k - c_{k-1})\bigr) = (p-1)\sum_{k=0}^{d} c_k,$$
since the telescoping part sums to c_d − c_{−1} = 0. Hence we have
$$\frac{\delta_p(m) + \delta_p(r) - \delta_p(n)}{p - 1} = \sum_{k=0}^{d} c_k,$$
the total number of carries.
Given integers n ≥ m ≥ 0 and a prime number p, let n_i, m_i, r_i be the i-th digits of n, m, and r := n − m, respectively.
The last sum in the above equation leaves only the values for indices i and d, and we get
$$\left\lfloor \frac{n}{p^i} \right\rfloor = \left\lfloor \frac{m}{p^i} \right\rfloor + \left\lfloor \frac{r}{p^i} \right\rfloor + c_{i-1} \tag{91.2.1}$$
for all i ≥ 0.
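Kummer's theorem as derived above — the power of p dividing the binomial coefficient C(n, m) equals the number of carries when adding m and r = n − m in base p — is easy to spot-check. A Python sketch (function names are ours):

```python
from math import comb

def carries(m, r, p):
    """Number of carries when adding m and r in base p."""
    count = carry = 0
    while m or r or carry:
        s = m % p + r % p + carry
        carry = 1 if s >= p else 0
        count += carry
        m //= p
        r //= p
    return count

def p_adic_valuation(x, p):
    """Exponent of the prime p in x (x >= 1)."""
    v = 0
    while x % p == 0:
        v += 1
        x //= p
    return v
```

For example, the power of 2 in C(10, 4) = 210 is 1, and adding 4 and 6 in binary indeed produces exactly one carry.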
Chapter 92
Erdös and Sierpinski conjectured that for any integer n > 3 there exist positive integers
a, b, c so that:
$$\frac{5}{n} = \frac{1}{a} + \frac{1}{b} + \frac{1}{c}.$$
Two fractions a/b and c/d, with a/b > c/d, of positive integers a, b, c, d are adjacent if their difference is some unit fraction 1/n, n > 0; that is, if we can write:
$$\frac{a}{b} - \frac{c}{d} = \frac{1}{n}.$$
For example, the two proper fractions and unit fractions 1/11 and 1/12 are adjacent, since:
$$\frac{1}{11} - \frac{1}{12} = \frac{1}{132},$$
while 1/17 and 1/19 are not, since:
$$\frac{1}{17} - \frac{1}{19} = \frac{2}{323}.$$
It is not necessary, of course, that the fractions both be proper fractions:
$$\frac{20}{19} - \frac{19}{19} = \frac{1}{19},$$
or unit fractions:
$$\frac{3}{4} - \frac{2}{3} = \frac{1}{12}.$$
All successive terms of any Farey sequence F_n of degree n are adjacent fractions. In the first Farey sequence F_1 of degree 1 there are only two adjacent fractions, namely 1/1 and 0/1.
$$\frac{1}{70} + \frac{1}{71} = \frac{141}{4970}.$$
Representation
1. If a = 0, terminate. Otherwise, let n = ⌈b/a⌉, the smallest natural number for which 1/n ≤ a/b.
2. Output 1/n as the next term of the sum.
3. Replace a/b by a/b − 1/n and return to step 1.
Proof of correctness
The algorithm can never output the same unit fraction twice. Indeed, any n selected in step 1 is at least 2, so 1/(n−1) ≤ 2/n; hence the same n cannot be selected twice by the algorithm, as then n − 1 could have been selected instead of n.
By the choice of n we have
$$b \le an < b + a.$$
So
$$\frac{a}{b} - \frac{1}{n} = \frac{an - b}{bn},$$
and 0 ≤ an − b < a; by the induction hypothesis, the algorithm terminates for a/b − 1/n.
QED
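Together with exact rational arithmetic, the algorithm fits in a few lines of Python (a sketch; the function name is ours):

```python
from fractions import Fraction
from math import ceil

def greedy_egyptian(a, b):
    """Greedy (Fibonacci-Sylvester) expansion of a/b with 0 < a/b <= 1
    into distinct unit fractions; returns the list of denominators."""
    x = Fraction(a, b)
    denoms = []
    while x > 0:
        n = ceil(1 / x)            # smallest n with 1/n <= x
        denoms.append(n)
        x -= Fraction(1, n)
    return denoms
```

For example, greedy_egyptian(5, 6) returns [2, 3], i.e. 5/6 = 1/2 + 1/3.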
Problems
1. The greedy algorithm always works, but it tends to produce unnecessarily large denominators. For instance,
$$\frac{47}{60} = \frac{1}{3} + \frac{1}{4} + \frac{1}{5},$$
but the greedy algorithm selects 1/2 first, leading to the representation
$$\frac{47}{60} = \frac{1}{2} + \frac{1}{4} + \frac{1}{30}.$$
2. The representation is never unique. For instance, for any n we have the representation
$$\frac{1}{n} = \frac{1}{n+1} + \frac{1}{n \cdot (n+1)}.$$
So given any one representation of a/b as a sum of different unit fractions, we can take the largest denominator n appearing and replace it with two (larger) denominators. Continuing the process indefinitely, we see that there are always infinitely many such representations.
92.4 conjecture on fractions with odd denominators
Egyptian fractions raise many open problems; this is one of the most famous of them.
Suppose we wish to write fractions as sums of distinct unit fractions with odd denominators.
Obviously, every such sum will have a reduced representation with an odd denominator.
For instance, the greedy algorithm applied to 2/7 gives 1/4 + 1/28, but we may also write 2/7 as
$$\frac{1}{7} + \frac{1}{9} + \frac{1}{35} + \frac{1}{315}.$$
It is known that we can represent every rational number with odd denominator as a sum of distinct unit fractions with odd denominators.
However, it is not known whether the greedy algorithm works when limited to odd denominators.
Conjecture 1. For any fraction 0 ≤ a/(2k+1) < 1 with odd denominator, if we repeatedly subtract the largest unit fraction with odd denominator that is smaller than our fraction, we will eventually reach 0.
Such fractions were known in Egyptian mathematics, where we can find many special representations of numbers as a sum of unit fractions, now called Egyptian fractions. An example from the Rhind papyrus:
$$\frac{2}{71} = \frac{1}{40} + \frac{1}{568} + \frac{1}{710}.$$
Many unit fractions occur in pairs of adjacent fractions. Unit fractions appear as successive or non-successive terms of any Farey sequence F_n of degree n. For example, the fractions 1/2 and 1/4 are adjacent, but they are not successive terms in the Farey sequence F_5. The fractions 1/3 and 1/4 are also adjacent, and they are successive terms in F_5.
Chapter 93
11A99 – Miscellaneous
The ABC conjecture states that for every ε > 0, only finitely many triples of coprime positive integers with a + b = c satisfy c > rad(abc)^{1+ε}, where rad is the radical of an integer. This conjecture was formulated by Masser and Oesterlé in 1980.
The ABC conjecture is considered one of the most important unsolved problems in number
theory, as many results would follow directly from this conjecture. For example, Fermat’s last theorem
could be proved (for sufficiently large exponents) with perhaps one page worth of proof.
Further Reading
An interesting and elementary article on the ABC conjecture can be found at http://www.maa.org/mathland
Every integer k can be written in the form
$$k = \pm 1^2 \pm 2^2 \pm \cdots \pm m^2$$
for some m ∈ Z⁺.
We firstly note that:
$$0 = 1^2 + 2^2 - 3^2 + 4^2 - 5^2 - 6^2 + 7^2$$
$$1 = 1^2$$
$$2 = -1^2 - 2^2 - 3^2 + 4^2$$
$$3 = -1^2 + 2^2$$
$$4 = -1^2 - 2^2 + 3^2$$
Now it suffices to prove that if the theorem is true for k, then it is also true for k + 4.
As
$$(m+1)^2 - (m+2)^2 - (m+3)^2 + (m+4)^2 = 4,$$
it's simple to finish the proof: if k = ±1² ± · · · ± m², then
$$k + 4 = \pm 1^2 \pm \cdots \pm m^2 + (m+1)^2 - (m+2)^2 - (m+3)^2 + (m+4)^2.$$
That is, the nth triangular number is simply the sum of the first n natural numbers. The
first few triangular numbers are
1, 3, 6, 10, 15, 21, 28, . . .
The name triangular number comes from the fact that the summation defining tn can be
visualized as the number of dots in
•
• •
• • •
• • • •
• • • • •
• • • • • •
. . .
$$t(n) = \frac{n(n+1)}{2}$$
Legend has it that a grammar-school-aged Gauss was told by his teacher to sum up all the
numbers from 1 to 100. He reasoned that each number i could be paired up with 101 − i, to
form a sum of 101, and if this was done 100 times, it would result in twice the actual sum
(since each number would get used twice due to the pairing). Hence, the sum would be
$$1 + 2 + 3 + \cdots + 100 = \frac{100 \cdot 101}{2}.$$
The same line of reasoning works to give us the closed form for any n.
Another way to derive the closed form is to assume that the nth triangular number is less
than or equal to the nth square (that is, each row is less than or equal to n, so the sum of all
rows must be less than or equal to n · n or n2 ), and then use the first few triangular numbers
to solve the general 2nd degree polynomial An2 + Bn + C for A, B, and C. This leads to
A = 1/2, B = 1/2, and C = 0, which is the same as the above formula for t(n).
Chapter 94
REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.
This statement was also known as the (α + β)-conjecture until H. B. Mann proved it in 1942.
94.3 Schnirelmann density
Let A be a subset of Z, and let A(n) be the number of elements of A in [1, n]. The Schnirelmann density of A is
$$\sigma A = \inf_n \frac{A(n)}{n}.$$
In particular, σA = 1 if and only if N ⊆ A.
Schnirelmann proved that if 0 ∈ A ∩ B, then
$$\sigma(A + B) \ge \sigma A + \sigma B - \sigma A \cdot \sigma B,$$
and also that if σA + σB ≥ 1, then σ(A + B) = 1. From these he deduced that if σA > 0, then A is an additive basis.
A set of natural numbers is called a Sidon set if all pairwise sums of its elements are distinct. Equivalently, the equation a + b = c + d has only the trivial solution {a, b} = {c, d} in elements of the set.
Sidon sets are a special case of so-called B_h[g] sets. A set A is called a B_h[g] set if for every n ∈ N the equation n = a_1 + · · · + a_h has at most g different solutions with a_1 ≤ · · · ≤ a_h being elements of A. The Sidon sets are B_2[1] sets.
Define F_h(n, g) as the size of the largest B_h[g] set contained in the interval [1, n]. Whereas it is known that F_2(n, 1) = n^{1/2} + O(n^{5/16}) [2, p. 85], no asymptotic results are known for g > 1 or h > 2 [1].
The infinite B_h[g] sets are understood even less well. Erdös [2, p. 89] proved that for every infinite Sidon set A we have lim inf_{n→∞} (n/log n)^{−1/2} |A ∩ [1, n]| < C for some constant C. On the other hand, for a long time no example of a set for which |A ∩ [1, n]| > n^{1/3+ε} for some ε > 0 was known. Only recently Ruzsa [3] used an extremely clever construction to prove the existence of a set A for which |A ∩ [1, n]| > n^{√2−1−ε} for every ε > 0 and for all sufficiently large n.
REFERENCES
1. Ben Green. Bh [g] sets: The current state of affairs.
http://www.dpmms.cam.ac.uk/~bjg23/papers/bhgbounds.dvi, 2000.
2. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983.
Zbl 0498.10001.
3. Imre Ruzsa. An infinite Sidon sequence. J. Number Theory, 68(1):63–71, 1998. Available at
http://www.math-inst.hu/~ruzsa/cikkek.html.
Definition Let X be a set. Then the discrete topology for X is the topology given by the
power set of X. A topological space equipped with the discrete topology is called a discrete
space.
In other words, the discrete topology is the finest topology one can give to a set.
1. X is a discrete space.
3. If A is a subset of X, and x ∈ A, then A is a neighborhood of x.
Definition Suppose X is a topological space and Y is a subset equipped with the subspace topology. If Y is a discrete space, then Y is called a discrete subspace of X.
Erdös proved that every basis is an essential component. In fact, he proved that
$$\sigma(A + B) \ge \sigma B + \frac{1}{2h}(1 - \sigma B)\sigma B,$$
where h denotes the order of A.
There are non-basic essential components. Linnik constructed a non-basic essential component for which A(n) = O(n^ε) for every ε > 0.
REFERENCES
1. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983.
Let f(n) and F(n) be functions from Z⁺ → R. We say that f(n) has normal order F(n) if for each ε > 0 the set
$$A(\varepsilon) = \{ n \in \mathbb{Z}^+ : (1 - \varepsilon)F(n) < f(n) < (1 + \varepsilon)F(n) \}$$
has the property that d(A(ε)) = 1. Equivalently, if B(ε) = Z⁺ \ A(ε), then d(B(ε)) = 0. (Note that d(X) denotes the lower asymptotic density of X.)
Chapter 95
The Erdös-Turan conjecture asserts that there exists no asymptotic basis A ⊂ N₀ of order 2 such that its representation function
$$r'_{A,2}(n) = \sum_{\substack{a_1 + a_2 = n \\ a_1 \le a_2}} 1$$
is bounded.
Alternatively, the question can be phrased as whether there exists a power series F with coefficients 0 and 1 such that all coefficients of F² are greater than 0 but bounded.
If we replace the set of nonnegative integers by the set of all integers, then the question was settled by Nathanson [2] in the negative; that is, there exists a set A ⊂ Z such that r'_{A,2}(n) = 1.
REFERENCES
1. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983.
Zbl 0498.10001.
2. Melvyn B. Nathanson. Every function is the representation function of an additive basis for
the integers. arXiv:math.NT/0302091.
95.2 additive basis
A set A ⊆ N₀ is an additive basis of order n if every nonnegative integer can be written as a sum of n elements of A, that is, if N₀ = nA, where nA is the n-fold sumset of A. Usually it is assumed that 0 belongs to A when saying that A is an additive basis.
The bases entry gives a good overview over the symbolic representation of numbers, some-
thing which we use every day. This entry will give a simple overview and method of converting
numbers between bases: that is, taking one representation of a number and converting it to
another.
Perhaps the simplest way to explain base conversion is to describe the conversion to and
from base 10 (in which everyone is accustomed to performing arithmetic). We will begin
with the easier method, which is the conversion from some other base to base 10.
Conversion to Base 10
This is straight-forward enough. All we need to do is convert each symbol s to its decimal
equivalent. Typically this is simple. The symbols 0, 1, · · · , 9 usually represent the same value
in any other base. For b > 9, the letters of the alphabet begin to be used, with a = 10, b = 11,
and so on. This serves up to b = 36. Since the most common bases used are binary (b = 2),
octal (b = 8), decimal (b = 10), and hexadecimal (b = 16), this scheme is generally sufficient.
Once we map symbols to values, we can easily apply the formula above, adding and multi-
plying as we go along.
Note that there is no exact decimal representation of the ternary value 10210.21.
This happens often with conversions between bases. This is why many decimal values
(such as 0.1) cannot be represented precisely in binary floating point (generally used
in computers for non-integral arithmetic).
Iterative Method
Obviously base conversion can become very tedious. It would make sense to write a com-
puter program to perform these operations. What follows is a simple algorithm that iterates
through the symbols of a representation, accumulating a value as it goes along. Once all the
symbols are iterated, the result is in the accumulator. This method only works for whole
numbers. The fractional part of a number could be converted in the same manner, but it
would have to be a separate process, working from the end of the number rather than the
beginning.
Algorithm Parse(A, n, b)
Input: Sequence A, where 0 ≤ A[i] < b for 1 ≤ i ≤ n, b > 1
Output: The value represented by A in base b (where A[1] is the left-most digit)
  v ← 0
  for i ← 1 to n do
    v ← b ∗ v + A[i]
  return v
This is a simple enough method that it could even be done in one’s head.
This is a bit easier than remembering that the first digit corresponds to the 25 = 32 place.
To convert a value to some base b simply consists of inverse application of the iterative
method. As the iterative method for “parsing” a symbolic representation of a value consists
of multiplication and addition, the iterative method for forming a symbolic representation
of a value will consist of division and subtraction.
Example of Conversion from Base 10
Note how we obtain the symbols from the right. We get the next symbol (moving left) by
taking the value of n mod 16. We then replace n with the whole part of n/16, and repeat
until n = 0.
This isn’t as easy to do in one’s head as the other direction, though for small bases (e.g.
b = 2) it is feasible. For example, to convert 20000 to binary:
n Representation
20000
10000 0
5000 00
2500 000
1250 0000
625 0 0000
312 10 0000
156 010 0000
78 0010 0000
39 0 0010 0000
19 10 0010 0000
9 110 0010 0000
4 1110 0010 0000
2 0 1110 0010 0000
1 00 1110 0010 0000
0 100 1110 0010 0000
The digits in the previous example are grouped into sets of four both to ease readability, and
to highlight the relationship between binary and hexadecimal. Since 2^4 = 16, each group of
four binary digits is the representation of a hexadecimal digit in the same position of the
sequence.
It is trivial to get the octal and hexadecimal representations of any number once one has the
binary representation. For instance, since 2^3 = 8, the octal representation can be obtained
by grouping the binary digits into groups of 3, and converting each group to an octal digit.
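The inverse direction described above (repeated division, collecting remainders from the right) can be sketched in Python; the helper name is ours:

```python
def to_base(n, base):
    """Convert a whole number to a digit list in the given base by
    repeatedly dividing and collecting remainders right-to-left."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(r)
    digits.reverse()
    return digits
```

`to_base(20000, 2)` reproduces the binary digits tabulated above.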
95.5 sumset
The sumset of sets A_1, A_2, . . . , A_n of integers is

A_1 + A_2 + · · · + A_n = { a_1 + a_2 + · · · + a_n | a_i ∈ A_i }
Chapter 96
At first sight it may seem that the greedy algorithm yields the densest subset of {0, 1, . . . , N}
that is free of arithmetic progressions of length 3. It is not hard to show that the greedy
algorithm yields the set of numbers that lack the digit 2 in their ternary expansion. The density
of such numbers is O(N^{log_3 2 − 1}).
However, in 1946 Behrend[1] constructed much denser subsets that are free of arithmetic
progressions. His major idea is that if we were looking for progression-free sets in R^d, then
we could use spheres. So, consider the d-dimensional cube [1, n]^d ∩ Z^d and the family of spheres
x_1^2 + x_2^2 + · · · + x_d^2 = t for t = 1, . . . , dn^2. Each point in the cube is contained in one of the
spheres, and so at least one of the spheres contains n^d/(dn^2) lattice points. Let us call this set
A. Since a sphere does not contain arithmetic progressions, A does not contain any progressions
either. Now let f be a Freiman isomorphism from A to a subset of Z defined as follows. If
x = (x_1, x_2, . . . , x_d) is a point of A, then f(x) = x_1 + x_2(2n) + x_3(2n)^2 + · · · + x_d(2n)^{d−1},
that is, we treat x_i as the i'th digit of f(x) in base 2n. It is not hard to see that f is indeed
a Freiman isomorphism of order 2, and that f(A) ⊂ {1, 2, . . . , N = (2n)^d}. If we set
d = c√(ln N), then we get that there is a progression-free subset of {1, 2, . . . , N} of size at
least Ne^{−√(ln N)(c ln 2 + 2/c + o(1))}. To maximize this value we can set c = √(2/ln 2). Thus, there
exists a progression-free set of size at least

Ne^{−√(8 ln 2 · ln N)(1+o(1))}.
This result was later generalized to sets not containing arithmetic progressions of length k
by Rankin[3]. His construction is more complicated, and depends on the estimates of the
number of representations of an integer as a sum of many squares. He proves that the size
of a set free of k-term arithmetic progressions is at least

Ne^{−c(log N)^{1/(k−1)}}.
On the other hand, Moser[2] gave a construction analogous to that of Behrend, but which
was explicit since it did not use the pigeonhole principle.
REFERENCES
1. Felix A. Behrend. On the sets of integers which contain no three in arithmetic progression.
Proc. Nat. Acad. Sci., 23:331–332, 1946. Zbl 0060.10302.
2. Leo Moser. On non-averaging sets of integers. Canadian J. Math., 5:245–252, 1953.
Zbl 0050.04001.
3. Robert A. Rankin. Sets of integers containing not more than a given number of terms in
arithmetical progression. Proc. Roy. Soc. Edinburgh Sect. A, 65:332–344, 1962. Zbl 0104.03705.
Let A be a finite set of integers such that the 2-fold sumset 2A is “small”, i.e., |2A| < c|A|
for some constant c. Then there exists an n-dimensional arithmetic progression of length c′|A|
that contains A, and such that c′ and n are functions of c only.
REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.
Let k be a positive integer and let δ > 0. There exists a positive integer N = N(k, δ) such
that every subset of {1, 2, . . . , N} of size at least δN contains an arithmetic progression of length k.
The case k = 3 was first proved by Roth[4]. His method did not seem to extend to the case
k > 3. Using completely different ideas Szemerédi proved the case k = 4 [5], and the general
case of an arbitrary k [6].
where the lower bound is due to Behrend[1] (for k = 3) and Rankin[3], and the upper bound
is due to Gowers[2].
REFERENCES
1. Felix A. Behrend. On the sets of integers which contain no three in arithmetic progression.
Proc. Nat. Acad. Sci., 23:331–332, 1946. Zbl 0060.10302.
2. Timothy Gowers. A new proof of Szemerédi’s theorem. Geom. Funct. Anal., 11(3):465–588,
2001. Preprint available at http://www.dpmms.cam.ac.uk/~wtg10/papers.html.
3. Robert A. Rankin. Sets of integers containing not more than a given number of terms in
arithmetical progression. Proc. Roy. Soc. Edinburgh Sect. A, 65:332–344, 1962. Zbl 0104.03705.
4. Klaus Friedrich Roth. On certain sets of integers. J. London Math. Soc., 28:245–252, 1953.
Zbl 0050.04002.
5. Endre Szemerédi. On sets of integers containing no four elements in arithmetic progression.
Acta Math. Acad. Sci. Hung., 20:89–104, 1969. Zbl 0175.04301.
6. Endre Szemerédi. On sets of integers containing no k elements in arithmetic progression. Acta.
Arith., 27:299–345, 1975. Zbl 0303.10056.
Q = Q(a; q_1, . . . , q_n; l_1, . . . , l_n)
  = { a + x_1 q_1 + · · · + x_n q_n | 0 ≤ x_i < l_i for i = 1, . . . , n }.
REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.
Chapter 97
Let A be a set of natural numbers. Let R_n(A) be the number of ways to represent n as a
sum of two elements of A, that is,

R_n(A) = Σ_{a_i + a_j = n; a_i, a_j ∈ A} 1.

The Erdös–Fuchs theorem states that for every constant c > 0 the asymptotic formula

Σ_{n ≤ N} R_n(A) = cN + o(N^{1/4} log^{−1/2} N)

cannot hold.
REFERENCES
1. Paul Erdös and Wolfgang H.J. Fuchs. On a problem of additive number theory. J. Lond. Math.
Soc., 31:67–73, 1956. Zbl 0070.04104.
2. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983.
Zbl 0498.10001.
3. Imre Ruzsa. A converse to a theorem of Erdös and Fuchs. J. Number Theory, 62(2):397–402,
1997. Zbl 0872.11014.
Chapter 98
11B37 – Recurrences
f(a) = 3a + 1   if a is odd
f(a) = a/2      if a is even.
Then let the sequence c_n be defined by c_i = f(c_{i−1}), with c_0 an arbitrary natural seed value.
The sequence c_n is sometimes called the “hailstone sequence”, because it behaves
analogously to a hailstone in a cloud, which falls by gravity and is tossed up again repeatedly.
The sequence is conjectured always to end in the eternal oscillation 4, 2, 1, 4, 2, 1, . . .
(the Collatz conjecture), though this has never been proved.
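A minimal Python sketch of the hailstone iteration (the names are ours); it stops at the first 1, where the 4, 2, 1 oscillation would begin:

```python
def hailstone(c0):
    """Iterate f until the sequence reaches 1 (which, by the unproven
    Collatz conjecture, it always does)."""
    seq = [c0]
    while seq[-1] != 1:
        a = seq[-1]
        seq.append(3 * a + 1 if a % 2 == 1 else a // 2)
    return seq
```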
A recurrence relation is a function which gives the value of a sequence at some position
based on the values of the sequence at previous positions and the position index itself. If
the current position n of a sequence s is denoted by sn , then the next value of the sequence
expressed as a recurrence relation would be of the form
s_{n+1} = f(s_1, s_2, . . . , s_{n−1}, s_n, n)

For example,

s_{n+1} = s_n + (n + 1)

is the recurrence relation for the sum of the integers from 1 to n + 1. This could also
be expressed as

s_n = s_{n−1} + n
keeping in mind that as long as we set the proper initial values of the sequence, the recurrence
relation indices can have any constant amount added or subtracted.
Chapter 99
The nth Fibonacci number is generated by adding the previous two. Thus, the Fibonacci
sequence has the recurrence relation
fn = fn−1 + fn−2
with f_0 = 0 and f_1 = 1. This recurrence relation can be solved into the closed form

f(n) = (1/√5)(φ^n − φ′^n)

where φ is the golden ratio (also see that entry for an explanation of φ′). Note that

lim_{n→∞} f_{n+1}/f_n = φ
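The closed form can be checked numerically; this Python sketch (the names are ours) rounds away the floating-point error:

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2        # the golden ratio phi
PHI_PRIME = (1 - sqrt(5)) / 2  # its conjugate root

def fib_closed(n):
    """Binet's closed form for the nth Fibonacci number."""
    return round((PHI ** n - PHI_PRIME ** n) / sqrt(5))
```

Since |φ′| < 1, the second term shrinks rapidly and successive ratios approach φ, as the limit above states.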
99.2 Hoggatt's theorem
Hoggatt's theorem states that every positive integer can be expressed as a sum of distinct
Fibonacci numbers.
For any positive integer k ∈ Z+, there exists a unique positive integer n so that F_{n−1} < k ≤
F_n. We proceed by strong induction on k. For k = 1, 2, 3, the property is true as 1, 2, 3
are themselves Fibonacci numbers. Suppose k ≥ 4 and that every integer less than k is a
sum of distinct Fibonacci numbers. Let n be the largest positive integer such that F_n < k.
We first note that if k − F_n > F_{n−1} then

F_{n+1} ≥ k > F_n + F_{n−1} = F_{n+1},

giving us a contradiction. Hence k − F_n ≤ F_{n−1}, and consequently the positive integer
(k − F_n) can be expressed as a sum of distinct Fibonacci numbers. Moreover, this sum does
not contain the term F_n, as k − F_n ≤ F_{n−1} < F_n. Hence k = (k − F_n) + F_n is a sum of distinct
Fibonacci numbers, and Hoggatt's theorem is proved by induction.
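The inductive argument is effectively a greedy algorithm: repeatedly subtract the largest Fibonacci number not exceeding k. A Python sketch (the names are ours):

```python
def fib_sum(k):
    """Express a positive integer k as a sum of distinct Fibonacci
    numbers by greedily taking the largest F_n <= k, as in the proof."""
    fibs = [1, 2]
    while fibs[-1] < k:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= k:
            parts.append(f)
            k -= f
    return parts
```

The proof's inequality k − F_n ≤ F_{n−1} < F_n is exactly what guarantees the greedy choice never repeats a term.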
The Lucas numbers are a slight variation on the Fibonacci numbers. These numbers follow the
same recursion:

l_{n+1} = l_n + l_{n−1}

but have different initial conditions: l_1 = 1, l_2 = 3, leading to the sequence 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, . . .
Lucas numbers satisfy the identity l_n = f_{n−1} + f_{n+1}, where f_n is the nth Fibonacci
number.
φ = 1.61803398874989484820 . . .

φ gets its rather illustrious name from the fact that the Greeks thought that a rectangle
with a ratio of side lengths of about 1.6 was the most pleasing to the eye. Classical Greek
architecture is based on this premise.
Above: The golden rectangle; l/w = φ.
φ = (1 + √5)/2

The value

(1 − √5)/2

is often called φ′. φ and φ′ are the two roots of the characteristic equation x^2 = x + 1 of the
Fibonacci recurrence. The following identities hold for φ and φ′:

• 1/φ = −φ′
• 1 − φ = φ′
• 1/φ′ = −φ
• 1 − φ′ = φ

In particular,

φ^{−1} + φ^0 = φ^1

which implies

φ^{n−1} + φ^n = φ^{n+1}
Chapter 100
If a_1, a_2, . . . , a_{2n−1} are integers, then there exists a subset a_{i_1}, a_{i_2}, . . . , a_{i_n} of n of them
such that

a_{i_1} + a_{i_2} + · · · + a_{i_n} ≡ 0 (mod n).
REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.
Chapter 101
The Farey sequence F_n is the ascending sequence of reduced fractions in [0, 1] whose
denominators are at most n. The first few are:

F_1: 0/1, 1/1
F_2: 0/1, 1/2, 1/1
F_3: 0/1, 1/3, 1/2, 2/3, 1/1
F_4: 0/1, 1/4, 1/3, 1/2, 2/3, 3/4, 1/1
F_5: 0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1
Farey sequences are a singularly useful tool in understanding the convergents that appear in
continued fractions. The convergents for any irrational α can be found: they are precisely
the fractions closest to α in the sequences F_n.
It is also of value to look at the sequences F_n as n grows. If a/b and c/d are reduced representations
of adjacent terms in some Farey sequence F_n (where b, d ≤ n), then they are adjacent fractions;
their difference is the least possible:

|a/b − c/d| = 1/(bd).
Furthermore, the first fraction to appear between the two in a later Farey sequence is the
mediant (a + c)/(b + d), which first appears in F_{b+d}, and (as written here) this fraction is
already reduced.
An alternate view of the “dynamics” of how Farey sequences develop is given by Stern-Brocot
trees.
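The sequences F_n can be generated directly from the definition; a Python sketch using exact rational arithmetic (the function name is ours):

```python
from fractions import Fraction

def farey(n):
    """F_n: all reduced fractions in [0, 1] with denominator <= n,
    in ascending order (Fraction reduces automatically)."""
    terms = {Fraction(a, b) for b in range(1, n + 1) for a in range(b + 1)}
    return sorted(terms)
```

Adjacent terms a/b and c/d of `farey(n)` satisfy c/d − a/b = 1/(bd), the neighbor property quoted above.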
Version: 4 Owner: ariels Author(s): ariels
Chapter 102
If m = a_k p^k + · · · + a_1 p + a_0 and n = b_k p^k + · · · + b_1 p + b_0 are the base-p expansions of
m and n, then the following congruence is true:

C(m, n) ≡ C(a_0, b_0) C(a_1, b_1) · · · C(a_k, b_k) (mod p)

Note: the binomial coefficient C(x, y) is defined in the usual way, namely:

C(x, y) = x!/(y!(x − y)!)
The binomial theorem is a formula for the expansion of (a + b)n , for n a positive integer and
a and b any two real (or complex) numbers, into a sum of powers of a and b. More precisely,
(a + b)^n = a^n + C(n, 1) a^{n−1} b + C(n, 2) a^{n−2} b^2 + · · · + b^n.
For example, if n is 3 or 4, we have:

(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3
(a + b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4
Chapter 103
Let Br be the rth Bernoulli periodic function. Then the rth Bernoulli number is
Br := Br (0).
Let br be the rth Bernoulli polynomial. Then the rth Bernoulli periodic function Br (x) is
defined as the periodic function of period 1 which coincides with br on [0, 1].
103.3 Bernoulli polynomial
The Bernoulli polynomials b_r(x) are determined by the conditions

b_0(x) = 1,
b_r′(x) = r b_{r−1}(x), r ≥ 1,
∫_0^1 b_r(x) dx = 0, r ≥ 1.

The first few are

b_0(x) = 1
b_1(x) = x − 1/2
b_2(x) = x^2 − x + 1/6
b_3(x) = x^3 − (3/2)x^2 + (1/2)x
b_4(x) = x^4 − 2x^3 + x^2 − 1/30
. . .
Let χ be a non-trivial primitive character mod m. The generalized Bernoulli numbers Bn,χ
are given by
Σ_{a=1}^{m} χ(a) t e^{at}/(e^{mt} − 1) = Σ_{n=0}^{∞} B_{n,χ} t^n/n!
They are members of the field Q(χ) generated by the values of χ.
Chapter 104
has cardinality at least min(p, hk − h^2 + 1). This was conjectured by Erdös and Heilbronn
in 1964 [1]. The first proof was given by Dias da Silva and Hamidoune in 1994.
REFERENCES
1. Paul Erdös and Hans Heilbronn. On the addition of residue classes (mod#1). Acta Arith.,
9:149–159, 1964. Zbl 0156.04801.
2. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996. Zbl 0859.11003.
holds if and only if
Freiman isomorphisms were introduced by Freiman in his monograph [1] to build a general
theory of set addition that is independent of the underlying group.
The number of equivalence classes under Freiman isomorphisms of order 2 is n^{2n(1+o(1))} [2].
REFERENCES
1. Gregory Freiman. Foundations of Structural Theory of Set Addition, volume 37 of Translations
of Mathematical Monographs. AMS, 1973. Zbl 0271.10044.
2. Sergei V. Konyagin and Vsevolod F. Lev. Combinatorics and linear alge-
bra of Freiman’s isomorphism. Mathematika, 47:39–51, 2000. Available at
http://math.haifa.ac.il/~seva/pub_list.html.
3. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996. Zbl 0859.11003.
104.3 sum-free
A set A is called sum-free if the equation a_1 + a_2 = a_3 has no solutions in elements
of A. Equivalently, a set is sum-free if it is disjoint from its 2-fold sumset, i.e., A ∩ 2A = ∅.
Chapter 105
Sometimes a sequence of the above type is called a floor Beatty sequence, and denoted
B^{(f)}(α, α′), while an integer sequence

B^{(c)}(α, α′) := ( ⌈(n − α′)/α⌉ )_{n=1}^{∞}

is called a ceiling Beatty sequence.
References
[1 ] M. Lothaire, Algebraic combinatorics on words, vol. 90, Cambridge University Press,
Cambridge, 2002, ISBN 0-521-81220-8, available online at http://www-igm.univ-mlv.fr/~berstel
A collective work by Jean Berstel, Dominique Perrin, Patrice Seebold, Julien Cassaigne,
Aldo De Luca, Steffano Varricchio, Alain Lascoux, Bernard Leclerc, Jean-Yves Thibon,
Veronique Bruyere, Christiane Frougny, Filippo Mignosi, Antonio Restivo, Christophe
Reutenauer, Dominique Foata, Guo-Niu Han, Jacques Desarmenien, Volker Diekert,
Tero Harju, Juhani Karhumaki and Wojciech Plandowski; With a preface by Berstel
and Perrin. MR 1905123
105.2 Beatty's theorem
Beatty's theorem states that if p and q are positive irrational numbers with 1/p + 1/q = 1,
then the sequences

(⌊np⌋)_{n=1}^{∞} = ⌊p⌋, ⌊2p⌋, ⌊3p⌋, . . .
(⌊nq⌋)_{n=1}^{∞} = ⌊q⌋, ⌊2q⌋, ⌊3q⌋, . . .

where ⌊x⌋ denotes the floor (or greatest integer function) of x, constitute a partition of the
set of positive integers.
That is, every positive integer is a member of exactly one of the two sequences, and the
two sequences have no common terms.
We say that two sequences partition N = {1, 2, 3, . . .} if the sequences are disjoint and their
union is N.
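Beatty's theorem can be checked numerically. This Python sketch (the names are ours) uses p = √2 as an assumed example, with q = p/(p − 1) so that 1/p + 1/q = 1:

```python
from math import floor, sqrt

def beatty(r, count):
    """First `count` terms of the Beatty sequence floor(n * r)."""
    return [floor(n * r) for n in range(1, count + 1)]

p = sqrt(2)        # irrational, 1 < p < 2
q = p / (p - 1)    # then 1/p + 1/q = 1
```

The two sequences 1, 2, 4, 5, 7, . . . and 3, 6, 10, 13, . . . are disjoint and together cover every positive integer.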
Fraenkel's Partition Theorem: The sequences B(α, α′) and B(β, β′) partition N if and
only if the following five conditions are satisfied.
1. 0 < α < 1.
2. α + β = 1.
3. 0 ≤ α + α′ ≤ 1.
References
[1 ] Aviezri S. Fraenkel, The bracket function and complementary sets of integers, Canad.
J. Math. 21 (1969), 6–27. MR 38:3214
[2 ] Kevin O’Bryant, Fraenkel’s partition and Brown’s decomposition, arXiv:math.NT/0305133.
An integer k is a Sierpinski number if for every positive integer n, the number k2n + 1 is
composite.
That such numbers exist is amazing, and even more surprising is that there are infinitely
many of them (in fact, infinitely many odd ones). The smallest known Sierpinski number
is 78557, but it is not known whether or not this is the smallest one. The smallest number
m for which it is unknown whether or not m is a Sierpinski number is 4847.
A process for generating Sierpinski numbers using covering sets of primes can be found at
http://www.glasgowg43.freeserve.co.uk/siercvr.htm
Visit
http://www.seventeenorbust.com/
for the distributed computing effort to show that 78557 is indeed the smallest Sierpinski
number (or find a smaller one).
Similarly, a Riesel number is a number k such that for every positive integer n, the number
k2n −1 is composite. The smallest known Riesel number is 509203, but again, it is not known
for sure that this is the smallest.
105.5 palindrome
A palindrome is a number which yields itself when its digits are reversed. Some palindromes
are:
• 121
• 2002
• 314159951413
Clearly one can construct arbitrary-length palindromes by taking any number and appending
to it a reversed copy of itself, or of all but its last digit.
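Both constructions are one-liners on the decimal string; a Python sketch (the names are ours):

```python
def make_palindrome(n, drop_last=False):
    """Append to n a reversed copy of itself (or of all but its last
    digit), as described above."""
    s = str(n)
    tail = s[:-1] if drop_last else s
    return int(s + tail[::-1])

def is_palindrome(n):
    s = str(n)
    return s == s[::-1]
```

For example, 314159 yields the palindrome 314159951413 listed above.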
The theorem is equivalent to the statement that for each integer N > 1, exactly one element
of {a_n} ∪ {b_n} lies in (N, N + 1).

Choose an integer N. Let s(N) be the number of elements of {a_n} ∪ {b_n} less than N.

a_n < N ⇔ np < N ⇔ n < N/p

So there are ⌊N/p⌋ elements of {a_n} less than N, and likewise ⌊N/q⌋ elements of {b_n}.
By definition,

N/p − 1 < ⌊N/p⌋ < N/p
N/q − 1 < ⌊N/q⌋ < N/q

and summing these inequalities gives N − 2 < s(N) < N, which gives that s(N) = N − 1
since s(N) is an integer.

The number of elements of {a_n} ∪ {b_n} lying in (N, N + 1) is then s(N + 1) − s(N) = 1.
105.7 square-free sequence
The name “square-free” comes from notation: Let {s} be a sequence. Then {s, s} is also a
sequence, which we write “compactly” as {s2 }. In the rest of this entry we use a compact
notation, lacking commas or braces. This notation is commonly used when dealing with
sequences in the capacity of strings. Hence we can write {s, s} = ss = s2 .
Some examples:
• abcdabc cannot have any contiguous subsequence written in square notation, hence it is a
square-free sequence.
• ababab = (ab)^3 = ab(ab)^2, so it is not a square-free sequence.
Note that, while notationally similar to the number-theoretic sense of “square-free,” the two
concepts are distinct. For example, for integers a and b the product aba = a^2 b, which contains
a square. But as a sequence, aba = {a, b, a}, clearly lacking any commutativity that might allow
us to shift elements. Hence, the sequence aba is square-free.
A sequence s_1, s_2, . . . is superincreasing if

s_n > Σ_{i=1}^{n−1} s_i

for every n.
That is, any element of the sequence is greater than all of the previous elements added
together. A commonly used superincreasing sequence is that of the powers of two (s_i = 2^i).

Suppose x = Σ_{i=1}^{n} a_i s_i. If s is a superincreasing sequence and every a_i ∈ {0, 1}, then we can
always determine the a_i's simply by knowing x (this is analogous to the fact that we can
always determine which bits are on and off in the binary bitstring representing a number,
given that number).
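Recovering the a_i from x is the same right-to-left scan used for binary digits; a Python sketch (the names are ours):

```python
def decode_superincreasing(x, seq):
    """Recover the 0/1 coefficients a_i with x = sum(a_i * s_i) by
    scanning the superincreasing sequence from its largest term down."""
    coeffs = [0] * len(seq)
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] <= x:
            coeffs[i] = 1
            x -= seq[i]
    if x != 0:
        raise ValueError("x is not representable over this sequence")
    return coeffs
```

The superincreasing property guarantees the greedy choice is forced at each step, which is what makes decoding unambiguous.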
Chapter 106
11B99 – Miscellaneous
A Lychrel number is a number which never yields a palindrome in the iterative process
of adding to itself a copy of itself with digits reversed. For example, if we start with the
number 983 we get:

• 983 + 389 = 1372
• 1372 + 2731 = 4103
• 4103 + 3014 = 7117

and a palindrome is reached after three steps. In fact, it is not known if there exist any Lychrel
numbers in base 10 (in base 2, for instance, there have been numbers proven to be Lychrel
numbers [1]). The first base-10 Lychrel candidate is 196:
• 196 + 691 = 887
• 887 + 788 = 1675
• 1675 + 5761 = 7436
• 7436 + 6347 = 13783
• 13783 + 38731 = 52514
• 52514 + 41525 = 94039
• ...
This has been followed out to millions of digits, with no palindrome found in the sequence.
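The reverse-and-add iteration is easy to reproduce; a Python sketch (the names and the step cap are ours):

```python
def reaches_palindrome(n, max_steps=1000):
    """Iterate reverse-and-add; return the number of steps to the first
    palindrome, or None if none appears within max_steps (a Lychrel
    candidate up to that cap)."""
    for step in range(1, max_steps + 1):
        n = n + int(str(n)[::-1])
        if str(n) == str(n)[::-1]:
            return step
    return None
```

983 resolves in three steps; 196 shows no palindrome within any cap yet tried.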
The following table gives the number of Lychrel candidates found within ascending ranges:
REFERENCES
1. Wade VanLandingham, 196 And Other Lychrel Numbers
2. John Walker, Three Years of Computing
A closed form function which gives the value of a sequence at index n has only one param-
eter, n itself. This is in contrast to the recurrence relation form, which can have all of the
previous values of the sequence as parameters.
The benefit of the closed form is that one does not have to calculate all of the previous values
of the sequence to get the next value. This is not too useful if one wants to print out or
utilize all of the values of a sequence up to some n, but it is very useful to get the value of
the sequence just at some index n.
There are many techniques used to find a closed-form solution for a recurrence relation.
Some are
• Repeated substitution. Replace each sk in the expression of sn (with k < n) with
its recurrence relation representation. Repeat again on the resulting expression, until
some pattern is evident.
• Estimate an upper bound for sn in terms of n. Then, solve for the unknowns (say there
are r unknowns) by finding the first r values of the recurrence relation and solving the
linear system formed by them and the unknowns.
• Find the characteristic equation of the recurrence relation and solve for the roots. If
the recurrence relation is not homogeneous, then you’ll have to apply a method such
as the method of undetermined coefficients.
Chapter 107
11C08 – Polynomials
The content of a polynomial P(x) = a_n x^n + · · · + a_1 x + a_0 is

c(P) = gcd(a_0, a_1, . . . , a_n)
For any positive integer n, we define Φ_n(x), the nth cyclotomic polynomial, by

Φ_n(x) = ∏_{j=1, (j,n)=1}^{n} (x − ζ_n^j)

where ζ_n = e^{2πi/n}, i.e. ζ_n is a primitive nth root of unity.
107.3 height of a polynomial
Suppose that f (x) = g(x)h(x) with g(x), h(x) ∈ F [x], where F is the field of fractions of R.
A lemma of Gauss states that there exist g 0 (x), h0 (x) ∈ R[x] such that f (x) = g 0(x)h0 (x), i.e.
any factorization can be converted to a factorization in R[x].
Let f(x) = Σ_{i=0}^{n} a_i x^i, g′(x) = Σ_{j=0}^{ℓ} b_j x^j, and h′(x) = Σ_{k=0}^{m} c_k x^k be the expansions of f(x), g′(x),
and h′(x) respectively.
Let ϕ : R[x] → (R/pR)[x] be the natural homomorphism from R[x] to (R/pR)[x]. Note that
since p | a_i for i < n and p ∤ a_n, we have ϕ(a_i) = 0 for i < n and ϕ(a_n) = α ≠ 0, so

ϕ(f(x)) = ϕ(Σ_{i=0}^{n} a_i x^i) = Σ_{i=0}^{n} ϕ(a_i) x^i = ϕ(a_n) x^n = α x^n

Therefore we have α x^n = ϕ(f(x)) = ϕ(g′(x)h′(x)) = ϕ(g′(x))ϕ(h′(x)), so we must have
ϕ(g′(x)) = β x^{ℓ′} and ϕ(h′(x)) = γ x^{m′} for some β, γ ∈ R/pR and some integers ℓ′, m′.
Clearly ℓ′ ≤ deg(g′(x)) = ℓ and m′ ≤ deg(h′(x)) = m, and therefore since ℓ′ + m′ = n = ℓ + m,
we must have ℓ′ = ℓ and m′ = m. Thus ϕ(g′(x)) = β x^ℓ and ϕ(h′(x)) = γ x^m.
If ℓ > 0, then ϕ(b_i) = 0 for i < ℓ. In particular, ϕ(b_0) = 0, hence p | b_0. Similarly if m > 0,
then p | c_0.
If ℓ > 0 and m > 0, then p | b_0 and p | c_0, which implies that p^2 | a_0. But this contradicts
our assumptions on f(x), and therefore we must have ℓ = 0 or m = 0, that is, we must have
a trivial factorization. Therefore f(x) is irreducible.
We first prove that Φn (x) ∈ Z[x]. The field extension Q(ζn ) of Q is the splitting field of the
polynomial xn − 1 ∈ Q[x], since it splits this polynomial and is generated as an algebra by a
single root of the polynomial. Since splitting fields are normal, the extension Q(ζn )/Q is a
Galois extension. Any element of the Galois group, being a field automorphism, must map
ζn to another root of unity of exact order n. Therefore, since the Galois group of Q(ζn )/Q
permutes the roots of Φn (x), it must fix the coefficients of Φn (x), so by Galois theory these
coefficients are in Q. Moreover, since the coefficients are algebraic integers, they must be in
Z as well.
Let f (x) be the minimal polynomial of ζn in Q[x]. Then f (x) has integer coefficients as well,
since ζn is an algebraic integer. We will prove f (x) = Φn (x) by showing that every root of
Φn (x) is a root of f (x). We do so via the following claim:
Claim: For any prime p not dividing n, and any primitive nth root of unity ζ ∈ C, if
f (ζ) = 0 then f (ζ p ) = 0.
This claim does the job, since we know f(ζ_n) = 0, and any other primitive nth root of unity
can be obtained from ζ_n by successively raising ζ_n to prime powers p not dividing n a finite
number of times.¹
To prove this claim, consider the factorization x^n − 1 = f(x)g(x) for some polynomial g(x) ∈
Z[x]. Writing O for the ring of integers of Q(ζ_n), we treat the factorization as taking place in
O[x] and proceed to mod out both sides of the factorization by any prime ideal p of O lying
over (p). Note that the polynomial x^n − 1 has no repeated roots mod p, since its derivative
n x^{n−1} is relatively prime to x^n − 1 mod p. Therefore, if f(ζ) = 0 mod p, then g(ζ) ≠ 0 mod p,
and applying the pth power Frobenius map to both sides yields g(ζ^p) ≠ 0 mod p. This means
that g(ζ^p) cannot be 0 in C, because it doesn't even equal 0 mod p. However, ζ^p is a root of
x^n − 1, so if it is not a root of g, it must be a root of f, and so we have f(ζ^p) = 0, as desired.
¹ Actually, if one applies Dirichlet's theorem on primes in arithmetic progressions here, it turns out that
one prime is enough, but we do not need such a sharp result here.
Version: 4 Owner: djao Author(s): djao
Chapter 108
Let d be a positive integer which is not a perfect square, and let (x, y) be a solution of
x^2 − dy^2 = 1. Then x/y is a convergent in the simple continued fraction expansion of √d.

This implies that x/y is a convergent of the continued fraction of √d.
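The convergents of √d can be walked with the standard continued-fraction recurrence until the Pell equation is satisfied; a Python sketch (the names are ours):

```python
from math import isqrt

def pell_fundamental(d):
    """Smallest (x, y) with x^2 - d*y^2 = 1, found among the
    convergents h/k of the continued fraction of sqrt(d)."""
    a0 = isqrt(d)
    m, q, a = 0, 1, a0
    h_prev, h = 1, a0   # convergent numerators
    k_prev, k = 0, 1    # convergent denominators
    while h * h - d * k * k != 1:
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k
```

For d = 2 this finds (3, 2), and for d = 7 it finds (8, 3).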
Chapter 109
A^x + B^y = C^z (109.1.1)

Then A, B, and C (or any two of them) are not relatively prime.
It is clear that the famous statement known as Fermat’s last theorem would follow from this
stronger claim.
Solutions of equation (1) are not very scarce. One parametric solution is
for m ≥ 3, and a, b such that the terms are nonzero. But computerized searching brings
forth quite a few additional solutions, such as:
3^3 + 6^3 = 3^5
3^9 + 54^3 = 3^11
3^6 + 18^3 = 3^8
7^6 + 7^7 = 98^3
27^4 + 162^3 = 9^7
211^3 + 3165^3 = 422^4
386^3 + 4825^3 = 579^4
307^3 + 614^4 = 5219^3
5400^3 + 90^4 = 630^4
217^3 + 5642^3 = 651^4
271^3 + 813^4 = 7588^3
602^3 + 903^4 = 8729^3
624^3 + 14352^3 = 312^5
1862^3 + 57722^3 = 3724^4
2246^3 + 4492^4 = 74118^3
1838^3 + 97414^3 = 5514^4
Mysteriously, the summands have a common factor > 1 in each instance.
This conjecture is “wanted in Texas, dead or alive”. For the details, plus some additional
links, see Mauldin.
Inspired by Fermat's last theorem, Euler conjectured that there are no positive integer solutions
to the quartic equation

x^4 + y^4 + z^4 = w^4.

This conjecture was disproved by Elkies (1988), who found an infinite class of solutions. One
of the first solutions discovered was

2682440^4 + 15365639^4 + 18796760^4 = 20615673^4.
Version: 4 Owner: vladm Author(s): vladm
The Theorem
Fermat's last theorem was put forth by Pierre de Fermat around 1630. It states that the
Diophantine equation (a, b, c, n ∈ N)

a^n + b^n = c^n

has no solutions in positive integers a, b, c when n > 2.
History
Fermat’s last theorem was actually a conjecture and remained unproved for over 300 years. It
was finally proven in 1994 by Andrew Wiles, an English mathematician working at Princeton.
It was always called a “theorem”, due to Fermat’s uncanny ability to propose true conjec-
tures. Originally the statement was discovered by Fermat’s son Clement-Samuel among
margin notes that Fermat had made in his copy of Diophantus’ Arithmetica. Fermat
followed the statement of the conjecture with the infamous teaser:
“I have discovered a truly remarkable proof which this margin is too small to contain”
Over the years, Fermat’s last theorem was proven for various sub-cases which required specific
values of n, but no direct progress was made along these lines towards a general proof. These
proofs were bittersweet victories, as each one still left an infinite number of cases unproved.
Among the big names who took a crack at the theorem are Euler, Gauss, Germaine, Cauchy,
Dirichlet, and Legendre.
The theorem finally began to yield to direct attack in the 20th century.
Proof
In 1982 Gerhard Frey conjectured that if FLT has a solution (a,b,c,n), then the elliptic curve
defined by
y 2 = x(x − an )(x + bn )
is semistable, but not modular. The above equation is known as Frey’s equation, or the Frey
curve. Ribet proved this conjecture in 1986.
The Taniyama-Shimura conjecture, which appeared in an early form in 1955, says that all
elliptic curves are modular. If in fact this conjecture were to be proven in the semistable
case, then it would follow that the Frey equation would be semistable and modular, hence
FLT could have no solutions.
After a flawed attempt in 1993, Wiles, along with Richard Taylor, successfully proved the
semistable case of the Taniyama-Shimura conjecture in 1994, hence proving Fermat's last
theorem. The proof appears in the May 1995 Annals of Mathematics, Vol. 141, No. 3. It is
129 pages.
Speculation
Wiles’ proof rests upon the work of hundreds of mathematicians and the mathematics created
up to and including the 20th century. We cannot imagine how Fermat’s last theorem could
be proved without these advanced mathematical tools, which include group theory and
Galois theory, the theory of modular forms, Riemannian topology, and the theory of elliptic
equations.
Could Fermat, then, have possibly had a proof to his own conjecture, in the year 1630?
It doesn’t seem likely, given the requisite mathematics behind the proof as we know it.
Assuming Fermat's teaser was truthful, and Fermat was not in error, this “paradox” has
led some to (hopefully) jokingly attribute supernatural abilities to Fermat.
A more interesting possibility is that there is yet another proof, which is elementary and
utilizes no more knowledge than Fermat had available in his day.
Most mathematicians, however, think that Fermat was just in error. It is also possible that
he realized later that he didn't have a solution, but of course did not amend the margin
notes where he wrote his tantalizing statement. Still, we cannot rule out the existence of a
simpler proof, so for some, the search continues...
Further Reading
Chapter 110
x ≡ a_1 (mod p_1)
x ≡ a_2 (mod p_2)
. . .
x ≡ a_n (mod p_n)
where P = p_1 p_2 · · · p_n and, for each i, y_i satisfies

(P/p_i) y_i ≡ 1 (mod p_i)
Then one solution of these congruences is
x_0 = Σ_{i=1}^{n} a_i y_i (P/p_i)

and every solution x satisfies

x ≡ x_0 (mod P)
The Chinese remainder theorem is said to have been used to count the size of the ancient
Chinese armies (i.e., the soldiers would split into groups of 3, then 5, then 7, etc, and the
“leftover” soldiers from each grouping would be counted).
If

a ≡ b (mod p)
a ≡ b (mod q)
gcd(p, q) = 1

then

a ≡ b (mod pq)
We know that for some k ∈ Z, a − b = kp; likewise, for some j ∈ Z, a − b = jq, so kp = jq.
Therefore kp − jq = 0.
Recall that if (x_0, y_0) is one solution of the linear Diophantine equation ax + by = c and
d = gcd(a, b), then all solutions are given by

x = x_0 + (b/d) n
y = y_0 − (a/d) n

where n ∈ Z.
We apply this theorem to the Diophantine equation kp − jq = 0. Clearly one solution of this
Diophantine equation is k = 0, j = 0. Since gcd(q, p) = 1, all solutions of this equation are
given by k = nq and j = np for any n ∈ Z. So we have a − b = npq; therefore pq divides
a − b, so a ≡ b (mod pq), thus completing the lemma.
Now, to prove the Chinese remainder theorem, we first show that yi must exist for any
natural i where 1 ≤ i ≤ n. If

(P/p_i) y_i ≡ 1 (mod p_i)

then by definition there exists some k ∈ Z such that

(P/p_i) y_i − 1 = k p_i

which in turn implies that

(P/p_i) y_i − k p_i = 1
This is a diophantine equation with yi and k being the unknown integers. It is a well-known
theorem that a diophantine equation of the form
ax + by = c
has solutions for x and y if and only if gcd(a, b) divides c. Since P/p_i is the product of each p_j
(j ∈ N, 1 ≤ j ≤ n) except p_i, and every p_j is relatively prime to p_i, P/p_i and p_i are relatively
prime. Therefore, by definition, gcd(P/p_i, p_i) = 1; since 1 divides 1, there are integers k and y_i
that satisfy the above equation.
For j ≠ i, since p_j divides P/p_i,

a_i y_i (P/p_i) ≡ 0 (mod p_j)

while

(P/p_j) y_j ≡ 1 (mod p_j)

so we know

a_j y_j (P/p_j) ≡ a_j (mod p_j)

so x_0 ≡ a_i (mod p_i) for every i. Since congruence is transitive, any x with x ≡ x_0 (mod p_i)
must in turn satisfy all the original congruences.
Likewise, suppose we have some x that satisfies all the original congruences. Then, for any
p_i, we know that

x ≡ a_i (mod p_i)

and since

x_0 ≡ a_i (mod p_i)

the transitive and symmetric properties of congruence imply that

x ≡ x_0 (mod p_i)

for every i. Hence, by the lemma,

x ≡ x_0 (mod p_1 p_2 . . . p_n)

or

x ≡ x_0 (mod P)
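The constructive proof above translates directly into code; a Python sketch (the names are ours) using Python's built-in modular inverse for the y_i:

```python
def crt(residues, moduli):
    """Solve x = a_i (mod p_i) for pairwise coprime moduli via the
    construction x0 = sum(a_i * y_i * (P / p_i)) from the proof."""
    P = 1
    for p in moduli:
        P *= p
    x0 = 0
    for a, p in zip(residues, moduli):
        Pi = P // p
        y = pow(Pi, -1, p)   # y_i with (P/p_i) * y_i = 1 (mod p_i)
        x0 += a * y * Pi
    return x0 % P
```

`crt([2, 3, 2], [3, 5, 7])` returns 23, the classic answer for soldiers left over in rows of 3, 5 and 7.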
Chapter 111
P_d(n) = ((d − 2)n^2 + (4 − d)n)/2
for integers n ≥ 0 and d ≥ 3. A “generalized polygonal number” is any value of Pd (n) for
some integer d ≥ 3 and any n ∈ Z. For fixed d, Pd (n) is called a d-gonal or d-polygonal
number. For d = 3, 4, 5, . . ., we speak of a triangular number, a square number or a square,
a pentagonal number, and so on.
Pd (0) = 0
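The defining formula is direct to evaluate; a Python sketch (the function name is ours):

```python
def polygonal(d, n):
    """P_d(n) = ((d - 2) * n^2 + (4 - d) * n) / 2, always an integer."""
    return ((d - 2) * n * n + (4 - d) * n) // 2
```

d = 3 gives the triangular numbers 1, 3, 6, 10, . . .; d = 4 the squares; d = 5 the pentagonal numbers 1, 5, 12, 22, . . .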
Polygonal numbers were studied somewhat by the ancients, as far back as the Pythagoreans,
but nowadays their interest is mostly historical, in connection with this famous result:
In other words, any nonnegative integer is a sum of three triangular numbers, four squares,
five pentagonal numbers, and so on. Fermat made this remarkable statement in a letter to
Mersenne. Regrettably, he never revealed the argument or proof that he had in mind. More
than a century passed before Lagrange proved the easiest case: Lagrange’s four-square theorem.
The case d = 3 was demonstrated by Gauss around 1797, and the general case by Cauchy
in 1813.
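Both the defining formula for Pd(n) and the Cauchy-Fermat statement can be spot-checked by exhaustive search over small integers; the Python sketch below (names ours, purely illustrative) does exactly that:

```python
from itertools import combinations_with_replacement

def polygonal(d, n):
    """P_d(n) = ((d - 2)n^2 + (4 - d)n) / 2."""
    return ((d - 2) * n * n + (4 - d) * n) // 2

def is_sum_of_polygonals(m, d):
    """True if m is a sum of d d-gonal numbers; since P_d(0) = 0,
    'at most d' summands are covered as well."""
    vals, n = [], 0
    while polygonal(d, n) <= m:
        vals.append(polygonal(d, n))
        n += 1
    return any(sum(c) == m
               for c in combinations_with_replacement(vals, d))

# triangular, square, pentagonal numbers from the same formula
assert [polygonal(3, n) for n in range(5)] == [0, 1, 3, 6, 10]
assert [polygonal(4, n) for n in range(5)] == [0, 1, 4, 9, 16]
assert [polygonal(5, n) for n in range(5)] == [0, 1, 5, 12, 22]
# every nonnegative integer below 200 is a sum of d d-gonal numbers
for d in (3, 4, 5):
    assert all(is_sum_of_polygonals(m, d) for m in range(200))
```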
Chapter 112
11D99 – Miscellaneous
A Diophantine equation is an equation for which the solutions are required to be integers.
x4 + y 4 = z 4
It is easy to find real numbers x, y, z that satisfy this equation: pick any arbitrary x and
y, and you can compute a z from them. But if we require that x, y, z all be integers, it is
no longer obvious at all how to find solutions. Even though raising an integer to an integer
power yields another integer, the reverse is not true in general.
As it turns out, of course, there are no solutions to the above Diophantine equation: it is a
case of Fermat’s last theorem.
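A small illustrative search (ours, not part of the original entry) makes the contrast concrete: over the reals solutions are plentiful, but an integer search over a complete box finds nothing, in agreement with Fermat's last theorem:

```python
# search a whole box of positive integers for x^4 + y^4 = z^4;
# since x, y < 60 forces z < 120, the box is exhaustive for these x, y,
# and Fermat's last theorem says the list must come back empty
solutions = [(x, y, z)
             for x in range(1, 60)
             for y in range(x, 60)
             for z in range(y, 120)
             if x ** 4 + y ** 4 == z ** 4]
print(solutions)  # []
```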
“Given a Diophantine equation with any number of unknowns and with rational integer
coefficients: devise a process, which could determine by a finite number of operations whether
the equation is solvable in rational integers.”
Note that this preceded the formal study of computing and Gödel’s incompleteness theorem,
and it is unlikely that Hilbert had anticipated a negative solution — that is, a proof that
no such algorithm is possible — but that turned out to be the case. In the 1950s and 60s,
Martin Davis, Julia Robinson, and Hilary Putnam showed that an algorithm to determine
the solubility of all exponential Diophantine equations is impossible.
Chapter 113
An inner product over a complex vector space is a positive definite Hermitian form.
Version: 1 Owner: djao Author(s): djao
A bilinear form B on a real or complex vector space V is positive definite if B(x, x) > 0 for
all nonzero vectors x ∈ V . On the other hand, if B(x, x) < 0 for all nonzero vectors x ∈ V ,
then we say B is negative definite.
A form which is neither positive definite nor negative definite is called indefinite.
A symmetric bilinear form is a bilinear form B which is symmetric in the two coordinates;
that is, B(x, y) = B(y, x) for all vectors x and y.
Every inner product over a real vector space is a positive definite symmetric bilinear form.
Let V be a vector space over a field k, and Q : V × V → k a symmetric bilinear form. Then
the Clifford algebra Cliff(Q, V ) is the quotient of the tensor algebra T(V ) by the relations
v ⊗ w + w ⊗ v = −2Q(v, w) ∀v, w ∈ V.
Since the above relationship is not homogeneous in the usual Z-grading on T(V ), Cliff(Q, V )
does not inherit a Z-grading. However, by reducing mod 2, we also have a Z2 -grading on
T(V ), and the relations above are homogeneous with respect to this, so Cliff(Q, V ) has a
natural Z2 -grading, which makes it into a superalgebra.
The most commonly used Clifford algebra is the case V = Rⁿ, with Q the standard
inner product with orthonormal basis e1, . . . , en. In this case, the algebra is generated by
e1, . . . , en and the identity 1 of the algebra, with the relations

ei² = −1
ei ej = −ej ei   (i ≠ j)

Trivially, Cliff(R⁰) = R, and it can be seen from the relations above that Cliff(R) ≅ C, the
complex numbers, and Cliff(R²) ≅ H, the quaternions. Over the complex numbers,

Cliff(C^{2k}) ≅ M_{2^k}(C),    Cliff(C^{2k+1}) ≅ M_{2^k}(C) ⊕ M_{2^k}(C),

where M_{2^k}(C) denotes the algebra of 2^k × 2^k complex matrices.
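The relations for Cliff(R²) can be checked symbolically. In the sketch below (an illustrative encoding of ours), elements are dictionaries mapping the basis {1, e1, e2, e12} to coefficients, with the multiplication table derived from ei² = −1 and e1 e2 = −e2 e1; the assertions exhibit the quaternion relations behind Cliff(R²) ≅ H:

```python
from itertools import product

# multiplication table for the basis of Cliff(R^2): 1, e1, e2, e12 = e1*e2,
# derived from e_i^2 = -1 and e1*e2 = -e2*e1
TABLE = {
    ('1', '1'): (1, '1'),    ('1', 'e1'): (1, 'e1'),
    ('1', 'e2'): (1, 'e2'),  ('1', 'e12'): (1, 'e12'),
    ('e1', '1'): (1, 'e1'),  ('e1', 'e1'): (-1, '1'),
    ('e1', 'e2'): (1, 'e12'), ('e1', 'e12'): (-1, 'e2'),
    ('e2', '1'): (1, 'e2'),  ('e2', 'e1'): (-1, 'e12'),
    ('e2', 'e2'): (-1, '1'), ('e2', 'e12'): (1, 'e1'),
    ('e12', '1'): (1, 'e12'), ('e12', 'e1'): (1, 'e2'),
    ('e12', 'e2'): (-1, 'e1'), ('e12', 'e12'): (-1, '1'),
}

def mul(a, b):
    """Multiply two algebra elements given as {basis: coefficient} dicts."""
    out = {}
    for (ba, ca), (bb, cb) in product(a.items(), b.items()):
        sign, basis = TABLE[(ba, bb)]
        out[basis] = out.get(basis, 0) + sign * ca * cb
    return {k: v for k, v in out.items() if v}

e1, e2 = {'e1': 1}, {'e2': 1}
assert mul(e1, e1) == {'1': -1}       # e1^2 = -1
assert mul(e1, e2) == {'e12': 1}      # e1 e2 = e12  (plays the role of k)
assert mul(e2, e1) == {'e12': -1}     # anticommutation
assert mul({'e12': 1}, {'e12': 1}) == {'1': -1}   # (e1 e2)^2 = -1, as in H
```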
Chapter 114
Let V be a real Hilbert space (and thus an inner product space), and let f be a continuous
linear functional on V . Then f has an associated quadratic function ϕ : V → R given by
ϕ(v) = (1/2) ‖v‖² − f(v)
Chapter 115
For any natural number N > 1, define the modular group Γ0 (N) to be the following subgroup
of the group SL(2, Z) of integer-coefficient matrices of determinant 1:

Γ0(N) := { [ a b ; c d ] ∈ SL(2, Z) : c ≡ 0 (mod N) }.
Let H∗ be the subset of the Riemann sphere consisting of all points in the upper half plane
(i.e., complex numbers with strictly positive imaginary part), together with the rational numbers
and the point at infinity. Then Γ0 (N) acts on H∗ , with group action given by the operation
[ a b ; c d ] · z := (az + b)/(cz + d).
Define X0 (N) to be the quotient of H∗ by the action of Γ0 (N). The quotient space X0 (N)
inherits a quotient topology and holomorphic structure from C making it into a compact
Riemann surface. (Note: H∗ itself is not a Riemann surface; only the quotient X0 (N) is.)
By a general theorem in complex algebraic geometry, every compact Riemann surface admits
a unique realization as a complex nonsingular projective curve; in particular, X0 (N) has such
a realization, which by abuse of notation we will also denote X0 (N). This curve is defined
over Q, although the proof of this fact is beyond the scope of this entry 1 .
1
Explicitly, the curve X0 (N ) is the unique nonsingular projective curve which has function field equal
to C(j(z), j(N z)), where j denotes the elliptic modular j–function. The curve X0 (N ) is essentially the
algebraic curve defined by the polynomial equation ΦN (X, Y ) = 0 where ΦN is the modular polynomial,
with the caveat that this procedure yields singularities which must be resolved manually. The fact that ΦN
has integer coefficients provides one proof that X0 (N ) is defined over Q.
Taniyama-Shimura Theorem (weak form): For any elliptic curve E defined over Q,
there exists a positive integer N and a surjective algebraic morphism φ : X0 (N) → E
defined over Q.
This theorem was first conjectured (in a much more precise, but equivalent, formulation) by
Taniyama, Shimura, and Weil in the 1950s and 1960s. It attracted considerable interest in the 1980s
when Frey [2] proposed that the Taniyama-Shimura conjecture implies Fermat’s last theorem.
In 1995, Andrew Wiles [1] proved a special case of the Taniyama-Shimura theorem which
was strong enough to yield a proof of Fermat’s Last Theorem. The full Taniyama-Shimura
theorem was finally proved in 1997 by a team of a half-dozen mathematicians who, building
on Wiles’s work, incrementally chipped away at the remaining cases until the full result
was proved. As of this writing, the proof of the full theorem can still be found on
Richard Taylor’s preprints page.
REFERENCES
1. Breuil, Christophe; Conrad, Brian; Diamond, Fred; Taylor, Richard; On the modularity of
elliptic curves over Q: wild 3-adic exercises. J. Amer. Math. Soc. 14 (2001), no. 4, 843–939
2. Frey, G. Links between stable elliptic curves and certain Diophantine equations. Ann. Univ.
Sarav. 1 (1986), 1–40.
3. Wiles, A. Modular elliptic curves and Fermat’s Last Theorem. Annals of Math. 141 (1995),
443–551.
Chapter 116
is called the trigonometric series of the function f , or Fourier series of the function f.
Chapter 117
has transcendence degree greater than or equal to n. Though seemingly innocuous, a proof
of Schanuel’s conjecture would prove hundreds of open conjectures in transcendental number
theory.
117.2 period
117.2.1 Examples
Example 2. Any algebraic number α is a period, since we use the somewhat natural definition
that integration over a 0-dimensional space is taken to mean evaluation:

α = ∫_{{α}} x
117.2.2 Non-periods
117.2.3 Inclusion
With the existence of a non-period, we have the following chain of strict set inclusions:

Z ⊊ Q ⊊ Q̄ ⊊ P ⊊ C,

where Q̄ denotes the set of algebraic numbers. The periods promise to prove an interesting
and important set of numbers in that nebulous region between Q̄ and C.
Chapter 118
Let E be an elliptic curve. The endomorphism ring of E, denoted End(E), is the set of
all regular maps φ : E → E such that φ(O) = O, where O ∈ E is the identity element for
the group structure of E. Note that this is indeed a ring under addition ((φ + ψ)(P ) =
φ(P ) + ψ(P )) and composition of maps.
The following theorem implies that every endomorphism is also a group endomorphism:
Theorem 8. Let E1 , E2 be elliptic curves, and let φ : E1 → E2 be a regular map such that
φ(OE1 ) = OE2 . Then φ is also a group homomorphism, i.e.
φ(P + Q) = φ(P) + φ(Q) for all P, Q ∈ E1.
Example: fix d ∈ Z. Let E be the elliptic curve defined by

y² = x³ − dx.

Then this curve has complex multiplication by Q(i) (more concretely, by Z[i]). Besides the
multiplication-by-n maps, End(E) contains a genuine new element, the map

φ(x, y) = (−x, iy)

(the name complex multiplication comes from the fact that we are ”multiplying” the points
on the curve by a complex number, i in this case).
REFERENCES
1. James Milne, Elliptic Curves, online course notes. http://www.jmilne.org/math/CourseNotes/math679.html
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.
Chapter 119
Let L ⊂ R² be a lattice in the sense of number theory, i.e. a 2-dimensional free group over
Z which generates R² over R. Let w1, w2 be generators of the lattice L:

L = {αw1 + βw2 | α, β ∈ Z}.

A set F of the form

119.2 lattice in Rn
Chapter 120
The triple scalar product of three vectors is an extension of the cross product. It is defined
as
det [ a1 b1 c1 ; a2 b2 c2 ; a3 b3 c3 ] = ~a · (~b × ~c)
  = a1 det [ b2 c2 ; b3 c3 ] − a2 det [ b1 c1 ; b3 c3 ] + a3 det [ b1 c1 ; b2 c2 ]
The determinant above is positive if the three vectors satisfy the right-hand rule and negative
otherwise. Recall that the magnitude of the cross product of two vectors is equivalent to the
area of the parallelogram they form, and the dot product is equivalent to the product of the
projection of one vector onto another with the length of the vector projected upon. Putting
these two ideas together, we can see that
|~a · (~b × ~c)| = |~b × ~c| |~a| |cos θ| = base · height = volume of the parallelepiped

Thus, the magnitude of the triple scalar product is the volume of the parallelepiped formed
by the three vectors. (A parallelepiped is a three-dimensional object whose opposing faces
are parallel; an example is a brick or a sheared brick.) It follows that the triple scalar
product of three coplanar or collinear vectors is 0.
Identities related to the triple scalar product:

(~A × ~B) · ~C = (~B × ~C) · ~A = (~C × ~A) · ~B

~A · (~B × ~C) = −~A · (~C × ~B)

The latter is implied by the antisymmetry of the cross product.
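A direct computation confirms the cyclic symmetry and the vanishing for coplanar vectors; the helper functions below are an illustrative sketch of ours:

```python
def cross(b, c):
    """Cross product of two 3-vectors given as tuples."""
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def triple(a, b, c):
    """a . (b x c): the 3x3 determinant with rows a, b, c."""
    return dot(a, cross(b, c))

a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
# cyclic symmetry, antisymmetry, and the coplanar (zero-volume) case
assert triple(a, b, c) == triple(b, c, a) == triple(c, a, b)
assert triple(a, b, c) == -triple(a, c, b)
assert triple((1, 0, 0), (0, 1, 0), (1, 1, 0)) == 0
```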
Chapter 121
Theorem (Dirichlet, c. 1840): For any real number θ and any integer n ≥ 1, there exist
integers a and b such that 1 ≤ a ≤ n and |aθ − b| ≤ 1/(n + 1).

Proof: We can suppose n ≥ 2. For each integer a in the interval [1, n], write ra = aθ − [aθ] ∈
[0, 1). Since the n + 2 numbers 0, ra, 1 all lie in the same unit interval, some two of them
differ (in absolute value) by at most 1/(n + 1). If 0 or 1 is in any such pair, then the other element
of the pair is one of the ra, and we are done. If not, then 0 ≤ rk − rl ≤ 1/(n + 1) for some distinct
k and l. If k > l we have rk − rl = rk−l, since each side is in [0, 1) and the difference between
them is an integer. Similarly, if k < l, we have 1 − (rk − rl) = rl−k. So, with a = k − l or
a = l − k respectively, we get

|ra − c| ≤ 1/(n + 1)

where c is 0 or 1, and the result follows.
The same statement, but with the weaker conclusion |aθ − b| < 1/n, admits a slightly shorter
proof, and is sometimes also referred to as the Dirichlet approximation theorem. (It was that
shorter proof which made the “pigeonhole principle” famous.) Also, the theorem is sometimes
restricted to irrational values of θ, with the (nominally stronger) conclusion |aθ − b| < 1/(n + 1).
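The theorem is easy to check by direct search; the following Python sketch (ours) finds the promised pair (a, b) for θ = π and n = 10, recovering the classical approximation 22/7:

```python
import math

def dirichlet_approx(theta, n):
    """Search for the a guaranteed by the theorem: 1 <= a <= n with
    |a*theta - b| <= 1/(n+1), b being the nearest integer to a*theta."""
    for a in range(1, n + 1):
        b = round(a * theta)
        if abs(a * theta - b) <= 1.0 / (n + 1):
            return a, b
    raise AssertionError("contradicts Dirichlet's theorem")

a, b = dirichlet_approx(math.pi, 10)
print(a, b)  # 7 22, i.e. the classical pi ~ 22/7
```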
Chapter 122
For any real ξ which is not rational or quadratic irrational, there are infinitely many rational
or real quadratic irrational α which satisfy

|ξ − α| < C · H(α)⁻³,

where

C = C0        if |ξ| < 1,
C = C0 · ξ²   if |ξ| > 1.

Here C0 is any fixed number greater than 160/9, and H(α) is the height of α.
REFERENCES
1. Davenport, H. & Schmidt, M. Wolfgang: Approximation to real numbers by quadratic irra-
tionals. Acta Arithmetica XIII, 1967.
Given α, a real algebraic number of degree n ≠ 1, there is a constant c = c(α) > 0 such that
for all rational numbers p/q, (p, q) = 1, the inequality

|α − p/q| > c(α)/qⁿ

holds.

Let α satisfy the equation f(α) = an αⁿ + an−1 α^(n−1) + · · · + a0 = 0, where the ai are integers.
Choose M such that M > max_{α−1 ≤ x ≤ α+1} |f′(x)|.
Suppose p/q lies in (α − 1, α + 1) and f(p/q) ≠ 0. Then

|f(p/q)| = |an pⁿ + an−1 p^(n−1) q + · · · + a0 qⁿ| / qⁿ ≥ 1/qⁿ
Chapter 123
bⁿ + bⁿ = aⁿ.    (123.1.1)

We can now apply a recent result of Andrew Wiles [1], which states that there are no non-zero
integers a, b satisfying equation (123.1.1). Thus ⁿ√2 is irrational [2].
REFERENCES
1. A. Wiles, Modular elliptic curves and Fermat’s last theorem, Annals of Mathematics,
Volume 141, No. 3 May, 1995, 443-551.
2. W.H. Schultz, An observation, American Mathematical Monthly, Vol. 110, Nr. 5, May 2003.
(submitted by R. Ehrenborg).
123.2 e is irrational (proof)
Now let us assume that e is rational. This would mean there are two natural numbers a and
b, such that:
e = a/b.
This yields:
b!e ∈ N.
Now we can write e using (1):
b!e = b! Σ_{k=0}^{∞} 1/k!.

This can also be written:

b!e = Σ_{k=0}^{b} b!/k! + Σ_{k=b+1}^{∞} b!/k!.

Since Σ_{k=b+1}^{∞} b!/k! < Σ_{j=1}^{∞} (b+1)^{−j} = 1/b ≤ 1, we conclude:

0 < Σ_{k=b+1}^{∞} b!/k! < 1.
We have also seen that this is an integer, but there is no integer strictly between 0 and 1. So there
cannot exist two natural numbers a and b such that e = a/b, so e is irrational.
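The key inequality 0 < Σ_{k=b+1}^∞ b!/k! < 1 can be verified in exact rational arithmetic; this sketch (ours) computes a truncation of the tail and checks it against the geometric bound 1/b used above:

```python
from fractions import Fraction

def tail(b, terms=30):
    """Truncation of sum_{k=b+1}^infty b!/k! in exact arithmetic;
    the full tail is bounded above by sum_j (b+1)^(-j) = 1/b."""
    s, term = Fraction(0), Fraction(1)
    for k in range(b + 1, b + 1 + terms):
        term /= k          # term is now b!/k!
        s += term
    return s

# the tail lands strictly inside (0, 1) for every b, which is what
# rules out b!e being an integer
for b in range(1, 10):
    assert 0 < tail(b) < Fraction(1, b) <= 1
```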
123.3 irrational
x ≠ a/b

with a, b ∈ Z and b ≠ 0.
From the above we have 2 | a², and since 2 is prime it must divide a. Now we can write
c = a/2, so that

2b² = 4c²
b² = 2c².

From the above we have 2 | b², and since 2 is prime, it must divide b.
With a little bit of work this argument can be generalized to any positive integer that is not
a square. Let n be such an integer; then there must exist a prime p such that n = p^k m,
where p ∤ m and k is odd. Assume that √n = a/b, where a, b ∈ N are relatively prime.
This is equivalent to

nb² = p^k m b² = a².

From the fundamental theorem of arithmetic, it is clear that the maximum powers of p that
divide a² and b² are even. So, since k is odd, the maximum power of p that divides p^k m b²
is also odd, and, from the above equation, the same should be true for a². Hence, we have
reached a contradiction and √n must be irrational.
The same argument generalizes further, for example to the case of nonsquare irreducible
fractions and to higher-order roots.
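The criterion used above, that some prime divides n to an odd power exactly when n is not a perfect square, is easy to implement; a sketch of ours:

```python
def sqrt_is_irrational(n):
    """True iff sqrt(n) is irrational, i.e. some prime divides n to an
    odd power (equivalently, n is not a perfect square)."""
    d, m = 2, n
    while d * d <= m:
        exp = 0
        while m % d == 0:
            m //= d
            exp += 1
        if exp % 2 == 1:       # found the prime p with odd exponent k
            return True
        d += 1
    return m != 1              # leftover prime factor with exponent 1

assert sqrt_is_irrational(2)
assert sqrt_is_irrational(12)      # 12 = 2^2 * 3, exponent of 3 is odd
assert not sqrt_is_irrational(36)  # 36 = 6^2
```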
Chapter 124
The tongue-in-cheek name given to the fact that if n is a nonzero integer, then |n| ≥ 1. This
trick is used in many transcendental number theory proofs. In fact, the hardest step of many
such proofs is showing that a particular integer is not zero.
Let α and β be algebraic over Q, with β irrational and α not equal to 0 or 1. Then αβ is
transcendental over Q.
This is perhaps the most useful result in determining whether a number is algebraic or
transcendental.
This conjecture is stronger than the six exponentials theorem.
REFERENCES
[1] Waldschmidt, Michel, Diophantine approximation on linear algebraic groups. Transcendence
properties of the exponential function in several variables. Grundlehren der Mathematis-
chen Wissenschaften [Fundamental Principles of Mathematical Sciences], 326. Springer-Verlag,
Berlin, 2000. xxiv+633 pp. ISBN 3-540-66785-7.
For the history of the six exponentials theorem, we quote briefly from [6, p. 15]:
The six exponentials theorem occurs for the first time in a paper by L. Alaoglu
and P. Erdös [1], when these authors try to prove Ramanujan’s assertion that the
quotient of two consecutive superior highly composite numbers is a prime, they
need to know that if x is a real number such that px1 and px2 are both rational
numbers, with p1 and p2 distinct prime numbers, then x is an integer. However,
this statement (special case of the four exponentials conjecture) is yet unproven.
They quote C. L. Siegel and claim that x indeed is an integer if one assumes pxi
to be rational for three distinct primes pi . This is just a special case of the six
exponentials theorem. They deduce that the quotient of two consecutive superior
highly composite numbers is either a prime, or else a product of two primes.
The six exponentials theorem can be deduced from a very general result of Th.
Schneider [4]. The four exponentials conjecture is equivalent to the first of the
eight problems at the end of Schneider’s book [5]. An explicit statement of the six
exponentials theorem, together with a proof, has been published independently
and at about the same time by S. Lang [2, Chapter 2] and K. Ramachandra [3,
Chapter 2]. They both formulated the four exponentials conjecture explicitly.
REFERENCES
[1] L. Alaoglu and P. Erdös, On highly composite and similar numbers. Trans. Amer. Math. Soc.
56 (1944), 448–469. Available online at www.jstor.org.
[2] S. Lang, Introduction to transcendental numbers, Addison-Wesley Publishing Co., Reading,
Mass., 1966.
[3] K. Ramachandra, Contributions to the theory of transcendental numbers. I, II. Acta Arith. 14
(1967/68), 65-72; ibid. 14 (1967/1968), 73–88.
[4] Schneider, Theodor, Ein Satz über ganzwertige Funktionen als Prinzip für Transzendenzbeweise.
(German) Math. Ann. 121, (1949). 131–140.
[5] Schneider, Theodor Einführung in die transzendenten Zahlen. (German) Springer-Verlag,
Berlin-Göttingen-Heidelberg, 1957. v+150 pp.
[6] Waldschmidt, Michel, Diophantine approximation on linear algebraic groups. Transcendence
properties of the exponential function in several variables. Grundlehren der Mathematis-
chen Wissenschaften [Fundamental Principles of Mathematical Sciences], 326. Springer-Verlag,
Berlin, 2000. xxiv+633 pp. ISBN 3-540-66785-7.
Cantor showed that, in a sense, ”almost all” numbers are transcendental, because the alge-
braic numbers are countable, whereas the transcendental numbers are not.
Chapter 125
Intuitively, x is normal in base b if all digits and digit-blocks in the base-b digit sequence of
x occur just as often as would be expected if the sequence had been produced completely
randomly.
We say that x is absolutely normal if it is normal in every base b ≥ 2. (Some authors use
the term ”normal” instead of ”absolutely normal”.)
Absolutely normal numbers were first defined by Émile Borel in 1909. Borel also proved that
almost all real numbers are absolutely normal, in the sense that the numbers that are not
absolutely normal form a set with Lebesgue measure zero. However, for any base b, there
are uncountably many numbers that are not normal in base b.
Champernowne’s number

0.1234567891011121314...

(obtained by concatenating the decimal expansions of all natural numbers) is normal in
base 10, but not known to be absolutely normal.
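Normality in base 10 shows up empirically, although convergence is slow: at a truncation of 10⁵ digits the leading-digit bias is still visible, so the sketch below (ours) only checks that every digit's frequency lies in a loose band around 1/10:

```python
from collections import Counter
from itertools import count

def champernowne_digits(n_digits):
    """First n_digits decimal digits of 0.123456789101112..."""
    digits = []
    for k in count(1):
        digits.extend(str(k))
        if len(digits) >= n_digits:
            return digits[:n_digits]

N = 100000
freq = Counter(champernowne_digits(N))
# the band is wide because '1' is overrepresented as a leading digit
# at this truncation, while '0' never leads and so lags behind
for d in "0123456789":
    assert 0.05 < freq[d] / N < 0.25
```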
Few absolutely normal numbers are known. The first one was constructed by Sierpinski in
1916, and a related construction led to a computable absolutely normal number in 2002.
Maybe the most prominent absolutely normal number is Chaitin’s constant Ω, which is not
computable.
No rational number can be normal in any base. Beyond that, it is extremely hard to prove
or disprove normality of a given constant. For instance, it has been conjectured that all
irrational algebraic numbers are absolutely normal since no counterexamples are known; on
the other hand, not a single irrational algebraic number has been proven normal in any base.
Likewise, it is conjectured that the transcendental constants π, e, and ln(2) are normal, and
this is supported by some empirical evidence, but a proof is out of reach. We don’t even
know which digits occur infinitely often in the decimal expansion of π.
Chapter 126
The most widely used and best understood pseudorandom generator is the Lehmer multiplicative
congruential generator, in which each number r is calculated as a function of the preceding
number in the sequence:
ri = [a ri−1] (mod m)

or

ri = [a ri−1 + c] (mod m)

where a and c are carefully chosen constants, and m is usually a power of two, 2^k. All
quantities appearing in the formula (except m) are integers of k bits. The expression in
brackets is an integer of length 2k bits, and the effect of the modulo m is to mask
off the most significant part of the result of the multiplication. r0 is the seed of a generation
sequence; many generators allow one to start with a different seed for each run of a program,
to avoid re-generating the same sequence, or to preserve the seed at the end of one run for
the beginning of a subsequent one. Before being used in calculations, the ri are usually
transformed to floating point numbers normalized into the range [0, 1]. Generators of this
type can be found which attain the maximum possible period of 2^(k−2), and whose sequences
pass all reasonable tests of “randomness”, provided one does not exhaust more than a few
percent of the full period.
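A minimal sketch of such a generator (ours; the multiplier 69069 is a classic textbook choice used purely for illustration, not a recommendation for serious work):

```python
def lehmer(seed, a=69069, c=0, k=32, count=5):
    """r_i = [a*r_{i-1} + c] (mod 2^k), normalized into [0, 1).
    Parameter names follow the formulas above; constants are illustrative."""
    m = 1 << k
    r = seed
    out = []
    for _ in range(count):
        r = (a * r + c) % m   # masks off the high half of the 2k-bit product
        out.append(r / m)     # normalize before use, as described above
    return out

print(lehmer(12345))
```

Re-running with the same seed reproduces the same sequence, which is exactly the seed-preservation behavior described above.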
References
Because the way of generating and using them is quite different, one must distinguish between
finite and infinite quasirandom sequences.
References
Kuipers74 L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences, Wiley, New York,
1974.
Zaremba72 S.K. Zaremba (Ed.), Applications of Number Theory to Numerical Analysis, Academic
Press, New York, 1972.
Braaten79 E. Braaten and G. Weller, An Improved Low-discrepancy Sequence for Multidimen-
sional Quasi-Monte Carlo Integration, J. Comp. Phys. 33 (1979) 249.
Random numbers are particular occurrences of random variables. They are necessary for
Monte Carlo calculations as well as many other computerized processes. There are three
kinds of random numbers:
Truly random numbers can only be generated by a physical process and cannot be generated
via software. This makes it rather clumsy to use them in Monte Carlo calculations, since
they must be first generated in a separate device and either sent to the computer or recorded
(for example on removable storage media) for later use in calculations. Traditionally, tapes
containing millions of random numbers generated using radioactive decay were available from
laboratories.
Nowadays, standard digital computers often have provisions for obtaining truly random
numbers, that is, numbers generated by a physical process. For instance, Intel has provided
a function since their i810 chipsets which utilizes noise in a semiconductor
as a source of randomness. Oftentimes it is possible to use other incidental physical noise as
a source; for example, static on the input channel of a sound card. In addition, peripheral
devices (add-ons) to personal computers exist which provide truly random numbers, when
the previous methods fail.
References
Chapter 127
where r(n) is some reduced residue system mod n, meaning any subset of Z containing
exactly one element of each invertible residue class mod n.
which shows that cs (n) is a real number, and indeed an integer. In particular cs (1) = µ(s).
More generally,
cst (mn) = cs (m)ct (n) if (m, t) = (n, s) = 1 .
Using the Chinese remainder theorem, it is not hard to show that for any fixed n, the function
s 7→ cs (n) is multiplicative:
Some writers use different notation from ours, reversing the roles of s and n in the expression
cs (n).
Chapter 128
Let p be a prime. Let χ be any multiplicative group character on Z/pZ (that is, any
group homomorphism of multiplicative groups (Z/pZ)× → C× ). For any a ∈ Z/pZ, the
complex number X
ga (χ) := χ(t)e2πiat/p
t∈Z/pZ
In general, the equation ga(χ) = χ(a⁻¹) g1(χ) (for nontrivial a and χ) reduces the computation
of general Gauss sums to that of g1(χ). The absolute value of g1(χ) is always √p as long
as χ is nontrivial, and if χ is a quadratic character (that is, χ(t) is the Legendre symbol
(t/p)), then the value of the Gauss sum is known to be

g1(χ) = √p,    if p ≡ 1 (mod 4),
g1(χ) = i√p,   if p ≡ 3 (mod 4).
REFERENCES
1. Kenneth Ireland & Michael Rosen, A Classical Introduction to Modern Number Theory, Second
Edition, Springer–Verlag, 1990.
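The stated values can be confirmed numerically; in this sketch (ours), the quadratic character is computed via Euler's criterion and the sum is evaluated in floating point:

```python
import cmath, math

def legendre(t, p):
    """Legendre symbol (t/p) via Euler's criterion."""
    if t % p == 0:
        return 0
    return 1 if pow(t, (p - 1) // 2, p) == 1 else -1

def gauss_sum(p):
    """g_1(chi) for the quadratic character chi mod p."""
    return sum(legendre(t, p) * cmath.exp(2j * math.pi * t / p)
               for t in range(p))

for p in (5, 13):   # p = 1 (mod 4): g_1 = sqrt(p)
    assert abs(gauss_sum(p) - math.sqrt(p)) < 1e-9
for p in (7, 11):   # p = 3 (mod 4): g_1 = i*sqrt(p)
    assert abs(gauss_sum(p) - 1j * math.sqrt(p)) < 1e-9
```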
128.2 Kloosterman sum
The Kloosterman sum is one of various trigonometric sums that are useful in number theory
and, more generally, in finite harmonic analysis. The original Kloosterman sum is
Kp(a, b) = Σ_{x ∈ Fp*} exp( 2πi(ax + x⁻¹b)/p )

where Fp is the field of prime order p. Such sums have been generalized in a few different
ways since their introduction in 1926. For instance, let q be a prime power, Fq the field of q
elements, χ : F∗q → C a character, and ψ : Fq → C a mapping such that ψ(x + y) = ψ(x)ψ(y)
identically. The sums

Kψ(χ | a, b) = Σ_{x ∈ Fq*} χ(x) ψ(ax + bx⁻¹)
Kloosterman sums are finite analogs of the K-Bessel functions of this kind:

Ks(a) = (1/2) ∫₀^∞ x^(s−1) exp( −(a/2)(x + x⁻¹) ) dx

where Re(a) > 0.
The Landsberg-Schaar relation states that for any positive integers p and q:

(1/√p) Σ_{n=0}^{p−1} exp( 2πin²q/p ) = (e^{πi/4}/√(2q)) Σ_{n=0}^{2q−1} exp( −πin²p/(2q) )    (128.3.1)
Although both sides of (128.3.1) are mere finite sums, no one has yet found a proof which uses no
infinite limiting process. One way to prove it is to put τ = 2iq/p + ε, where ε > 0, in this
identity due to Jacobi:

Σ_{n=−∞}^{+∞} e^{−πn²τ} = (1/√τ) Σ_{n=−∞}^{+∞} e^{−πn²/τ}    (128.3.2)
and let ε → 0. The details can be found in various works on harmonic analysis, such as [1].
The identity (128.3.2) is a basic one in the theory of theta functions. It is sometimes called the
functional equation for the Riemann theta function; see e.g. [2, VII.6.2].
If we just let q = 1 in the Landsberg-Schaar identity, it reduces to a formula for the quadratic
Gauss sum mod p; notice that p need not be prime.
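Equation (128.3.1) is itself a finite identity, so it can be checked numerically for small p and q; a sketch of ours:

```python
import cmath, math

def landsberg_schaar(p, q):
    """Both sides of (128.3.1) for positive integers p, q."""
    lhs = sum(cmath.exp(2j * math.pi * n * n * q / p)
              for n in range(p)) / math.sqrt(p)
    rhs = cmath.exp(1j * math.pi / 4) * sum(
        cmath.exp(-1j * math.pi * n * n * p / (2 * q))
        for n in range(2 * q)) / math.sqrt(2 * q)
    return lhs, rhs

# note that p need not be prime, as remarked above
for p, q in [(3, 1), (5, 2), (7, 3), (9, 4)]:
    lhs, rhs = landsberg_schaar(p, q)
    assert abs(lhs - rhs) < 1e-9
```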
References:
[1] H. Dym and H.P. McKean. Fourier series and Integrals. Academic Press, 1972.
The Gauss sum can be easily evaluated up to sign by squaring the original series:

g1(χ)² = [ Σ_{s∈Z/pZ} (s/p) e^{2πis/p} ] [ Σ_{t∈Z/pZ} (t/p) e^{2πit/p} ]
       = Σ_{s,t∈Z/pZ} (st/p) e^{2πi(s+t)/p}
       = Σ_{s,n∈Z/pZ} (n/p) e^{2πis(1+n)/p}          (putting t = ns; s, t, n run over nonzero residues)
       = Σ_{n∈Z/pZ} (n/p) Σ_{s≠0} e^{2πis(n+1)/p}
       = Σ_{n∈Z/pZ} (n/p) ( p[n ≡ −1 (mod p)] − 1 )
       = p (−1/p) − Σ_{n∈Z/pZ} (n/p)
       = p (−1/p)
       = p,  if p ≡ 1 (mod 4);  −p,  if p ≡ 3 (mod 4).
REFERENCES
1. Harold Davenport. Multiplicative Number Theory. Markham Pub. Co., 1967. Zbl 0159.06303.
Chapter 129
Σ_{t=m}^{m+n} (t/p) = (1/p) Σ_{t=m}^{m+n} Σ_{x=0}^{p−1} (x/p) Σ_{a=0}^{p−1} e^{2πia(x−t)/p}
                    = (1/p) Σ_{a=1}^{p−1} [ Σ_{x=0}^{p−1} (x/p) e^{2πiax/p} ] Σ_{t=m}^{m+n} e^{−2πiat/p}

The expression Σ_{t=0}^{p−1} (t/p) e^{−2πiat/p} is just a Gauss sum, and has magnitude √p. Hence
Here hxi denotes the absolute value of the difference between x and the closest integer to x,
i.e. hxi = inf z∈Z {|x − z|}.
(1/2) Σ_{a=1}^{p−1} 1/⟨a/p⟩ = Σ_{0<a<p/2} p/a = p Σ_{a=1}^{(p−1)/2} 1/a

Now ln((2x+1)/(2x−1)) > 1/x for x > 1; to prove this, it suffices to show that the function f : [1, ∞) → R
given by f(x) = x ln((2x+1)/(2x−1)) is decreasing and approaches 1 as x → ∞. To prove the latter
statement, substitute v = 1/x and take the limit as v → 0 using L’Hopital’s rule. To prove
the former statement, it will suffice to show that f′ is less than zero on the interval [1, ∞).
But f′(x) → 0 as x → ∞ and f′ is increasing on [1, ∞), since

f″(x) = (4/(4x²−1)) ( (4x²+1)/(4x²−1) − 1 ) > 0.

Hence

| Σ_{t=m}^{m+n} (t/p) | ≤ (√p/p) · p Σ_{a=1}^{(p−1)/2} 1/a < √p Σ_{a=1}^{(p−1)/2} ln((2a+1)/(2a−1)) = √p ln p.
REFERENCES
1. Vinogradov, I. M., Elements of Number Theory, 5th rev. ed., Dover, 1954.
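The final bound can be spot-checked numerically. This sketch (ours) computes the largest absolute character sum over any interval inside one period, via prefix sums, and compares it with √p ln p:

```python
import math

def legendre(t, p):
    """Legendre symbol (t/p) via Euler's criterion."""
    if t % p == 0:
        return 0
    return 1 if pow(t, (p - 1) // 2, p) == 1 else -1

def max_interval_sum(p):
    """Largest |sum_{t=m}^{m+n} (t/p)| over intervals inside one period,
    read off from prefix sums of the Legendre symbols."""
    prefix = [0]
    for t in range(1, p + 1):
        prefix.append(prefix[-1] + legendre(t, p))
    return max(prefix) - min(prefix)

for p in (11, 31, 101, 997):
    assert max_interval_sum(p) <= math.sqrt(p) * math.log(p)
```

In practice the observed maxima are far below the bound, which is consistent with the estimate being obtained by fairly generous inequalities.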
Chapter 130
The number

ζ(3) = Σ_{n=1}^{∞} 1/n³ = 1.202056903159594285399738161511449990764986292 . . .
has been called Apéry’s constant since 1979, when Roger Apéry published a remarkable proof
that it is irrational [1].
REFERENCES
1. Roger Apéry. Irrationalité de ζ(2) et ζ(3). Astérisque, 61:11–13, 1979.
2. Alfred van der Poorten. A proof that Euler missed. Apéry’s proof of the irrationality of ζ(3).
An informal report. Math. Intell., 1:195–203, 1979.
Let K be a number field with ring of integers OK. Then the Dedekind zeta function of K is
the analytic continuation of the following series:

ζK(s) = Σ_{I ⊆ OK} ( N^K_Q(I) )^{−s}

where I ranges over the non-zero ideals of OK, and N^K_Q(I) = |OK : I| is the norm of I.
This converges for Re(s) > 1, and has a meromorphic continuation to the whole plane, with
a simple pole at s = 1, and no others.
It admits the Euler product

ζK(s) = Π_p ( 1 − N^K_Q(p)^{−s} )^{−1}

where p ranges over the prime ideals of OK. The Dedekind zeta function of Q is just the
Riemann zeta function.
It converges absolutely and uniformly in the domain Re(s) > 1 + δ for any positive δ, and
admits the Euler product identity

L(χ, s) = Π_p 1/(1 − χ(p) p^{−s})    (130.3.2)
where the product is over all primes p, by virtue of the multiplicativity of χ. In the case
where χ = χ0 is the trivial character mod m, we have

L(χ0, s) = ζ(s) Π_{p|m} (1 − p^{−s}),    (130.3.3)

where ζ(s) is the Riemann zeta function. If χ is non-primitive, and Cχ is the conductor of
χ, we have

L(χ, s) = L(χ′, s) Π_{p|m, p∤Cχ} (1 − χ′(p) p^{−s}),    (130.3.4)

where χ′ is the primitive character which induces χ. For non-trivial, primitive characters
χ mod m, L(χ, s) admits an analytic continuation to all of C and satisfies the symmetric
functional equation

(m/π)^{s/2} Γ((s + eχ)/2) L(χ, s) = ( g1(χ) / (i^{eχ} √m) ) (m/π)^{(1−s)/2} Γ((1 − s + eχ)/2) L(χ⁻¹, 1 − s).    (130.3.5)
Here, eχ ∈ {0, 1} is defined by χ(−1) = (−1)^{eχ} χ(1), Γ is the gamma function, and g1(χ)
is a Gauss sum. Equations (130.3.3), (130.3.4), and (130.3.5) combined show that L(χ, s) admits a
meromorphic continuation to all of C for all Dirichlet characters χ, and an analytic one for
non-trivial χ. Again assuming that χ is a non-trivial and primitive character mod m, if k is a
positive integer, we have

L(χ, 1 − k) = −B_{k,χ}/k,    (130.3.6)
where B_{k,χ} is a generalized Bernoulli number. By (130.3.5), taking into account the poles of Γ, we
get, for k positive with k ≡ eχ (mod 2),

L(χ, k) = (−1)^{1 + (k − eχ)/2} ( g1(χ) / (2 i^{eχ}) ) (2π/m)^k B_{k,χ⁻¹} / k!.    (130.3.7)
This series was first investigated by Dirichlet, who used the non-vanishing of L(χ, 1) for
non-trivial χ to prove his famous theorem on primes in arithmetic progression.
This is probably the first instance of using complex analysis to prove a purely number-theoretic
result.
The Riemann theta function is a number theoretic function which is only really used in the
derivation of the functional equation for the Riemann Xi function.
It is defined as:
θ(x) = 2ω(x) + 1

so that:

2ω(x) + 1 = Σ_{n=−∞}^{−1} e^{−n²πx} + Σ_{n=1}^{∞} e^{−n²πx} + e^{−0²πx} = Σ_{n=−∞}^{∞} e^{−n²πx}

θ(x) = Σ_{n=−∞}^{∞} e^{−n²πx}
Riemann showed that the theta function satisfied a functional equation, which was the key
step in the proof of the analytic continuation for the Riemann Xi function; this has direct
consequences of course with the Riemann zeta function.
Version: 5 Owner: kidburla2003 Author(s): kidburla2003
The Xi function is the function which is the key to the functional equation for the Riemann zeta function.
It is defined as:
Ξ(s) = π^{−s/2} Γ(s/2) ζ(s)
Riemann himself used the notation of a lower case xi (ξ). The famous Riemann hypothesis is
equivalent to the assertion that all the zeros of ξ are real, in fact Riemann himself presented
his original hypothesis in terms of that function.
The Riemann Omega function is used in the proof of the analytic continuation for the
Riemann Xi function to the whole complex plane. It is defined as:
ω(x) = Σ_{n=1}^{∞} e^{−n²πx}
The Omega function satisfies a functional equation also, which can be easily derived from
the theta functional equation.
The Riemann Xi function satisfies a functional equation, which directly implies the Riemann
Zeta function’s functional equation. The proof depends on the Riemann Theta function.
Ξ(s) = Ξ(1 − s)
You can see from the definition that the Xi function is not defined for Re(s) < 1, since the
Riemann zeta function is only defined for Re(s) > 1, so this is an important theorem for
the Zeta function (in fact, there are no zeros with real part greater than 1, so without this
functional equation the study of the zeta function would be very limited).
The lemma used in the derivation of the functional equation for the Riemann Xi function.
This functional equation is not as remarkable as the one for the Xi function, because it does
not actually extend the domain of the function.
θ(1/x) = √x · θ(x)
The proof relies on the Cauchy integral formula and the Poisson summation formula.
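As a quick numerical sanity check (ours, not part of the original entry), one can truncate the theta series and verify the functional equation θ(1/x) = √x θ(x); the terms decay like e^{−πn²x}, so a couple hundred terms are far more than enough for moderate x.

```python
import math

def theta(x, terms=200):
    """Truncated Riemann theta series: 1 + 2 * sum_{n>=1} exp(-n^2 * pi * x)."""
    return 1.0 + 2.0 * sum(math.exp(-n * n * math.pi * x) for n in range(1, terms))

# theta(1/x) should equal sqrt(x) * theta(x)
x = 2.0
print(theta(1.0 / x), math.sqrt(x) * theta(x))  # agree to machine precision
```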
This generalization of the Riemann hypothesis to arbitrary Dedekind zeta functions states
that for any number field K, the only zeroes s of the Dedekind zeta function ζK (s) that lie
in the strip 0 ≤ Re s ≤ 1 satisfy Re s = 1/2.
All sums are over all integers unless otherwise specified. Thus the Riemann theta function is

θ(x) = ∑_n e^{−πn²x}.

Now, we wish to apply Poisson summation to f(x, y) = e^{−πxy²}, in the variable y. θ(x) = ∑_n f(x, n), and thus is equal to ∑_n f^∨(x, n), where

f^∨(x, n) = ∫_ℝ f(x, y) e^{2πiny} dy = ∫_ℝ e^{π(−xy² + 2iny)} dy.
Completing the square in the exponent and evaluating the resulting Gaussian integral gives

f^∨(x, n) = x^{−1/2} e^{−πn²/x},

and summing over n yields θ(x) = x^{−1/2} θ(1/x), which is the functional equation.
Chapter 131
11M99 – Miscellaneous
131.1.1 Definition
The Riemann zeta function is defined to be the complex valued function given by the series

ζ(s) := ∑_{n=1}^{∞} 1/n^s,    (131.1.1)
which is valid (in fact, absolutely convergent) for all complex numbers s with Re(s) > 1. We
list here some of the key properties [1] of the zeta function.
1. For all s with Re(s) > 1, the zeta function satisfies the Euler product formula

ζ(s) = ∏_p 1/(1 − p^{−s}),    (131.1.2)

where the product is taken over all positive integer primes p, and converges uniformly in a neighborhood of s.
2. The zeta function has a meromorphic continuation to the entire complex plane with a
simple pole at s = 1, of residue 1, and no other singularities.
131.1.2 Distribution of primes
The Euler product formula (131.1.2) given above expresses the zeta function as a product
over the primes p ∈ Z, and consequently provides a link between the analytic properties of
the zeta function and the distribution of primes in the integers. As the simplest possible
illustration of this link, we show how the properties of the zeta function given above can be
used to prove that there are infinitely many primes.
If the set S of primes in Z were finite, then the Euler product formula

ζ(s) = ∏_{p∈S} 1/(1 − p^{−s})

would be a finite product, and consequently lim_{s→1} ζ(s) would exist and would equal

lim_{s→1} ζ(s) = ∏_{p∈S} 1/(1 − p^{−1}).
But the existence of this limit contradicts the fact that ζ(s) has a pole at s = 1, so the set
S of primes cannot be finite.
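The link between the series (131.1.1) and the product (131.1.2) is easy to see numerically; the following sketch (ours, with the truncation bounds chosen for illustration) compares a truncated Euler product over the primes with a partial sum of the series at s = 2.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]

def zeta_partial(s, n):
    """Partial sum of the defining series (131.1.1)."""
    return sum(k ** -s for k in range(1, n + 1))

def euler_product(s, bound):
    """Truncated Euler product (131.1.2) over the primes up to `bound`."""
    prod = 1.0
    for p in primes_up_to(bound):
        prod *= 1.0 / (1.0 - p ** -s)
    return prod

print(zeta_partial(2.0, 100000), euler_product(2.0, 100000))  # both ~ 1.6449 = pi^2/6
```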
A more sophisticated analysis of the zeta function along these lines can be used to prove both the analytic prime number theorem and Dirichlet's theorem on primes in arithmetic progressions¹. Proofs of the prime number theorem can be found in [2] and [5], and for proofs of Dirichlet's theorem on primes in arithmetic progressions the reader may look in [2] and [1].
A nontrivial zero of the Riemann zeta function is defined to be a root ζ(s) = 0 of the zeta function with the property that 0 ≤ Re(s) ≤ 1. Any other zero is called a trivial zero of the zeta function.
The reason behind the terminology is as follows. For complex numbers s with real part
greater than 1, the series definition (131.1.1) immediately shows that no zeros of the zeta
function exist in this region. It is then an easy matter to use the functional equation (131.1.3)
to find all zeros of the zeta function with real part less than 0 (it turns out they are exactly
the values −2n, for n a positive integer). However, for values of s with real part between 0
and 1, the situation is quite different, since we have neither a series definition nor a functional
equation to fall back upon; and indeed to this day very little is known about the behavior
of the zeta function inside this critical strip of the complex plane.
It is known that the prime number theorem is equivalent to the assertion that the zeta function has no zeros s with Re(s) = 0 or Re(s) = 1. The celebrated Riemann hypothesis asserts that all nontrivial zeros s of the zeta function satisfy the much more precise equation Re(s) = 1/2. If true, the hypothesis would have profound consequences on the distribution of primes in the integers [5].

¹ In the case of arithmetic progressions, one also needs to examine the closely related Dirichlet L-functions in addition to the zeta function itself.
REFERENCES
1. Lars Ahlfors, Complex Analysis, Third Edition, McGraw–Hill, Inc., 1979.
2. Joseph Bak & Donald Newman, Complex Analysis, Second Edition, Springer–Verlag, 1991.
3. Gerald Janusz, Algebraic Number Fields, Second Edition, American Mathematical Society, 1996.
4. Serge Lang, Algebraic Number Theory, Second Edition, Springer–Verlag, 1994.
5. Stephen Patterson, Introduction to the Theory of the Riemann Zeta Function, Cambridge University Press, 1988.
6. B. Riemann, Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse,
http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/
7. Jean–Pierre Serre, A Course in Arithmetic, Springer–Verlag, 1973.
Let us use the traditional notation s = σ + it for the complex variable, where σ and t are
real numbers.
ζ(s) = 1/(1 − 2^{1−s}) ∑_{n=1}^{∞} (−1)^{n+1} n^{−s},    σ > 0    (131.2.1)

ζ(s) = 1/(s − 1) + 1 − s ∫_1^∞ (x − [x])/x^{s+1} dx,    σ > 0    (131.2.2)

ζ(s) = 1/(s − 1) + 1/2 − s ∫_1^∞ ((x))/x^{s+1} dx,    σ > −1    (131.2.3)

where [x] denotes the largest integer ≤ x, and ((x)) denotes x − [x] − 1/2.
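Formula (131.2.1) already gives a practical way to evaluate ζ in the half-plane σ > 0, since the alternating series converges wherever the original series does not. A small numerical sketch (ours, not the entry's):

```python
import math

def zeta_alternating(s, terms=100000):
    """Formula (131.2.1): zeta(s) = (1 - 2^(1-s))^(-1) * sum (-1)^(n+1) / n^s,
    usable for Re(s) > 0, s != 1."""
    eta = sum((-1) ** (n + 1) * n ** -s for n in range(1, terms + 1))
    return eta / (1.0 - 2.0 ** (1.0 - s))

print(zeta_alternating(2.0), math.pi ** 2 / 6)  # both ~ 1.644934
```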
We will prove (131.2.2) and (131.2.3) with the help of this useful lemma:

Lemma. For integers 0 < u < v and complex s ≠ 1,

∑_{n=u+1}^{v} 1/n^s = (v^{1−s} − u^{1−s})/(1 − s) − s ∫_u^v (x − [x])/x^{s+1} dx.

It suffices to verify the single identity

(u + 1)^{−s} = ((u + 1)^{1−s} − u^{1−s})/(1 − s) − s ∫_0^1 t dt/(u + t)^{s+1};    (131.2.4)

then the lemma will follow by summing a finite sequence of cases of (131.2.4). The integral in (131.2.4) is

∫_0^1 t dt/(u + t)^{s+1} = ∫_0^1 (u + t)^{−s} dt − ∫_0^1 u (u + t)^{−s−1} dt
= ((u + 1)^{1−s} − u^{1−s})/(1 − s) + u[(u + 1)^{−s} − u^{−s}]/s,

so the right side of (131.2.4) is

((u + 1)^{1−s} − u^{1−s})/(1 − s) − s · ((u + 1)^{1−s} − u^{1−s})/(1 − s) − u[(u + 1)^{−s} − u^{−s}]
= (u + 1)^{1−s} − u^{1−s} − u(u + 1)^{−s} + u · u^{−s}
= (u + 1)^{−s} · 1 + u^{−s} · 0,

and the lemma is proved.
Now take u = 1 and let v → ∞ in the lemma, showing that (131.2.2) holds for σ > 1. By the principle of analytic continuation, if the integral in (131.2.2) is analytic for σ > 0, then (131.2.2) holds for σ > 0. But x − [x] is bounded, so the integral converges uniformly on σ ≥ ε for any ε > 0, and the claim (131.2.2) follows.
We have

s ∫_1^∞ (1/2) x^{−1−s} dx = 1/2.

Adding and subtracting this quantity from (131.2.2), we get (131.2.3) for σ > 0. We need to show that

∫_1^∞ ((x))/x^{s+1} dx
is analytic on σ > −1. Write
f(y) = ∫_1^y ((x)) dx
and integrate by parts:
∫_1^∞ ((x))/x^{s+1} dx = lim_{x→∞} f(x) x^{−1−s} − f(1) · 1^{−1−s} + (s + 1) ∫_1^∞ f(x)/x^{s+2} dx.
The first two terms on the right are zero, and the integral converges for σ > −1 because f
is bounded.
Using formula (131.2.3), one can verify Riemann’s functional equation in the strip −1 < σ <
2. By analytic continuation, it follows that the functional equation holds everywhere. One
way to prove it in the strip is to decompose the sawtooth function ((x)) into a Fourier series,
and do a termwise integration. But the proof gets rather technical, because that series does
not converge uniformly.
131.3 functional equation of the Riemann zeta function
Let Γ denote the gamma function, ζ the Riemann zeta function and s any complex number.
Then

π^{(s−1)/2} Γ((1 − s)/2) ζ(1 − s) = π^{−s/2} Γ(s/2) ζ(s).
Though the equation appears too intricate to be of any use, the inherent symmetry in the
formula makes this the simplest method of evaluating ζ(s) at points to the left of the critical
strip.
Here we present an application of Parseval’s equality to number theory. Let ζ(s) denote the
Riemann zeta function. We will compute the value ζ(2).
Example: Let f(x) = x on (−π, π). The Fourier series of this function has been computed in the entry examples of Fourier series. Thus

f(x) = x = a₀^f + ∑_{n=1}^{∞} (a_n^f cos(nx) + b_n^f sin(nx)) = ∑_{n=1}^{∞} (2/n)(−1)^{n+1} sin(nx),    ∀x ∈ (−π, π)
By Parseval's equality,

(1/π) ∫_{−π}^{π} f(x)² dx = 2(a₀^f)² + ∑_{k=1}^{∞} [(a_k^f)² + (b_k^f)²]

and

(1/π) ∫_{−π}^{π} f(x)² dx = (1/π) ∫_{−π}^{π} x² dx = 2π²/3.

Therefore 2π²/3 = ∑_{n=1}^{∞} 4/n², and thus

ζ(2) = ∑_{n=1}^{∞} 1/n² = π²/6.
Chapter 132
Bertrand conjectured that for every positive integer n > 1, there exists at least one prime
p satisfying n < p < 2n. This result was proven in 1850 by Chebyshev, but the name "Bertrand's conjecture" remains in the literature.
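The statement is easy to confirm for small n; the following sketch (ours, using naive trial division, which is plenty for this range) finds such a prime for every n up to 1000.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def bertrand_prime(n):
    """Return a prime p with n < p < 2n (one exists for every n > 1 by Chebyshev)."""
    for p in range(n + 1, 2 * n):
        if is_prime(p):
            return p
    return None

assert all(bertrand_prime(n) is not None for n in range(2, 1001))
print(bertrand_prime(100))  # 101
```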
Viggo Brun proved that the sum of the reciprocals of the twin primes converges (its value is now known as Brun's constant) by using a new sieving method, which later became known as Brun's sieve.
Definition.

Θ(n) = ∑_{p ≤ n, p prime} log p
Proof. By induction.
For even n > 2, the case follows immediately from the case for n − 1 since n isn’t prime.
Now we can deal with the main theorem. Suppose n > 2 and there is no prime p with
n < p < 2n.
Consider (1 + 1)^{2n} = 4^n. Since (2n choose n) is the largest term in the binomial expansion, which has 2n + 1 terms, we get

(2n choose n) ≥ 4^n/(2n + 1).
For a prime p define r(p, n) to be the highest power of p dividing (2n choose n). We first look at the highest power of p dividing n!. Every p-th term contributes a factor, so we have already [n/p] factors, where [x] is the integer part of x. However, every p²-th term contributes an extra factor above that, and every p³-th term one more, and so on. So the highest power of p dividing n! is ∑_j [n/p^j]. Now for r(p, n) we need to take the factors contributed by (2n)! and subtract twice the factors taken away by n!. This leads us to

r(p, n) = ∑_j ([2n/p^j] − 2[n/p^j]).

Now each of these terms is either 0 or 1 (as is every value of [2x] − 2[x]), and the terms vanish for j > [log(2n)/log p], so r(p, n) ≤ [log(2n)/log p], or p^{r(p,n)} ≤ 2n.
Now (2n choose n) = ∏_p p^{r(p,n)}. By the previous inequality, primes larger than 2n do not contribute to this product, and by assumption there are no primes between n and 2n. So

(2n choose n) = ∏_{1 ≤ p ≤ n, p prime} p^{r(p,n)}.
If p > √(2n), all the terms for higher powers of p vanish and r(p, n) = [2n/p] − 2[n/p].
For n ≥ p > 2n/3, we have 3/2 > n/p ≥ 1, and so for p > 2n/3 ≥ √(2n) (which holds for all n > 4) we can apply the previous formula for r(p, n) and find that it is zero. So for all n > 4, the contribution of the primes larger than 2n/3 is zero.
Now for p > √(2n), r(p, n) is at most 1, so an upper bound for the contribution of the primes between √(2n) and 2n/3 is the product of all primes smaller than 2n/3, which equals e^{Θ(2n/3)}. There are at most √(2n) primes smaller than √(2n), and by the inequality p^{r(p,n)} ≤ 2n their product is less than (2n)^{√(2n)}.
Two consecutive odd numbers which are both prime are called twin primes, e.g. 5 and 7, or 41 and 43, or 1,000,000,000,061 and 1,000,000,000,063. But are there infinitely many twin primes?
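Listing twin prime pairs is straightforward with a sieve; the following sketch (ours, added for illustration) produces all pairs up to a given bound.

```python
def twin_primes_up_to(limit):
    """List twin prime pairs (p, p+2) with p+2 <= limit, via a sieve."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [(p, p + 2) for p in range(2, limit - 1) if sieve[p] and sieve[p + 2]]

print(twin_primes_up_to(50))
# [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43)]
```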
Chapter 133
Chapter 134
(x₁² + x₂² + x₃² + x₄²)(y₁² + y₂² + y₃² + y₄²) = (x₁y₁ + x₂y₂ + x₃y₃ + x₄y₄)² + (x₁y₂ − x₂y₁ + x₃y₄ − x₄y₃)²
+ (x₁y₃ − x₃y₁ + x₄y₂ − x₂y₄)² + (x₁y₄ − x₄y₁ + x₂y₃ − x₃y₂)²
It may be derived from the property of quaternions that the norm of the product is equal to
the product of the norms.
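The identity is a polynomial statement, so it can be spot-checked on any integers; a minimal sketch (ours) evaluates the right-hand side directly and compares it with the product of the two norms.

```python
def norm(v):
    """Sum of four squares, the quaternion norm of v = (v1, v2, v3, v4)."""
    return sum(c * c for c in v)

def four_square_product(x, y):
    """Right-hand side of the Euler four-square identity."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return ((x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4) ** 2
            + (x1 * y2 - x2 * y1 + x3 * y4 - x4 * y3) ** 2
            + (x1 * y3 - x3 * y1 + x4 * y2 - x2 * y4) ** 2
            + (x1 * y4 - x4 * y1 + x2 * y3 - x3 * y2) ** 2)

x, y = (1, 2, 3, 4), (5, 6, 7, 8)
assert norm(x) * norm(y) == four_square_product(x, y)
print(norm(x) * norm(y))  # 5220
```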
Chapter 135
We call n a highly composite number if d(n) > d(m) for all m < n, where d(n) is the number
of divisors of n. The first several are 1, 2, 4, 6, 12, 24. The sequence is A002182 in Sloane’s
encyclopedia.
The integer n is superior highly composite if there is an ε > 0 such that for all m ≠ n, d(n)/n^ε ≥ d(m)/m^ε.
The first several superior highly composite numbers are 2, 6, 12, 60, 120, 360. The sequence
is A002201 in Sloane’s encyclopedia.
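The defining condition d(n) > d(m) for all m < n translates directly into a record-keeping loop; this sketch (ours) recovers the first several highly composite numbers by brute force.

```python
def num_divisors(n):
    """d(n), computed from the prime factorization of n."""
    count = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            count *= e + 1
        d += 1
    if n > 1:
        count *= 2
    return count

def highly_composite_up_to(limit):
    """n is highly composite if d(n) exceeds d(m) for every m < n."""
    record, result = 0, []
    for n in range(1, limit + 1):
        dn = num_divisors(n)
        if dn > record:
            record = dn
            result.append(n)
    return result

print(highly_composite_up_to(60))  # [1, 2, 4, 6, 12, 24, 36, 48, 60]
```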
REFERENCES
[1] L. Alaoglu and P. Erdös, On highly composite and similar numbers. Trans. Amer. Math. Soc.
56 (1944), 448–469. Available at www.jstor.org
Chapter 136
11N99 – Miscellaneous
This has the slightly weaker consequence that given a system of congruences x ≡ aᵢ (mod Iᵢ), there is a solution in R which is unique mod I, as the theorem is usually stated for the integers.
and hence the expression on the right hand side must equal R.
Now we can prove that ∏ aᵢ = ∩ aᵢ, by induction. The statement is trivial for n = 1. For n = 2, note that

a₁ ∩ a₂ = (a₁ ∩ a₂)R = (a₁ ∩ a₂)(a₁ + a₂) ⊆ a₂a₁ + a₁a₂ = a₁a₂,

and the reverse inclusion is obvious, since each aᵢ is an ideal. Assume that the statement is proved for n − 1, and consider it for n. Then

∩_{i=1}^n aᵢ = a₁ ∩ (∩_{i=2}^n aᵢ) = a₁ ∩ (∏_{i=2}^n aᵢ),

using the induction hypothesis in the last step. But using the fact proved above and the n = 2 case, we see that

a₁ ∩ ∏_{i=2}^n aᵢ = a₁ · ∏_{i=2}^n aᵢ = ∏_{i=1}^n aᵢ.
Finally, we are ready to prove the Chinese remainder theorem. Consider the ring homomorphism R → ∏ R/aᵢ defined by projection on each component of the product: x ↦ (a₁ + x, a₂ + x, . . . , aₙ + x). It is easy to see that the kernel of this map is ∩ aᵢ, which is also ∏ aᵢ by the earlier part of the proof. So it only remains to show that the map is surjective.

Accordingly, take an arbitrary element (a₁ + x₁, a₂ + x₂, . . . , aₙ + xₙ) of ∏ R/aᵢ. Using the first part of the proof, for each i, we can find elements yᵢ ∈ aᵢ and zᵢ ∈ ∏_{j≠i} aⱼ such that yᵢ + zᵢ = 1. Put

x = x₁z₁ + x₂z₂ + · · · + xₙzₙ.

Then for each i,

aᵢ + x = aᵢ + xᵢzᵢ, since xⱼzⱼ ∈ aᵢ for all j ≠ i,
       = aᵢ + xᵢyᵢ + xᵢzᵢ, since xᵢyᵢ ∈ aᵢ,
       = aᵢ + xᵢ(yᵢ + zᵢ) = aᵢ + xᵢ · 1 = aᵢ + xᵢ.

Thus the map is surjective as required, and induces the isomorphism

R/∏ aᵢ ≅ ∏ R/aᵢ.
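Specialized to R = Z with pairwise coprime moduli, the surjectivity argument is constructive: zᵢ is 1 mod mᵢ and 0 mod every other modulus, and x = ∑ aᵢzᵢ solves the system. A minimal sketch (ours), with `crt` a name we introduce here:

```python
def crt(residues, moduli):
    """Chinese remainder theorem over the integers, mirroring the surjectivity
    argument: z_i = 1 (mod m_i) and z_i = 0 (mod m_j) for j != i, and the
    solution is x = sum of a_i * z_i.  Moduli must be pairwise coprime."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(residues, moduli):
        Mi = M // m                 # product of the other moduli
        zi = Mi * pow(Mi, -1, m)    # zi = 1 (mod m), zi = 0 (mod the others)
        x += a * zi
    return x % M

print(crt([2, 3, 2], [3, 5, 7]))  # 23, the classical example
```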
Chapter 137
Lagrange’s four-square theorem states that every non-negative integer may be expressed as
the sum of at most four squares. By the Euler four-square identity, it is enough to show that
every prime is expressible by at most four squares. It was later proved that any number not of the form 4^n(8m + 7) may be expressed as the sum of at most three squares.
This shows that g(2) = G(2) = 4, where g and G are the Waring functions.
Waring asked whether it is possible to represent every natural number as a sum of a bounded number of nonnegative k-th powers, that is, whether the set { n^k | n ∈ Z⁺ } is a basis. He
was led to this conjecture by Lagrange’s theorem which asserted that every natural number
can be represented as a sum of four squares.
Hilbert [1] was the first to prove Waring's problem for all k. In his paper he did not give an explicit bound on g(k), the number of powers needed, but later it was proved that

g(k) = 2^k + ⌊(3/2)^k⌋ − 2

except for possibly finitely many exceptional k, none of which are known to date.
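The formula is easy to tabulate exactly, since ⌊(3/2)^k⌋ = 3^k // 2^k in integer arithmetic; this one-liner (ours) reproduces the known values, starting with g(2) = 4 from Lagrange's theorem.

```python
def g(k):
    """g(k) = 2^k + floor((3/2)^k) - 2, proved for every k outside a
    (possibly empty) finite exceptional set."""
    return 2 ** k + 3 ** k // 2 ** k - 2

print([g(k) for k in range(2, 7)])  # [4, 9, 19, 37, 73]
```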
Wooley[3], improving the result of Vinogradov, proved that the number of k’th powers needed
to represent all sufficiently large integers is
G(k) ≤ k(ln k + ln ln k + O(1)).
REFERENCES
1. David Hilbert. Beweis für Darstellbarkeit der ganzen Zahlen durch eine feste Anzahl n-ter
Potenzen (Waringsches Problem). Math. Ann., pages 281–300, 1909. Available electronically
from GDZ.
2. Robert C. Vaughan. The Hardy-Littlewood method. Cambridge University Press, 1981.
Zbl 0868.11046.
3. Trevor D. Wooley. Large improvements in Waring’s problem. Ann. Math, 135(1):131–164, 1992.
Zbl 0754.11026. Available online at JSTOR.
Proof: Say 2m = x² + y². Then x and y are both even or both odd. Therefore, in the identity

m = ((x − y)/2)² + ((x + y)/2)²,

both fractions on the right side are integers.
By Lemma 1 we need only show that an arbitrary prime p is a sum of four squares. Since
that is trivial for p = 2, suppose p is odd. By Lemma 3, we know
mp = a2 + b2 + c2 + d2
for some m, a, b, c, d with 0 < m. To complete the proof, we will show that if m > 1 then np
is a sum of four squares for some n with 1 ≤ n < m.
If m is even, then none, two, or all four of a, b, c, d are even; in any of those cases, Lemma 2
allows us to take n = m/2. So assume m is odd but > 1. Write
w ≡ a (mod m)
x ≡ b (mod m)
y ≡ c (mod m)
z ≡ d (mod m)

where w, x, y, z are all in the interval (−m/2, m/2). We have

w² + x² + y² + z² < 4 · m²/4 = m²,
w² + x² + y² + z² ≡ 0 (mod m).
So w 2 + x2 + y 2 + z 2 = nm for some integer n with 0 < n < m. But now look at Lemma 1.
On the left is nm²p. Evidently these three sums:
ax − bw − cz + dy
ay + bz − cw − dx
az − by + cx − dw
are multiples of m. The same is true of the other sum on the right in Lemma 1:
aw + bx + cy + dz ≡ w² + x² + y² + z² ≡ 0 (mod m).
The equation in Lemma 1 can therefore be divided through by m2 . The result is an expression
for np as a sum of four squares. Since 0 < n < m, the proof is complete.
Remark: Lemma 3 can be improved: it is enough for p to be an odd number, not necessarily
prime. But that stronger statement requires a longer proof.
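The theorem guarantees that a brute-force search always terminates; the sketch below (ours, a plain bounded search rather than the descent in the proof above) finds one representation a² + b² + c² + d² = n with a ≤ b ≤ c ≤ d.

```python
import math

def four_squares(n):
    """Find (a, b, c, d), a <= b <= c <= d, with a^2+b^2+c^2+d^2 = n.
    Lagrange's theorem guarantees a solution for every n >= 0."""
    r = math.isqrt(n)
    sq = [i * i for i in range(r + 1)]
    for a in range(r + 1):
        for b in range(a, r + 1):
            ab = sq[a] + sq[b]
            if ab > n:
                break
            for c in range(b, r + 1):
                rem = n - ab - sq[c]
                if rem < 0:
                    break
                d = math.isqrt(rem)
                if d * d == rem and d >= c:
                    return (a, b, c, d)
    return None

print(four_squares(7))  # (1, 1, 1, 2)
```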
Chapter 138
Theorem:

∏_{k=1}^{∞} (1 − x^k) = ∑_{n=−∞}^{∞} (−1)^n x^{n(3n+1)/2}    (138.1.1)
where the two sides are regarded as formal power series over Z.
Proof: For n ≥ 0, denote by f(n) the coefficient of x^n in the product on the left, i.e. write

∏_{k=1}^{∞} (1 − x^k) = ∑_{n=0}^{∞} f(n) x^n.
Therefore what we want to prove is
e(n) = d(n) + (−1)m if n = m(3m ± 1)/2 for some m (138.1.2)
e(n) = d(n) otherwise. (138.1.3)
For m ≥ 1 we have
m(3m + 1)/2 = 2m + (2m − 1) + . . . + (m + 1) (138.1.4)
m(3m − 1)/2 = (2m − 1) + (2m − 2) + . . . + m (138.1.5)
Take some (s, g) ∈ P(n), and suppose first that n is not of the form (138.1.4) nor (138.1.5). Since g is decreasing, there is a unique integer k ∈ [1, s] such that

g(j) = g(1) − j + 1 for j ∈ [1, k],    g(j) < g(1) − j + 1 for j ∈ [k + 1, s].

If g(s) ≤ k, define ḡ : [1, s − 1] → N₊ by

ḡ(x) = g(x) + 1, if x ∈ [1, g(s)],
ḡ(x) = g(x), if x ∈ [g(s) + 1, s − 1].

If g(s) > k, define ḡ : [1, s + 1] → N₊ by

ḡ(x) = g(x) − 1, if x ∈ [1, k],
ḡ(x) = g(x), if x ∈ [k + 1, s],
ḡ(x) = k, if x = s + 1.

In both cases, ḡ is decreasing and ∑_x ḡ(x) = n. The mapping g ↦ ḡ takes an element having odd s to an element having even s, and vice versa. Finally, the reader can verify that applying the construction twice returns the original g. Thus we have constructed a bijection E(n) → D(n), proving (138.1.3).
Now suppose that n = m(3m + 1)/2 for some (perforce unique) m. The above construction
still yields a bijection between E(n) and D(n) excluding (from one set or the other) the
single element (m, g0 ):
g0 (x) = 2m + 1 − x for x ∈ [1, m]
as in (138.1.4). Likewise if n = m(3m − 1)/2, only this element (m, g1 ) is excluded:
g1 (x) = 2m − x for x ∈ [1, m]
as in (138.1.5). In both cases we deduce (138.1.2), completing the proof.
Remarks: The name of the theorem derives from the fact that the exponents n(3n + 1)/2
are the generalized pentagonal numbers.
The theorem was discovered and proved by Euler around 1750. This was one of the first
results about what are now called theta functions, and was also one of the earliest applications
of the formalism of generating functions.
The above proof is due to F. Franklin, (Comptes Rendus de l’Acad. des Sciences, 92,
1881, pp. 448-450).
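Since (138.1.1) is an identity of formal power series, it can be checked degree by degree; this sketch (ours) expands the truncated product and compares it with the pentagonal-number series.

```python
def product_coeffs(N):
    """Coefficients of prod_{k=1}^{N} (1 - x^k), truncated past degree N."""
    coeffs = [0] * (N + 1)
    coeffs[0] = 1
    for k in range(1, N + 1):
        # multiply in place by (1 - x^k); descending d reads untouched lower terms
        for d in range(N, k - 1, -1):
            coeffs[d] -= coeffs[d - k]
    return coeffs

def pentagonal_coeffs(N):
    """Coefficients of sum over n in Z of (-1)^n x^{n(3n+1)/2}, up to degree N."""
    coeffs = [0] * (N + 1)
    n = 0
    while n * (3 * n - 1) // 2 <= N:
        # exponents for n and -n; a set, so n = 0 contributes only once
        for e in {n * (3 * n + 1) // 2, n * (3 * n - 1) // 2}:
            if e <= N:
                coeffs[e] += (-1) ** n
        n += 1
    return coeffs

assert product_coeffs(50) == pentagonal_coeffs(50)
print(product_coeffs(10))  # 1 - x - x^2 + x^5 + x^7 + ...
```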
Chapter 139
It is worth noting that the clause "every nonzero prime ideal is maximal" implies that the maximal length of a strictly increasing chain of prime ideals is 1, so the Krull dimension of any Dedekind domain is 1. In particular, the affine ring of an algebraic set is a Dedekind domain if and only if the set is normal, irreducible, and 1-dimensional.
Here µ(K) is the finite cyclic group of the roots of unity in O_K^×, r is the number of real embeddings K → R, and 2s is the number of non-real complex embeddings K → C (they occur in complex conjugate pairs, so s is an integer).
In general, let K be any field. Write K̄ for a separable closure of K, and G_K for the absolute Galois group Gal(K̄/K) of K. Let A be a (Hausdorff) abelian topological group. Then an (A-valued) Galois representation for K is a continuous homomorphism

ρ : G_K → Aut(A),
where we endow GK with the Krull topology, and where Aut(A) is the group of contin-
uous automorphisms of A, endowed with the compact-open topology. One calls A the
representation space for ρ.
The simplest case is where A = Cn , the group of n × 1 column vectors with complex entries.
Then Aut(Cn ) = GLn (C), and we have what is usually called a complex representation of
degree n. In the same manner, letting A = F n , with F any field (such as R or a finite field
Fq ) we obtain the usual definition of a degree n representation over F .
There is an alternate definition which we should also mention. Write Z[G_K] for the group ring of G_K with coefficients in Z. Then a Galois representation for K is simply a continuous
Z[GK ]-module A. In other words, all the information in a representation ρ is preserved in
considering the representation space A as a continuous Z[GK ]-module. The equivalence of
these two definitions is as described in the entry for the group algebra.
where G is any profinite group, and H ranges over all normal subgroups of finite index.
Given a Galois representation ρ, let G₀ = ker ρ. By the fundamental theorem of infinite Galois theory, since G₀ is a closed normal subgroup of G_K, it corresponds to a certain normal subfield of K̄. Naturally, this is the fixed field of G₀, and we denote it by K(ρ). (The notation becomes better justified after we view some examples.) Notice that since ρ is trivial on G₀ = Gal(K̄/K(ρ)), it factors through a representation of Gal(K(ρ)/K).
In the case A = Rn or A = Cn , the so-called ”no small subgroups” argument implies that
the image of GK is finite.
For a first application of definition, we say that ρ is discrete if for all a ∈ A, the stabilizer of
a in GK is open in GK . This is the case when A is given the discrete topology, such as when
A is finite and Hausdorff. The stabilizer of any a ∈ A fixes a finite extension of K, which we
denote by K(a). One has that K(ρ) is the union of all the K(a).
As a second application, suppose that the image ρ(GK ) is Abelian. Then the quotient
GK /G0 is Abelian, so G0 contains the commutator subgroup of GK , which means that K(ρ)
is contained in K ab , the maximal Abelian extension of K. This is the case when ρ is a
character, i.e. a 1-dimensional representation over some (commutative unital) ring,
ρ : GK → GL1 (A) = A× .
Associated to any field K are two basic Galois representations, namely those with represen-
tation spaces A = L and A = L× , for any normal intermediate field K ⊆ L ⊆ K, with the
usual action of the Galois group on them. Both of these representations are discrete. The
additive representation is rather simple if L/K is finite: by the normal basis theorem, it is
merely a permutation representation on the normal basis. Also, if L = K̄ and x ∈ K̄, then K(x), the field obtained by adjoining x to K, agrees with the fixed field of the stabilizer of x in G_K. This motivates the notation “K(a)” introduced above.
By contrast, in general, L^× can become a rather complicated object. To look at just a piece of the representation L^×, assume that L contains the group µ_m of m-th roots of unity, where m is prime to the characteristic of K. Then we let A = µ_m. It is possible to choose an isomorphism of Abelian groups µ_m ≅ Z/m, and it follows that our representation is ρ : G_K → (Z/m)^×. Now assume that m has the form p^n, where p is a prime not equal to the characteristic, and set A_n = µ_{p^n}. This gives a sequence of representations ρ_n : G_K → (Z/p^n)^×, which are compatible with the natural maps (Z/p^{n+1})^× → (Z/p^n)^×. This compatibility allows us to glue them together into a big representation

ρ : G_K → Aut(T_p 𝔾_m) ≅ Z_p^×,

called the p-adic cyclotomic representation of K. This representation is often not discrete. The notation T_p 𝔾_m will be explained below.
This example may be generalized as follows. Let B be an Abelian algebraic group defined over K. For each integer n, let B_n = B(K̄)[p^n] be the set of K̄-rational points whose order divides p^n. Then we define the p-adic Tate module of B via

T_p B = lim←_n B_n.
It acquires a natural Galois action from the ones on the Bn . The two most commonly treated
examples of this are the cases B = Gm (the multiplicative group, giving the cyclotomic
representation above) and B = E, an elliptic curve defined over K.
The last thing which we shall mention about generalities is that to any Galois representation
ρ : GK → Aut(A), one may associate the Galois cohomology groups H n (K, ρ), more com-
monly written H n (K, A), which are defined to be the group cohomology of GK (computed
with continuous cochains) with coefficients in A.
Galois representations play a fundamental role in algebraic number theory, as many objects
and properties related to global fields and local fields may be determined by certain Galois
representations and their properties. We shall describe the local case first, and then the
global case.
Let K be a local field, by which we mean the fraction field of a complete DVR with finite residue field. We write v_K for the normalized valuation, O_K for the associated DVR, m_K for the maximal ideal of O_K, k_K = O_K/m_K for the residue field, and ℓ for the characteristic of k_K.
Let L/K be a finite Galois extension, and define v_L, O_L, m_L, and k_L accordingly. There is a natural surjection Gal(L/K) → Gal(k_L/k_K). We call the kernel of this map the inertia group, and write it I(L/K) = ker(Gal(L/K) → Gal(k_L/k_K)). Further, the p-Sylow subgroup of I(L/K) is normal, and we call it the wild ramification group, and denote it by W(L/K). One calls I/W the tame ramification group.
It happens that the formation of these groups is compatible with extensions L′/L/K, in that we have surjections I(L′/K) → I(L/K) and W(L′/K) → W(L/K). This lets us define W_K ⊆ I_K ⊆ G_K to be the inverse limits of the subgroups W(L/K) ⊆ I(L/K) ⊆ Gal(L/K), with L as usual ranging over all finite Galois extensions of K in K̄.
Unramified or tamely ramified extensions are usually much easier to study than wildly ramified extensions. In the unramified case, this results from the fact that G_K/I_K ≅ G_{k_K} ≅ Ẑ is pro-cyclic. Thus an unramified representation is completely determined by the action of ρ(σ) for a topological generator σ of G_K/I_K. (Such a σ is often called a Frobenius element.)
Given a finite extension L/K, one defines the inertia degree f_{L/K} = [k_L : k_K] and the ramification degree e_{L/K} = [v_L(L^×) : v_L(K^×)] as usual. Then in the Galois case one may recover them as f_{L/K} = [Gal(L/K) : I(L/K)] and e_{L/K} = #I(L/K). The tame inertia degree, which is the non-p-part of e_{L/K}, is equal to [I(L/K) : W(L/K)], while the wild inertia degree, which is the p-part of e_{L/K}, is equal to #W(L/K).

One finds that the inertia and ramification properties of L/K may be computed from the ramification properties of the Galois representation O_L.
We now turn to global fields. We shall only treat the number field case. Thus we let K be
a finite extension of Q, and write OK for its ring of integers. For each place v of K, write
Kv for the completion of K with respect to v. When v is a finite place, we write simply v
for its associated normalized valuation, O_v for O_{K_v}, m_v for m_{K_v}, k_v for k_{K_v}, and ℓ(v) for the characteristic of k_v.
For a Galois representation ρ : GK → Aut(A) and a place v, it is profitable to consider
the restricted representation ρv = ρ|Gv . One calls ρ a global representation, and ρv a local
representation. We say that ρ is ramified or tamely ramified (or not) at v if ρv is (or
isn’t). The Tchebotarev density theorem implies that the corresponding Frobenius elements
σv ∈ Gv are dense in GK , so that the union of the Gv is dense in GK . Therefore, it is
reasonable to try to reduce questions about ρ to questions about all the ρv independently.
This is a manifestation of Hasse’s local-to-global principle.
Given a global Galois representation with representation space Z_p^n which is unramified at all but finitely many places v, it is a goal of number theory to prove that it arises naturally in arithmetic geometry (namely, as a subrepresentation of an étale cohomology group of a motive), and also to prove that it arises from an automorphic form. This can only be shown in certain special cases.
It is easy to see that the set S of all Gaussian integers is a subring of C; specifically, S is the smallest subring containing {1, i}, whence S = G.

There are four units (i.e. invertible elements) in the ring G, namely ±1 and ±i. Up to multiplication by units, the primes in G are

• the element 1 + i;
• the ordinary primes p ∈ Z with p ≡ 3 (mod 4);
• the Gaussian integers a + bi with a² + b² = p for an ordinary prime p ≡ 1 (mod 4).
Using the ring of Gaussian integers, it is not hard to show, for example, that the Diophantine equation
x2 + 1 = y 3 has no solutions (x, y) ∈ Z × Z except (0, 1).
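The well-known classification of Gaussian primes (an ordinary prime p ≡ 1 mod 4 splits as (a + bi)(a − bi) with a² + b² = p, while p ≡ 3 mod 4 stays prime) can be explored with a small search; the sketch below (ours) finds the two-square decomposition when it exists.

```python
import math

def two_squares(p):
    """For a prime p = 2 or p = 1 (mod 4), find (a, b) with a^2 + b^2 = p,
    i.e. the Gaussian factorization p = (a + bi)(a - bi).
    Returns None when p = 3 (mod 4), i.e. p remains prime in Z[i]."""
    for a in range(1, math.isqrt(p) + 1):
        b2 = p - a * a
        b = math.isqrt(b2)
        if b * b == b2:
            return (a, b)
    return None

print(two_squares(13))  # (2, 3): 13 = (2 + 3i)(2 - 3i)
print(two_squares(7))   # None: 7 remains prime in Z[i]
```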
139.6 algebraic conjugates
The notion of algebraic conjugacy is a special case of group conjugacy in the case where the
group in question is the Galois group of the above minimal polynomial, viewed as acting on
the roots of said polynomial.
659
139.10 calculating the splitting of primes
Let K|L be an extension of number fields, with rings of integers O_K, O_L. Since this extension is separable, there exists α ∈ K with L(α) = K, and by multiplying by a suitable integer, we may assume that α ∈ O_K (we do not require that O_L[α] = O_K; there is not, in general, an α ∈ O_K with this property). Let f ∈ O_L[x] be the minimal polynomial of α.

Now, let p be a prime ideal of O_L that does not divide ∆(f)∆(O_K)^{−1}, let f̄ ∈ (O_L/pO_L)[x] be the reduction of f mod p, and let f̄ = f̄₁ ⋯ f̄ₙ be its factorization into irreducible polynomials. If there are no repeated factors, then p splits in K as the product

pO_K = (p, f₁(α)) ⋯ (p, fₙ(α)),

where fᵢ is any polynomial in O_L[x] reducing to f̄ᵢ. Note that in this case p is unramified, since all the fᵢ are pairwise coprime mod p.
For example, let L = Q, K = Q(√d) where d is a square-free integer. Then f = x² − d. For
any prime p, f is irreducible mod p if and only if it has no roots mod p, i.e. d is a quadratic
non-residue mod p. Using quadratic reciprocity, we can obtain a congruence condition mod
4p for which primes split and which do not. In general, this is possible for all fields with
abelian Galois groups, using class field theory.
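For the quadratic example, the splitting decision reduces to a quadratic-residue test, which Euler's criterion evaluates with one modular exponentiation; a minimal sketch (ours, with `splitting_type` a name we introduce here):

```python
def legendre(d, p):
    """Legendre symbol (d/p) for an odd prime p, via Euler's criterion."""
    d %= p
    if d == 0:
        return 0
    return 1 if pow(d, (p - 1) // 2, p) == 1 else -1

def splitting_type(d, p):
    """How an odd prime p behaves in Q(sqrt(d)): 'split' when x^2 - d factors
    mod p, 'inert' when it stays irreducible, 'ramified' when p divides d."""
    s = legendre(d, p)
    if s == 1:
        return "split"
    if s == -1:
        return "inert"
    return "ramified"

print(splitting_type(5, 11))  # split: 4^2 = 16 = 5 (mod 11)
print(splitting_type(5, 13))  # inert: 5 is a quadratic non-residue mod 13
```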
Furthermore, let K′ be the splitting field of f. Then G = Gal(K′|L) acts on the roots of f, giving a map G → S_m, where m = deg f. Given a prime p of O_L, the Artin symbol [P, K′|L] for any P lying over p is determined up to conjugacy by p. Its image in S_m is a product of disjoint cycles of lengths m₁, . . . , mₙ, where mᵢ = deg fᵢ. This information is useful not just for prime splitting, but also for the calculation of Galois groups.

Another useful fact is the Frobenius density theorem, which states that every element of G is [P, K′|L] for infinitely many primes P of O_{K′}.
For example, let f = x³ + x² + 2 ∈ Z[x]. This is irreducible mod 3, and thus irreducible. Galois theory tells us that G = Gal(K′|L) is a subgroup of S₃, and so is isomorphic to C₃ or S₃, but it is not obvious which. But if we consider p = 7, f ≡ (x − 2)(x² + 3x − 1) (mod 7), and the quadratic factor is irreducible mod 7. Thus, G ≅ S₃.
Or let f = x⁴ + ax² + b for some integers a, b, with f irreducible. For a prime p, consider the factorization of f̄ mod p. Either it remains irreducible (G contains a 4-cycle), or it splits as the product of irreducible quadratics (G contains a cycle of the form (12)(34)), or f̄ has a root. If β is a root of f̄, then so is −β, and so assuming p ≠ 2, there are at least two roots, and so a 3-cycle is impossible. Thus G ≅ C₄ or D₄.
139.11 characterization in terms of prime ideals
Let R be a Dedekind domain and let I be an ideal of R. Then there exists an ideal J in R
such that IJ is principal.
1. associativity: [a] · ([b] · [c]) = [a] · [bc] = [a(bc)] = [abc] = [(ab)c] = [ab] · [c] = ([a] · [b]) · [c]

3. inverses: Consider [b]. Let b be an integer in b. Then b ⊇ (b), so there exists c such that bc = (b). Then the ideal class [b] · [c] = [(b)] = [O_K].
An integral basis for K over Q is a Q-basis for K consisting of algebraic integers whose Z-span is the ring of integers O_K; equivalently, it is a basis for O_K as a Z-module.
A ring R is said to be integrally closed (or normal) if it is integrally closed in its fraction field.
In fact this theorem is true for 3rd roots, 4th roots, 5th roots, ..., nth roots etc, but the proof
is somewhat more involved.
Chapter 140
A Salem number is a real algebraic integer α > 1 whose algebraic conjugates all lie in the closed unit disk { z ∈ C : |z| ≤ 1 }, with at least one on the unit circle { z ∈ C : |z| = 1 }.
Powers of a Salem number αn (n = 1, 2, . . . ) are everywhere dense modulo 1, but are not
uniformly distributed modulo 1.
The smallest known Salem number is the largest real root (α ≈ 1.17628) of Lehmer's polynomial:

α¹⁰ + α⁹ − α⁷ − α⁶ − α⁵ − α⁴ − α³ + α + 1 = 0.
Chapter 141
REFERENCES
1. Daniel A.Marcus, Number Fields. Springer, New York.
Chapter 142
In a similar fashion to this result, the theory of elliptic curves with complex multiplication
provides a classification of abelian extensions of quadratic imaginary number fields:
Theorem 12. Let K be a quadratic imaginary number field with ring of integers OK . Let
E be an elliptic curve with complex multiplication by OK and let j(E) be the j-invariant of
E. Then:
K ab = K(j(E), h(Etorsion ))
Note: The map h : E → C is called a Weber function for E. We can define a Weber
function for the cases j(E) = 0, 1728 so the theorem holds true for those two cases as well.
Assume E : y^2 = x^3 + Ax + B; then:
h(P) = x(P), if j(E) ≠ 0, 1728;
h(P) = x^2(P), if j(E) = 1728;
h(P) = x^3(P), if j(E) = 0.
REFERENCES
1. S. Lang, Algebraic Number Theory, Springer-Verlag, New York.
2. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York.
Examples:
2. The following are the first few class numbers of the cyclotomic fields Q(ζp ), where ζp
is a primitive p-th root of unity:
p Class Number
3 1
5 1
7 1
11 1
13 1
17 1
19 1
23 3
29 8
31 9
37 37
41 121
43 211
47 695
53 4889
59 41241
61 76301
Remarks:
• Notice that 37 divides 37, and 59 divides 41241 = 3 · 59 · 233, thus 37, 59 are
irregular primes (see above).
• The class number of the cyclotomic fields grows very quickly with p. For example, Q(ζ19) (the case p = 19) is the last field in this list with class number 1.
Let q ∈ Z be a prime greater than 2, let ζq = e^{2πi/q} and write L = Q(ζq) for the cyclotomic extension. The ring of integers of L is OL = Z[ζq]. The discriminant of L/Q is:
DL/Q = ±q^(q−2)
Hence the result holds (the sign is + if q ≡ 1 mod 4 and − if q ≡ 3 mod 4).
Let K = Q(√±q) with the corresponding sign. Thus, by the proposition, we have a tower of fields Q ⊆ K ⊆ L = Q(ζq).
For a prime ideal pZ, the decomposition in the quadratic extension K/Q is well-known. The next theorem characterizes the decomposition in the extension L/Q:
1. If p = q, qOL = (1 − ζq )q−1 . In other words, the prime q is totally ramified in L.
REFERENCES
1. Daniel A.Marcus, Number Fields. Springer, New York.
A prime p is regular if the class number of the cyclotomic field Q(ζp ) is not divisible by p
(where ζp := e2πi/p denotes a primitive pth root of unity). An irregular prime is a prime that
is not regular.
Regular primes rose to prominence as a result of Ernst Kummer’s work in the 1850s on
Fermat’s last theorem. Kummer was able to prove Fermat’s Last Theorem in the case where
the exponent is a regular prime, a result that prior to Wiles’s recent work was the only
demonstration of Fermat’s Last Theorem for a large class of exponents. In the course of this
work Kummer also established the following numerical criterion for determining whether a
prime is regular:
• p is regular if and only if none of the numerators of the Bernoulli numbers B0 , B2 , B4 , . . . , Bp−3
is a multiple of p.
Based on this criterion it is possible to give a heuristic argument that the regular primes
have density e−1/2 in the set of all primes [1]. Despite this, there is no known proof that the
set of regular primes is infinite, although it is known that there are infinitely many irregular
primes.
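Kummer's criterion is easy to test numerically. The sketch below (standard library only; a naive quadratic-time computation, with helper names of our own) generates exact Bernoulli numbers from the recurrence sum_{k=0}^{m} C(m+1, k) B_k = 0 and checks the numerators of B_2, B_4, ..., B_{p−3} modulo p.

```python
from fractions import Fraction
from math import comb

def bernoulli_upto(n):
    """Exact Bernoulli numbers B_0..B_n via sum_{k<=m} C(m+1,k) B_k = 0."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def is_regular(p):
    """Kummer's criterion: p regular iff p divides no numerator of
    B_2, B_4, ..., B_{p-3}."""
    B = bernoulli_upto(p - 3)
    return all(B[k].numerator % p != 0 for k in range(2, p - 2, 2))

print([p for p in [3, 5, 7, 31, 37, 59, 67, 101] if not is_regular(p)])
# the irregular primes in this list are 37, 59, 67, 101
```

For p = 37, for instance, the failing index is k = 32: the numerator of B_32 is divisible by 37, matching the remark above that 37 divides its own class number.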
REFERENCES
1. Kenneth Ireland & Michael Rosen, A Classical Introduction to Modern Number Theory,
Springer-Verlag, New York, Second Edition, 1990.
Chapter 143
143.1 regulator
Let
{ε1, ε2, . . . , εr−1}
be a fundamental system of generators of O×K modulo roots of unity (that is, modulo the torsion subgroup). Let A be the r × (r − 1) matrix

      ( log‖ε1‖1   log‖ε2‖1   . . .   log‖εr−1‖1 )
      ( log‖ε1‖2   log‖ε2‖2   . . .   log‖εr−1‖2 )
A =   (    ..          ..      ..         ..      )
      ( log‖ε1‖r   log‖ε2‖r   . . .   log‖εr−1‖r )

and let Ai be the (r − 1) × (r − 1) matrix obtained by deleting the i-th row from A, 1 ≤ i ≤ r. It can be checked that the determinant of Ai, det Ai, is independent up to sign of the choice of fundamental system of generators of O×K and is also independent of the choice of i.
Definition 5. The regulator of K is defined to be
RegK = | det A1 |
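As a toy illustration of the definition (our own example, not from the original entry), take K = Q(√2): here r = 2 (two real embeddings), the unit rank is r − 1 = 1, and a fundamental unit is ε = 1 + √2. The matrix A is 2 × 1, and deleting either row gives the same absolute determinant:

```python
import math

# The two real embeddings of Q(sqrt(2)) send sqrt(2) to +sqrt(2) and -sqrt(2).
eps1 = 1 + math.sqrt(2)    # image of the fundamental unit under embedding 1
eps2 = 1 - math.sqrt(2)    # image under embedding 2

A = [math.log(abs(eps1)),  # row 1 of the 2 x 1 matrix A
     math.log(abs(eps2))]  # row 2

reg1 = abs(A[1])   # |det A_1|: delete row 1
reg2 = abs(A[0])   # |det A_2|: delete row 2

print(reg1, reg2)  # both equal log(1 + sqrt(2)) ~ 0.8814
```

The agreement reflects (1 + √2)(√2 − 1) = 1, so the two row entries are negatives of each other, illustrating the independence of the choice of i.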
Chapter 144
Let K be a number field. There exists a finite extension E of K with the following properties:
2. E is Galois over K.
There is a unique field E satisfying the above five properties, and it is known as the Hilbert
class field of K.
The field E may also be characterized as the maximal abelian unramified extension of K.
144.2 class number formula
ζK (s)
1. hK is the class number, the number of elements in the ideal class group of K.
Then:
Theorem 14 (Class Number Formula). The Dedekind zeta function of K, ζK(s), converges absolutely for Re(s) > 1 and extends to a meromorphic function defined for Re(s) > 1 − 1/n, where n = [K : Q], with only one simple pole at s = 1. Moreover:
Note: This is the most general “class number formula”. In particular cases, for example
when K is a cyclotomic extension of Q, there are particular and more refined class number
formulas.
144.3 discriminant
144.3.1 Definitions
Let R be any Dedekind domain with field of fractions K. Fix a finite dimensional field
extension L/K and let S denote the integral closure of R in L. For any basis x1 , . . . , xn of
L over K, the determinant
∆(x1, . . . , xn) := det( Tr L/K (xi xj) )
of the n × n matrix whose entries are the traces of the products xi xj, over all pairs i, j, is called the discriminant of the basis x1, . . . , xn. The ideal in R generated by all discriminants of the form
∆(x1 , . . . , xn ), xi ∈ S
is called the discriminant ideal of S over R, and denoted ∆(S/R).
In the special case where S is a free R–module, the discriminant ideal ∆(S/R) is always a
principal ideal, generated by any discriminant of the form ∆(x1 , . . . , xn ) where x1 , . . . , xn is
a basis for S as an R–module. In particular, this situation holds whenever K and L are
number fields.
144.3.2 Properties
The discriminant is so named because it allows one to determine which ideals of R are
ramified in S. Specifically, the prime ideals of R that ramify in S are precisely the ones that
contain the discriminant ideal ∆(S/R). In the case R = Z, Minkowski's theorem implies that the ring of integers S of any number field larger than Q has discriminant ideal strictly contained in Z, and this fact combined with the previous result shows that any number field K ≠ Q admits at least one ramified prime over Q.
Let K be a number field. Let a and b be ideals in OK (the ring of algebraic integers of K).
Define a relation ∼ on the ideals of OK in the following way: write a ∼ b if there exist
nonzero elements α and β of OK such that (α)a = (β)b.
The relation ∼ is an equivalence relation, and the equivalence classes under ∼ are known as
ideal classes.
Note that the set of ideals of any ring R forms an Abelian semigroup with the product of ideals
as the semigroup operation. By replacing ideals by ideal classes, it is possible to define a
group on the ideal classes of OK in the following way.
Let a, b be ideals of OK . Denote the ideal classes of which a and b are representatives by [a]
and [b] respectively. Then define · by
[a] · [b] = [ab]
Note that the ideal class group of K is simply the quotient group of the ideal group of K by
the subgroup of principal fractional ideals.
Let m be a modulus for a number field K. The ray class group of K mod m is the group I^m/K_{m,1}, where
• I^m is the subgroup of the ideal group of K generated by all prime ideals which do not occur in the factorization of m.
• K_{m,1} is the subgroup of I^m consisting of all principal ideals in the ring of integers of K having the form (α) where α is multiplicatively congruent to 1 mod m.
Chapter 145
Let f ∈ F [x] be a polynomial over a field F of characteristic zero, and let K be its splitting field. Then K is contained in a radical extension of F (equivalently, f is solvable by radicals) if and only if the Galois group Gal(K/F ) is a solvable group.
Chapter 146
Let L/K be a finite Galois extension with Galois group G = Gal(L/K). Then the first
Galois cohomology group H 1 (G, L∗ ) is 0.
A corollary (and the actual result that Hilbert called his Theorem 90) is that, if G is cyclic
with generator σ, then x ∈ L has norm 1 if and only if
x = y/σ(y)
for some y ∈ L.
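The corollary can be checked by hand in the simplest cyclic case. In the sketch below (our own illustration, with exact rational arithmetic), K = Q, L = Q(i), σ is complex conjugation, and the norm of x is x · σ(x); for x of norm 1 with x ≠ −1, the classical choice y = 1 + x exhibits x = y/σ(y).

```python
from fractions import Fraction

# Elements of Q(i) are pairs (a, b) of rationals representing a + bi.

def conj(z):                       # sigma: a + bi -> a - bi
    return (z[0], -z[1])

def mul(z, w):
    return (z[0]*w[0] - z[1]*w[1], z[0]*w[1] + z[1]*w[0])

def div(z, w):
    n = w[0]*w[0] + w[1]*w[1]      # norm of the denominator
    num = mul(z, conj(w))
    return (num[0] / n, num[1] / n)

x = (Fraction(3, 5), Fraction(4, 5))   # norm: (3/5)^2 + (4/5)^2 = 1
y = (1 + x[0], x[1])                   # y = 1 + x
print(div(y, conj(y)) == x)            # True: x = y / sigma(y)
```

The identity behind the choice of y: when x·σ(x) = 1 we have σ(x) = 1/x, so (1 + x)/σ(1 + x) = (1 + x)/(1 + 1/x) = x.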
Chapter 147
Let L/K be a Galois extension of number fields, with rings of integers OL and OK . For any
finite prime P ⊂ L lying over a prime p ∈ K, let D(P) denote the decomposition group
of P, let T (P) denote the inertia group of P, and let l := OL /P and k := OK /p be the
residue fields. The exact sequence
If we add the additional assumption that p is unramified, then T (P) is the trivial group, and [L/K, P] in this situation is an element of D(P) ⊂ Gal(L/K), called the Frobenius automorphism of P.
If, furthermore, L/K is an Abelian extension (that is, Gal(L/K) is an abelian group), then
[L/K, P] = [L/K, P′] for any other prime P′ ⊂ L lying over p. In this case, the Frobenius
automorphism [L/K, P] is denoted (L/K, p); the change in notation from P to p reflects the
fact that the automorphism is determined by p ∈ K independent of which prime P of L
above it is chosen for use in the above construction.
Definition 4. Let S be a finite set of primes of K, containing all the primes that ramify in L. Let I_K^S denote the subgroup of the group I_K of fractional ideals of K which is generated by all the primes in K that are not in S. The Artin map
φ_{L/K}^S : I_K^S −→ Gal(L/K)
is the map given by φ_{L/K}^S(p) := (L/K, p) for all primes p ∉ S, extended linearly to I_K^S.
Let L/K be any finite Galois extension of number fields with Galois group G. For any
conjugacy class C ⊂ G, the subset of prime ideals p ⊂ K which are unramified in L and
satisfy the property
Note that the conjugacy class of [L/K, P] is independent of the choice of prime P lying
over p, since any two such choices of primes are related by a Galois automorphism and their
corresponding Artin symbols are conjugate by this same automorphism.
147.3 modulus
where
• The product is taken over all finite primes and infinite primes of K
A modulus m factors as the product of its finite part
∏_{p finite} p^{n_p}
and its infinite part
∏_{p real} p^{n_p},
with the finite part equal to some ideal in the ring of integers OK of K, and the infinite part equal to the product of some subcollection of the real primes of K.
Let p be any real prime of a number field K, and write i : K −→ R for the corresponding
real embedding of K. We say two elements α, β ∈ K are multiplicatively congruent mod p if
the real numbers i(α) and i(β) are either both positive or both negative.
Now let p be a finite prime of K, and write (OK )p for the localization of the ring of integers
OK of K at p. For any natural number n, we say α and β are multiplicatively congruent mod
pn if they are members of the same coset of the subgroup 1 + pn (OK )p of the multiplicative
group K × of K.
then we say α and β are multiplicatively congruent mod m if they are multiplicatively congruent mod p^{n_p} for every prime p appearing in the factorization of m. In this case we write
α ≡∗ β (mod m).
Proposition 5. Let L/K be a finite Abelian extension of number fields, and let OK be the
ring of integers of K. There exists an integral ideal C ⊂ OK , divisible by precisely the
prime ideals of K that ramify in L, such that
Definition 6. The conductor of a finite abelian extension L/K is the largest ideal CL/K ⊂
OK satisfying the above properties.
Note that there is a “largest ideal” with this condition because if the proposition is true for C1 and C2, then it is also true for C1 + C2.
Definition 7. Let I be an integral ideal of K. A ray class field of K (modulo I) is a finite abelian extension K_I/K with the property that for any other finite abelian extension L/K with conductor C_{L/K},
C_{L/K} | I ⇒ L ⊂ K_I
Note: It can be proved that there is a unique ray class field with a given conductor. In words, the ray class field is the biggest abelian extension of K with a given conductor (although the conductor of K_I does not necessarily equal I; see example 2).
Examples:
2.
Q(i)_(2) = Q(i)
so the conductor of Q(i)_(2)/Q(i) is (1).
3. K(1) , the ray class field of conductor (1), is the maximal abelian extension of K which
is unramified everywhere. It is, in fact, the Hilbert class field of K.
REFERENCES
1. Artin/Tate, Class Field Theory. W.A.Benjamin Inc., New York.
Chapter 148
148.1 adèle
Let K be a number field. For each finite prime v of K, let ov denote the valuation ring of the
completion Kv of K at v. The adèle group AK of K is defined to be the restricted direct product
of the collection of locally compact additive groups {Kv } over all primes v of K (both finite
primes and infinite primes), with respect to the collection of compact open subgroups {ov }
defined for all finite primes v.
The set AK inherits addition and multiplication operations (defined pointwise) which make
it into a topological ring. The original field K embeds as a ring into AK via the map
x ↦ (x)_v ,
the diagonal embedding, whose coordinate at every prime v is equal to x.
It turns out that the image of K in AK is a discrete set and the quotient group AK /K is a
compact space in the quotient topology.
148.2 idèle
Let K be a number field. For each finite prime v of K, let ov be the valuation ring of the
completion Kv of K at v, and let Uv be the group of units in ov . Then each group Uv is a
compact open subgroup of the group of units Kv∗ of Kv . The idèle group IK of K is defined
to be the restricted direct product of the multiplicative groups {Kv∗ } with respect to the
compact open subgroups {Uv }, taken over all finite primes and infinite primes v of K.
Warning: The group IK is a multiplicative subgroup of the ring of adèles AK , but the
topology on IK is different from the subspace topology that IK would have as a subset of
AK .
Let {Gv }v∈V be a collection of locally compact topological groups. For all but finitely many
v ∈ V , let Hv ⊂ Gv be a compact open subgroup of Gv . The restricted direct product of the
collection {Gv} with respect to the collection {Hv} is the subgroup
G := { (gv)_{v∈V} ∈ ∏_{v∈V} Gv : gv ∈ Hv for all but finitely many v ∈ V }
of the direct product ∏_{v∈V} Gv.
We define a topology on G as follows. For every finite subset S ⊂ V that contains all the
elements v for which Hv is undefined, form the topological group
GS := ∏_{v∈S} Gv × ∏_{v∉S} Hv
consisting of the direct product of the Gv’s, for v ∈ S, and the Hv’s, for v ∉ S. The
topological group GS is a subset of G for each such S, and we take for a topology on G the
weakest topology such that the GS are open subsets of G, with the subspace topology on
each GS equal to the topology that GS already has in its own right.
Chapter 149
Let | · | be a non-archimedean valuation on K. Define the set V := {x : |x| ≤ 1}. We can see that V is closed under addition as | · | is an ultrametric, and in fact V is an additive group. The other valuation axioms ensure that V is a ring. We call V the valuation ring of K with respect to | · |. Note that the field of fractions of V is K.
Let µ := {x : |x| < 1}. It is easy to show that this is a maximal ideal of V. Let R := V /µ be called the residue field.
The map res : V → V /µ given by x ↦ x + µ is called the residue map. We extend the definition of the residue map to sequences of elements from V, and hence to V [X], so that if f (X) ∈ V [X] is given by ∑_{i≤n} ai X^i then res(f ) ∈ R[X] is given by ∑_{i≤n} res(ai) X^i.
Hensel Property: Let f (x) ∈ V [x]. Suppose res(f )(x) has a simple root e ∈ R. Then f (x) has a root e′ ∈ V with res(e′) = e.
Any valued field satisfying the Hensel property is called henselian. The completion of a non-archimedean valued field K with respect to the valuation (cf. constructing the reals from the rationals as the completion with respect to the standard metric) is a henselian field.
Every non-archimedean valued field K has a unique (up to isomorphism) smallest henselian field K^h containing it. We call K^h the henselisation of K.
149.2 valuation
3. |x + y| ≤ |x| + |y|
In the case where K is a number field, primes as defined above generalize the notion of
prime ideals in the following way. Let p ⊂ K be a nonzero prime ideal 1 , considered as a
fractional ideal. For every nonzero element x ∈ K, let r be the unique integer such that x ∈ p^r but x ∉ p^{r+1}. Define
|x|_p := 1/N(p)^r if x ≠ 0, and |x|_p := 0 if x = 0,
where N(p) denotes the absolute norm of p. Then | · |p is a non–archimedean valuation on
K, and furthermore every non–archimedean valuation on K is equivalent to | · |p for some
prime ideal p. Hence, the prime ideals of K correspond bijectively with the finite primes of
K, and it is in this sense that the notion of primes as valuations generalizes that of a prime
ideal.
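For K = Q (so p = (p) and N(p) = p) the definition becomes completely concrete. The sketch below (helper names are our own) computes v_p and |x|_p with exact arithmetic and checks the product formula |x| · ∏_p |x|_p = 1 on an example.

```python
from fractions import Fraction

def v_p(x, p):
    """The exponent r with x in p^r but not p^(r+1), for nonzero rational x."""
    n, d = Fraction(x).numerator, Fraction(x).denominator
    r = 0
    while n % p == 0:
        n //= p
        r += 1
    while d % p == 0:
        d //= p
        r -= 1
    return r

def abs_p(x, p):
    """|x|_p = (1/p)^{v_p(x)} for x != 0, and |0|_p = 0."""
    return Fraction(0) if x == 0 else Fraction(1, p) ** v_p(x, p)

x = Fraction(-50, 27)
print(abs_p(x, 2), abs_p(x, 3), abs_p(x, 5))             # 1/2, 27, 1/25
print(abs(x) * abs_p(x, 2) * abs_p(x, 3) * abs_p(x, 5))  # 1: product formula
```

Only the primes 2, 3, 5 appear in −50/27, so all other finite valuations of x equal 1 and the displayed product is the full product over all primes (including the archimedean one).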
As for the archimedean valuations, when K is a number field every embedding of K into R
or C yields a valuation of K by way of the standard absolute value on R or C, and one can
show that every archimedean valuation of K is equivalent to one arising in this way. Thus
the infinite primes of K correspond to embeddings of K into R or C, and we call such a
prime real or complex according to whether the valuations comprising it arise from real or
complex embeddings.
1. By “prime ideal” we mean “prime fractional ideal of K” or, equivalently, “prime ideal of the ring of integers of K”. We do not mean literally a prime ideal of the ring K, which would be the zero ideal.
Chapter 150
Let A be a noetherian integrally closed integral domain with field of fractions K. Let L be
a Galois extension of K and denote by B the integral closure of A in L. Then, for any
prime ideal p ⊂ A, the Galois group G := Gal(L/K) acts transitively on the set of all
prime ideals P ⊂ B containing p. If we fix a particular prime ideal P ⊂ B lying over p,
then the stabilizer of P under this group action is a subgroup of G, called the decomposition
group at P and denoted D(P/p). In other words,
If P′ ⊂ B is another prime ideal of B lying over p, then the decomposition groups D(P/p) and D(P′/p) are conjugate in G via any Galois automorphism mapping P to P′.
Write l for the residue field B/P and k for the residue field A/p. Assume that the extension
l/k is separable (if it is not, then this development is still possible, but considerably more
complicated; see [1, p. 20]). Any element σ ∈ D(P/p), by definition, fixes P and hence
descends to a well defined automorphism of the field l. Since σ also fixes A by virtue of
being in G, it induces an automorphism of the extension l/k fixing k. We therefore have a
group homomorphism
D(P/p) −→ Gal(l/k),
and the kernel of this homomorphism is called the inertia group of P, written T (P/p). It turns out that this homomorphism is actually surjective, so there is an exact sequence
1 −→ T (P/p) −→ D(P/p) −→ Gal(l/k) −→ 1.    (150.1.1)
The decomposition group is so named because it can be used to decompose the field extension
L/K into a series of intermediate extensions each of which has very simple factorization
behavior at p. If we let LD denote the fixed field of D(P/p) and LT the fixed field of T (P/p),
then the exact sequence (150.1.1) corresponds under Galois theory to the lattice of fields
L
| e
LT
| f
LD
| g
K
If we write e, f, g for the degrees of these intermediate extensions as in the diagram, then we
have the following remarkable series of equalities:
1. The number e equals the ramification index e(P/p) of P over p, which is independent
of the choice of prime ideal P lying over p since L/K is Galois.
2. The number f equals the inertial degree f (P/p) of P over p, which is also independent
of the choice of prime ideal P since L/K is Galois.
3. The number g is equal to the number of prime ideals P of B that lie over p ⊂ A.
Informally, this decomposition of the extension says that the extension LD /K encapsulates
all of the factorization of p into distinct primes, while the extension LT /LD is the source
of all the inertial degree in P over p and the extension L/LT is responsible for all of the
ramification that occurs over p.
150.1.4 Localization
The decomposition groups and inertia groups of P behave well under localization. That
is, the decomposition and inertia groups of PBP ⊂ BP over the prime ideal pAp in the
localization Ap of A are identical to the ones obtained using A and B themselves. In fact,
the same holds true even in the completions of the local rings Ap and BP at p and P.
REFERENCES
1. J.P. Serre, Local Fields, Springer–Verlag, 1979 (GTM 67)
Here we follow the notation of the entry on the decomposition group.
Example 1
Let K = Q(√−7); then Gal(K/Q) = {Id, σ} ≅ Z/2Z, where σ is the complex conjugation map. Let OK be the ring of integers of K. In this case:
OK = Z[(1 + √−7)/2]
The discriminant of this field is DK/Q = −7. We look at the decomposition in prime ideals
of some prime ideals in Z:
2. The primes (5), (13) are inert, i.e. they are prime ideals in OK . Thus e = 1 = g, f = 2.
Obviously the conjugation map σ fixes the ideals (5), (13), so
Example 2
Let ζ7 = e^{2πi/7}, i.e. a 7th root of unity, and let L = Q(ζ7). This is a cyclotomic extension of Q with Galois group
Gal(L/Q) ≅ (Z/7Z)^× ≅ Z/6Z
Moreover
Gal(L/Q) = {σa : L → L | σa(ζ7) = ζ7^a, a ∈ (Z/7Z)^×}
Among the subfields of L are the maximal real subfield Q(ζ7 + ζ7^6) and the quadratic subfield Q(√−7).
The discriminant of the extension L/Q is DL/Q = −7^5. Let OL denote the ring of integers of L, thus OL = Z[ζ7]. We use the standard results on the splitting of primes in cyclotomic fields to find the decomposition of the primes 2, 5, 7, 13, 29:
p:              7             2                                 5      13
in K = Q(√−7):  (√−7)^2       (2, (1+√−7)/2) · (2, (1−√−7)/2)  (5)    (13)
in L = Q(ζ7):   (1 − ζ7)^6    P · P′                           (5)    Q1 · Q2 · Q3
1. The prime ideal 7Z is totally ramified in L, and the only prime ideal that ramifies:
7OL = (1 − ζ7)^6 = T^6
Thus
e(T/(7)) = 6, f (T/(7)) = g(T/(7)) = 1
Note that, by the properties of the fixed fields of decomposition and inertia groups, we
must have LT (T/(7)) = Q = LD(T/(7)) , thus, by Galois theory,
2. The ideal 2Z factors in K as above, 2OK = P · P′, and each of the prime ideals P, P′ remains inert from K to L, i.e. POL = P, a prime ideal of L. Note also that the order of 2 mod 7 is 3, so f = 3; since g is at least 2 and ef g = 6 (recall that ef g = n), we must have e = 1 and g = 2:
e(P/(2)) = 1, f (P/(2)) = 3, g(P/(2)) = 2
Since e = 1, LT (P/(2)) = L, and [L : LD(P/(2)) ] = 3, so
3. The ideal (5) is inert, 5OL = S is prime, and the order of 5 modulo 7 is 6; thus e = g = 1 and f = 6.
4. The prime 29 splits completely:
29OL = R1 · R2 · R3 · R′1 · R′2 · R′3
Also 29 ≡ 1 mod 7, so f = 1,
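The residue degrees in these items follow from the standard fact that, for p ≠ q unramified in Q(ζq), f is the multiplicative order of p mod q and g = (q − 1)/f. A few lines of Python (our own sketch) reproduce the data for q = 7:

```python
def order_mod(p, q):
    """Multiplicative order of p modulo q (assumes gcd(p, q) = 1)."""
    k, x = 1, p % q
    while x != 1:
        x = (x * p) % q
        k += 1
    return k

q = 7
for p in [2, 5, 13, 29]:
    f = order_mod(p, q)
    print(f"p = {p}: e = 1, f = {f}, g = {(q - 1) // f}")
# p = 2:  f = 3, g = 2   (two primes of degree 3)
# p = 5:  f = 6, g = 1   (inert)
# p = 13: f = 2, g = 3
# p = 29: f = 1, g = 6   (splits completely)
```

This matches the factorizations displayed above: only the splitting behavior of 7 itself, the ramified prime, falls outside this rule.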
150.3 inertial degree
A particular case of special importance in number theory is when L/K is a field extension
and ι : OK −→ OL is the inclusion map of the ring of integers. In this case, the domain
OK /p is a field, so dimOK /p OL /P is guaranteed to exist, and the inertial degree of P over
OK is denoted f (P/p). We have the formula
∑_{P|p} e(P/p) f (P/p) = [L : K],
where e(P/p) is the ramification index of P over p and the sum is taken over all prime ideals
P of OL dividing pOL .
Example:
Let ι : Z −→ Z[i] be the inclusion of the integers into the Gaussian integers. A prime p in Z
may or may not factor in Z[i]; if it does factor, then it must factor as p = (x + yi)(x − yi) for
some integers x, y. Thus a prime p factors into two primes if it equals x2 + y 2, and remains
prime in Z[i] otherwise. There are then three categories of primes in Z[i]:
1. The prime 2 factors as (1 + i)(1 − i), and the principal ideals generated by (1 + i) and
(1 − i) are equal in Z[i], so the ramification index of (1 + i) over Z is two. The ring
Z[i]/(1 + i) is isomorphic to Z/2, so the inertial degree f ((1 + i)/(2)) is one.
2. For primes p ≡ 1 mod 4, the prime p ∈ Z factors into the product of the two primes (x + yi)(x − yi), each with ramification index and inertial degree one.
3. For primes p ⇔ 3 (mod 4), the prime p remains prime in Z[i] and Z[i]/(p) is a two
dimensional field extension of Z/p, so the inertial degree is two and the ramification
index is one.
In all cases, the sum of the products of the inertial degree and ramification index is equal to
2, which is the dimension of the corresponding extension Q(i)/Q of number fields.
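The three cases can be verified mechanically. The sketch below (helper names are our own) checks that the products e·f sum to 2 in every case, and finds the representation p = x^2 + y^2 by brute force when p ≡ 1 mod 4.

```python
def two_squares(p):
    """Brute-force x, y >= 0 with x^2 + y^2 = p, or None if impossible."""
    for x in range(int(p ** 0.5) + 1):
        y = round((p - x * x) ** 0.5)
        if x * x + y * y == p:
            return x, y
    return None

def splitting_in_Zi(p):
    """(e, f) pairs for the primes of Z[i] above a rational prime p."""
    if p == 2:
        return [(2, 1)]          # 2 = -i (1 + i)^2: ramified
    if p % 4 == 1:
        return [(1, 1), (1, 1)]  # p = (x + yi)(x - yi): split
    return [(1, 2)]              # p = 3 mod 4: inert

for p in [2, 3, 5, 7, 13, 29]:
    assert sum(e * f for e, f in splitting_in_Zi(p)) == 2
    if p % 4 == 1:
        x, y = two_squares(p)
        print(p, "=", x, "^2 +", y, "^2")   # e.g. 13 = 2^2 + 3^2
```

The case analysis simply encodes the classical two-squares theorem quoted in the text; the assertions confirm that ∑ e_i f_i equals [Q(i) : Q] = 2 throughout.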
For any extension ι : A −→ B of Dedekind domains, the inertial degree of the prime P ⊂ B
over the prime p := ι−1 (P) ⊂ A is equal to the inertial degree of PBP over pAp in the
localizations at P and p. Moreover, the same is true even if we pass to completions of the
local rings BP and Ap at P and p. The preservation of inertial degree and ramification indices
with respect to localization is one of the reasons why the technique of localization is a useful
tool in the study of such domains.
As in the case of ramification indices, it is possible to define the notion of inertial degree
in the more general setting of locally ringed spaces. However, the generalizations of inertial
degree are not as widely used because in algebraic geometry one usually works with a fixed
base field, which makes all the residue fields at the points equal to the same field.
as pOL = P1^{e1} · · · Pg^{eg} for some prime ideals Pi ⊂ OL and exponents ei ∈ N. The natural number ei is called the ramification index of Pi over p. It is often denoted e(Pi/p). If ei > 1 for any i, then we say the ideal p ramifies in L.
Likewise, if P is a nonzero prime ideal in OL, and p := P ∩ OK, then we say P ramifies over K if the ramification index e(P/p) of P in the factorization of the ideal pOL ⊂ OL is greater
than 1. That is, a prime p in OK ramifies in L if at least one prime P dividing pOL ramifies
over K. If L/K is a Galois extension, then the ramification indices of all the primes dividing
pOL are equal, since the Galois group is transitive on this set of primes.
An astute reader may notice that this formulation of ramification index does not require
that L and K be number fields, or even that they play any role at all. We take advantage
of this fact here to give a second, more general definition.
Definition 6 (Second definition). Let ι : A −→ B be any ring homomorphism. Suppose
P ⊂ B is a prime ideal such that the localization BP of B at P is a discrete valuation ring.
Let p be the prime ideal ι−1 (P) ⊂ A, so that ι induces a local homomorphism ιP : Ap −→ BP.
Then the ramification index e(P/p) is defined to be the unique natural number such that
ι(p)BP = (PBP)^{e(P/p)},
or ∞ if ι(p)BP = (0).
The reader who is not interested in local rings may assume that A and B are unique factorization domains,
in which case e(P/p) is the exponent of P in the factorization of the ideal ι(p)B, just as in our
first definition (but without the requirement that the rings A and B originate from number
fields).
There is of course much more that can be said about ramification indices even in this purely
algebraic setting, but we limit ourselves to the following remarks:
1. Suppose A and B are themselves discrete valuation rings, with respective maximal ideals p and P. Let Â := lim←− A/p^n and B̂ := lim←− B/P^n be the completions of A and B with respect to p and P. Then
e(P/p) = e(PB̂/pÂ). (150.4.1)
In other words, the ramification index of P over p in the A–algebra B equals the
ramification index in the completions of A and B with respect to p and P.
2. Suppose A and B are Dedekind domains, with respective fraction fields K and L. If
B equals the integral closure of A in L, then
∑_{P|p} e(P/p) f (P/p) ≤ [L : K], (150.4.2)
where P ranges over all prime ideals in B that divide pB, and f (P/p) := dimA/p(B/P)
is the inertial degree of P over p. Equality holds in Equation (150.4.2) whenever B is
finitely generated as an A–module.
The word “ramify” in English means “to divide into two or more branches,” and we will
show in this section that the mathematical term lives up to its common English meaning.
Figure 150.1: The function f (y) = y 2 near y = 0.
There is a finite set of points p ∈ C2 for which the inverse image f −1 (p) does not have size
n, and we call these points the branch points or ramification points of f . If P ∈ C1 with
f (P ) = p, then the ramification index e(P/p) of f at P is the ramification index obtained
algebraically from Definition 6 by taking
• A = k[C2 ]p , the local ring consisting of all rational functions in the function field k(C2 )
which are regular at p.
• B = k[C1 ]P , the local ring consisting of all rational functions in the function field k(C1 )
which are regular at P .
• p = mp , the maximal ideal in A consisting of all functions which vanish at p.
• P = mP , the maximal ideal in B consisting of all functions which vanish at P .
• ι = fp∗ : k[C2 ]p ,→ k[C1 ]P , the map on the function fields induced by the morphism f .
Example 1. The following picture may be worth a thousand words. Let k = C and C1 =
C2 = C = A1C . Take the map f : C −→ C given by f (y) = y 2 . Then f is plainly a map of
degree 2, and every point in C2 except for 0 has two preimages in C1 . The point 0 is thus a
ramification point of f of index 2, and near 0 we have the following graph of f .
Note that we have only drawn the real locus of f because that is all that can fit into two
dimensions. We see from the figure that a typical point on C2 such as the point x = 1 has
two points in C1 which map to it, but that the point x = 0 has only one corresponding point
of C1 which “branches” or “ramifies” into two distinct points of C1 whenever one moves
away from 0.
The relationship between Definition 6 and Definition 7 is easiest to explain in the case where
f is a map between affine varieties. When C1 and C2 are affine, then their coordinate rings
k[C1 ] and k[C2 ] are Dedekind domains, and the points of the curve C1 (respectively, C2 )
correspond naturally with the maximal ideals of the ring k[C1 ] (respectively, k[C2 ]). The
ramification points of the curve C1 are then exactly the points of C1 which correspond
to maximal ideals of k[C1 ] that ramify in the algebraic sense, with respect to the map
f ∗ : k[C2 ] −→ k[C1 ] of coordinate rings.
Equation (150.4.2) in this case says
∑_{P ∈ f^{−1}(p)} e(P/p) = n,
and we see that the well known formula (150.4.2) in number theory is simply the alge-
braic analogue of the geometric fact that the number of points in the fiber of f , counting
multiplicities, is always n.
Example 2. Let f : C −→ C be given by f (y) = y 2 as in Example 1. Since C2 is just the
affine line, the coordinate ring C[C2 ] is equal to C[X], the polynomial ring in one variable
over C. Likewise, C[C1 ] = C[Y ], and the induced map f ∗ : C[X] −→ C[Y ] is naturally given
by f ∗ (X) = Y 2 . We may accordingly identify the coordinate ring C[C2 ] with the subring
C[X 2 ] of C[X] = C[C1 ].
Now, the ring C[X] is a principal ideal domain, and the maximal ideals in C[X] are exactly
the principal ideals of the form (X − a) for any a ∈ C. Hence the nonzero prime ideals in
C[X 2 ] are of the form (X 2 − a), and these factor in C[X] as
(X^2 − a) = (X − √a)(X + √a) ⊂ C[X].
Note that the two prime ideals (X − √a) and (X + √a) of C[X] are equal only when a = 0,
so we see that the ideal (X 2 − a) in C[X 2 ], corresponding to the point a ∈ C2 , ramifies in C1
exactly when a = 0. We have therefore recovered our previous geometric characterization of
the ramified points of f , solely in terms of the algebraic factorizations of ideals in C[X].
In the case where f is a map between projective varieties, Definition 6 does not directly
apply to the coordinate rings of C1 and C2 , but only to those of open covers of C1 and C2
by affine varieties. Thus we do have an instance of yet another new phenomenon here, and
rather than keep the reader in suspense we jump straight to the final, most general definition
of ramification that we will give.
Definition 8 (Final form). Let f : (X, OX ) −→ (Y, OY ) be a morphism of locally ringed spaces.
Let p ∈ X and suppose that the stalk (OX )p is a discrete valuation ring. Write φp :
(OY )f (p) −→ (OX )p for the induced map of f on stalks at p. Then the ramification in-
dex of p over Y is the unique natural number e, if it exists (or ∞ if it does not exist), such
that
φp(m_{f(p)})(OX)p = m_p^e,
where mp and mf (p) are the respective maximal ideals of (OX )p and (OY )f (p) . We say p is
ramified in Y if e > 1.
Example 4. For any morphism of varieties f : C1 −→ C2 , there is an induced morphism
f # on the structure sheaves of C1 and C2 , which are locally ringed spaces. If C1 and C2
are curves, then the stalks are one dimensional regular local rings and therefore discrete
valuation rings, so in this way we recover the algebraic geometric definition (Definition 7)
from the sheaf definition (Definition 8).
Ramification points or branch points in complex geometry are merely a special case of the
high-flown terminology of Definition 8. However, they are important enough to merit
a separate mention here.
Of course, the analytic notion of ramification given in Definition 9 can be couched in terms
of locally ringed spaces as well. Any Riemann surface together with its sheaf of holomorphic
functions is a locally ringed space. Furthermore the stalk at any point is always a discrete
valuation ring, because germs of holomorphic functions have Taylor expansions making the
stalk isomorphic to the power series ring C[[z]]. We can therefore apply Definition 159.4.3 to
any holomorphic map of Riemann surfaces, and it is not surprising that this process yields
the same results as Definition 9.
the “completion” of the algebraic structure, and in this sense the algebraic–analytic correspondence between the ramification points may be regarded as the geometric version of the equality (150.4.1) in number theory.
Let K be a number field and let ν be a discrete valuation on K (this might be, for example,
the valuation attached to a prime ideal P of K).
Oν = {k ∈ Kν | ν(k) ≥ 0}
M = {k ∈ Kν | ν(k) > 0}
kν = Oν /M
GK̄/K = Gal(K̄/K)
GK̄ν/Kν = Gal(K̄ν/Kν)
Gk̄ν/kν = Gal(k̄ν/kν)
where K̄, K̄ν, k̄ν are separable algebraic closures of the corresponding fields. We also define notation for the inertia group of GK̄ν/Kν:
Iν ⊆ GK̄ν/Kν
Definition 8. Let S be a set and suppose there is a group action of Gal(K̄ν/Kν) on S. We say that S is unramified at ν, or the action of GK̄ν/Kν on S is unramified at ν, if the action of Iν on S is trivial, i.e.
σ(s) = s,   ∀σ ∈ Iν, ∀s ∈ S
Remark: By Galois theory we know that Kν^nr, the fixed field of the inertia subgroup Iν, is the maximal unramified extension of Kν, so
Iν ≅ Gal(K̄ν/Kν^nr)
Chapter 151
Let K be any local field. For any two nonzero elements a, b ∈ K × , we define:
(a, b) := +1 if z² = ax² + by² has a nonzero solution (x, y, z) ≠ (0, 0, 0) in K³,
          −1 otherwise.
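For odd residue characteristic the symbol can be computed without any search. The sketch below is illustrative only: it covers just the special case K = Qp with p an odd prime, via the classical closed formula in terms of Legendre symbols, and the function names are ours, not a standard API.

```python
# Special case K = Q_p, p an ODD prime (classical formula, cf. Serre,
# "Local Fields"): write a = p^alpha * u and b = p^beta * v with u, v prime
# to p; then (a, b) = (-1)^(alpha*beta*eps(p)) * (u|p)^beta * (v|p)^alpha,
# where eps(p) = (p-1)/2 mod 2 and (u|p) is the Legendre symbol.

def legendre(u, p):
    """Legendre symbol (u|p), assuming p is an odd prime not dividing u."""
    return 1 if pow(u % p, (p - 1) // 2, p) == 1 else -1

def hilbert_symbol_odd(a, b, p):
    assert p > 2 and a != 0 and b != 0
    alpha = beta = 0
    while a % p == 0:
        a //= p
        alpha += 1
    while b % p == 0:
        b //= p
        beta += 1
    eps = ((p - 1) // 2) % 2
    return (-1) ** (alpha * beta * eps) * legendre(a, p) ** beta * legendre(b, p) ** alpha

print(hilbert_symbol_odd(3, 3, 3), hilbert_symbol_odd(-1, -1, 3))  # -1 1
```

For instance, (3, 3) = −1 over Q3 says that z² = 3x² + 3y² has no nonzero 3-adic solution.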
Chapter 152
11S99 – Miscellaneous
For any prime p, the ring of p–adic integers is obtained by taking the completion of the ring Z with respect to the metric induced by the valuation
|x| := 1/p^(νp(x)),   x ∈ Z,   (152.1.1)
where νp(x) denotes the largest integer e such that p^e divides x. The ring of p–adic integers is usually denoted by Zp, and its fraction field by Qp.
The ring Zp of p–adic integers can also be constructed by taking the inverse limit
Zp := lim←− Z/pⁿZ
over the inverse system · · · → Z/p²Z → Z/pZ consisting of the rings Z/pⁿZ, for all n > 0, with the projection maps Z/p^(n+1)Z → Z/pⁿZ defined to be the unique maps such that the diagram of quotient maps
Z → Z/p^(n+1)Z → Z/pⁿZ
commutes. An algebraic and topological isomorphism between the two constructions is obtained by taking the coordinatewise projection map Z → lim←− Z/pⁿZ, extended to the completion of Z under the p–adic metric.
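The inverse-limit picture is easy to experiment with: an integer x determines the compatible sequence (x mod p, x mod p², . . .), and the projection maps simply reduce one coordinate to the next. A small illustrative sketch (the function name `padic_tower` is ours, not standard):

```python
# A finite approximation of the inverse-limit construction: an integer x
# determines the compatible sequence (x mod p, x mod p^2, x mod p^3, ...),
# and the projection Z/p^(n+1)Z -> Z/p^nZ sends each entry to the next.

def padic_tower(x, p, depth):
    """The first `depth` coordinates of x in lim Z/p^nZ."""
    return [x % p ** n for n in range(1, depth + 1)]

tower = padic_tower(-1, 2, 5)
print(tower)  # [1, 3, 7, 15, 31]: the 2-adic expansion of -1 is ...11111

# compatibility: reducing the (n+1)-st coordinate mod p^(n+1) gives the n-th
assert all(tower[n + 1] % 2 ** (n + 1) == tower[n] for n in range(4))
```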
This alternate characterization shows that Zp is compact, since it is a closed subspace of the space
∏_{n>0} Z/pⁿZ,
which is an infinite product of finite topological spaces and hence compact under the product topology.
152.1.3 Generalizations
In the special case where K is a number field, the p–adic ring Kp is always a finite extension
of Qp whenever p is a finite prime, and is always equal to either R or C whenever p is an
infinite prime.
A local field is a topological field which is Hausdorff and locally compact as a topological space. The standard examples are R, C, finite extensions of Qp, and fields Fq((t)) of formal Laurent series over a finite field. In fact, this list is complete—every local field is isomorphic as a topological field to one of the above fields.
152.2.1 Acknowledgements
This document is dedicated to those who made it all the way through Serre’s book [1] before realizing that nowhere within the book is there a definition of the term “local field.”
REFERENCES
1. Jean–Pierre Serre, Local Fields, Springer–Verlag, 1979 (GTM 67).
Chapter 153
11Y05 – Factorization
Say, for example, that you have a big number n and you want to know the factors of n. Let’s use 16843009. And say, for example, that we know that n is not a prime number. In this case, I know it isn’t because I multiplied two prime numbers together to make n. (For the crypto weenies out there, you know that there are a lot of numbers lying around which were made by multiplying two prime numbers together. And you probably wouldn’t mind finding the factors of some of them.) In cases where you don’t know, a priori, that the number is composite, there are a variety of methods to test for compositeness.
Let’s assume that n has a factor d. Since we know n is composite, we know that there must
be one. We just don’t know what its value happens to be. But, there are some things that
we do know about d. First of all, d is smaller than n. In fact, there are at least some d which
are no bigger than the square root of n.
So, how does this help? If you start picking numbers at random (keeping your numbers greater than or equal to zero and strictly less than n), then the only time you will get a ≡ b (mod n) is when a and b are identical. However, since d is smaller than n, there is a good chance that a ≡ b (mod d) sometimes when a ≠ b.
The amazing thing here is that through all of this, we just knew there had to be some divisor
of n. We were able to use properties of that divisor to our advantage before we even knew
what the divisor was!
This is at the heart of Pollard’s rho method. Pick a random number a. Pick another random
number b. See if the greatest common divisor of (a − b) and n is greater than one. If not,
pick another random number c. Now, check the greatest common divisor of (c − b) and n.
If that is not greater than one, check the greatest common divisor of (c − a) and n. If that
doesn’t work, pick another random number d. Check (d − c), (d − b), and (d − a). Continue
in this way until you find a factor.
As you can see from the above paragraph, this could get quite cumbersome quite quickly. By the k-th iteration, you will have to do (k − 1) greatest common divisor checks. Fortunately, there is a way around that. By structuring the way in which you pick “random” numbers, you can avoid this buildup.
Let’s say we have some polynomial f (x) that we can use to pick “random” numbers. Because
we’re only concerned with numbers from zero up to (but not including) n, we will take all of
the values of f (x) modulo n. We start with some x1 . We then pick our “random” numbers by xk+1 = f (xk ) (mod n).
Now, say for example we get to some point k where xk ≡ xj (mod d) with j < k. Then, because of the way that modular arithmetic works, f (xk ) will be congruent to f (xj ) modulo
d. So, once we hit upon xk and xj , then each element in the sequence starting with xk will be
congruent modulo d to the corresponding element in the sequence starting at xj . Thus, once
the sequence gets to xk it has looped back upon itself to match up with xj (when considering
them modulo d).
This looping is what gives the rho method its name. If you go back through (once you
determine d) and look at the sequence of random numbers that you used (looking at them
modulo d), you will see that they start off just going along by themselves for a bit. Then,
they start to come back upon themselves. They don’t typically loop the whole way back to
the first number of your sequence. So, they have a bit of a tail and a loop—just like the
greek letter rho (ρ).
Before we see why that looping helps, we will first speak to why it has to happen. When
we consider a number modulo d, we are only considering the numbers greater than or equal
to zero and strictly less than d. This is a very finite set of numbers. Your random sequence
cannot possibly go on for more than d numbers without having some number repeat modulo
d. And, if the function f (x) is well-chosen, you can probably loop back a great deal sooner.
The looping helps because it means that we can get away without accumulating the number
of greatest common divisor steps we need to perform with each new random number. In
fact, it makes it so that we only need to do one greatest common divisor check for every
second random number that we pick.
Now, why is that? Let’s assume that the loop is of length t and starts at the j-th random
number. Say that we are on the k-th element of our random sequence. Furthermore, say
that k is greater than or equal to j and t divides k. Because k is greater than j we know it
is inside the looping part of the ρ. We also know that if t divides k, then t also divides 2k.
What this means is that x2k and xk will be congruent modulo d because they correspond
to the same point on the loop. Because they are congruent modulo d, their difference is a
multiple of d. So, if we check the greatest common divisor of (xk −xk/2 ) with n every time we
get to an even k, we will find some factor of n without having to do k − 1 greatest common
divisor calculations every time we come up with a new random number. Instead, we only
have to do one greatest common divisor calculation for every second random number.
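Putting the pieces together gives the usual formulation of the rho method. Below is a minimal Python sketch of exactly the idea just described: Floyd tortoise-and-hare cycle detection, with one gcd per pair (xk, x2k). The retry loop over the constant c is a simple safeguard (in case the cycle closes with gcd equal to n itself), not part of the discussion above.

```python
from math import gcd

# Pollard's rho with Floyd cycle detection: the tortoise advances one step,
# the hare two, and we take one gcd per pair (x_k, x_2k).
def pollard_rho(n):
    for c in range(1, 20):             # retry with a new polynomial x^2 + c
        f = lambda x: (x * x + c) % n
        x = x2 = 1
        while True:
            x = f(x)                   # tortoise: x_k
            x2 = f(f(x2))              # hare: x_2k
            d = gcd(x - x2, n)
            if d == n:
                break                  # degenerate cycle; try the next c
            if d > 1:
                return d
    return None

d = pollard_rho(16843009)
print(d, 16843009 // d)  # a nontrivial split: 257 and 65537, in some order
```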
The only open question is what to use for a polynomial f (x) to get some random numbers
which don’t have too many choices modulo d. Since we don’t usually know much about d,
we really can’t tailor the polynomial too much. A typical choice of polynomial is
f (x) = x2 + a
where a is some constant which isn’t congruent to 0 or −2 modulo n. If you don’t place
those restrictions on a, then you will end up degenerating into the sequence {1, 1, 1, 1, ...} as
soon as you hit upon some x which is congruent to either 1 or −1 modulo n.
Let’s use the algorithm now to factor our number 16843009. We will use the sequence x1 = 1 with xn+1 = 1024xn² + 32767 (mod n). [I also tried it with the very basic polynomial
f (x) = x2 + 1, but that one went 80 rounds before stopping so I didn’t include the table
here.]
k    xk          gcd(n, xk − xk/2)
1    1
2    33791       1
3    10832340
4    12473782    1
5    4239855
6    309274      1
7    11965503
8    15903688    1
9    3345998
10   2476108     1
11   11948879
12   9350010     1
13   4540646
14   858249      1
15   14246641
16   4073290     1
17   4451768
18   14770419    257
Let’s try to factor again with a different random number schema. We will use the sequence
x1 = 1 with xn+1 = 2048xn² + 32767 (mod n).
k    xk          gcd(n, xk − xk/2)
1    1
2    34815       1
3    9016138
4    4752700     1
5    1678844
6    14535213    257
Algorithm To factor a number n using the quadratic sieve, one seeks two numbers x and y which are not congruent modulo n, with x not congruent to −y modulo n, but which have x² ≡ y² (mod n). If two such numbers are found, one can then say that (x + y)(x − y) ≡ 0 (mod n). Then, x + y and x − y must have non-trivial factors in common with n.
The quadratic sieve method of factoring depends upon being able to create a set of numbers
whose factorization can be expressed as a product of pre-chosen primes. These factorizations
are recorded as vectors of the exponents. Once enough vectors are collected to form a set
which contains a linear dependence, this linear dependence is exploited to find two squares
which are equivalent modulo n.
To accomplish this, the quadratic sieve method uses a set of prime numbers called a factor
base. Then, it searches for numbers which can be factored entirely within that factor base.
If there are k prime numbers in the factor base, then each number which can be factored within the factor base is stored as a k-dimensional vector where the i-th component of the vector for y gives the exponent of the i-th prime from the factor base in the factorization of y. For example, if the factor base were {2, 3, 5, 7, 11, 13}, then the number y = 2³ · 3² · 11⁵ would be stored as the vector ⟨3, 2, 0, 0, 5, 0⟩.
Once k + 1 of these vectors have been collected, there must be a linear dependence among them. The k + 1 vectors are taken modulo 2 to form vectors in Z₂ᵏ. The linear dependence among them is used to find a combination of the vectors which sum up to the zero vector in Z₂ᵏ. Summing these vectors is equivalent to multiplying the y’s to which they correspond. And, the zero vector in Z₂ᵏ signals a perfect square.
then it is called B-smooth. If it is not B-smooth, then discard xi and yi and move on to a
new choice of xi . If it is B-smooth, then store xi , yi , and the vector of its exponents for the
primes in B. Also, record a copy of the exponent vector with each component taken modulo
2.
Once k + 1 vectors have been recorded, there must be a linear dependence among them.
Using the copies of the exponent vectors that were taken modulo 2, determine which ones
can be added together to form the zero vector. Multiply together the xi that correspond to those chosen vectors—call this x. Also, add together the original vectors that correspond to the chosen vectors to form a new vector ~v. Every component of this vector will be even. Divide each element of ~v by 2. Form y = ∏_{i=1}^{k} pi^{vi}.
Because each yi ≡ xi² (mod n), we have x² ≡ y² (mod n). If x ≡ ±y (mod n), then find some more B-smooth numbers and try again. Otherwise, gcd(x + y, n) and gcd(x − y, n) are nontrivial factors of n.
Example Consider the number n = 16843009. The integer nearest its square root is 4104. Given the factor base
B = {2, 3, 5, 7, 13},
the first few B-smooth values of yi = f (xi ) = xi² − n are:
xi      yi = f(xi)    2  3  5  7  13
4122    147875        0  0  3  1  2
4159    454272        7  1  0  1  2
4187    687960        3  3  1  2  1
4241    1143072       5  6  0  2  0
4497    3380000       5  0  4  0  2
4993    8087040       9  5  1  0  1
y0 = 1143072 = 2⁵ · 3⁶ · 5⁰ · 7² · 13⁰
y1 = 3380000 = 2⁵ · 3⁰ · 5⁴ · 7⁰ · 13²
Which results in:
x = 4241 · 4497 = 19071777
y = 2⁵ · 3³ · 5² · 7 · 13 = 1965600
From there:
gcd(x − y, n) = 257
gcd(x + y, n) = 65537
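The whole worked example can be checked mechanically. The sketch below recomputes the two exponent vectors, verifies that their sum is the zero vector modulo 2, and recovers the two factors; `smooth_exponents` is our own name for the trial-division step.

```python
from math import gcd

# Re-checking the worked quadratic sieve example for n = 16843009.
n = 16843009
B = [2, 3, 5, 7, 13]

def smooth_exponents(y, base):
    """Exponent vector of y over `base`, or None if y is not B-smooth."""
    exps = []
    for p in base:
        e = 0
        while y % p == 0:
            y //= p
            e += 1
        exps.append(e)
    return exps if y == 1 else None

v0 = smooth_exponents(4241 ** 2 - n, B)   # [5, 6, 0, 2, 0]
v1 = smooth_exponents(4497 ** 2 - n, B)   # [5, 0, 4, 0, 2]
assert all((e0 + e1) % 2 == 0 for e0, e1 in zip(v0, v1))  # zero vector mod 2

x = 4241 * 4497
y = 1
for p, e0, e1 in zip(B, v0, v1):
    y *= p ** ((e0 + e1) // 2)            # square root of the product y0*y1

print(gcd(x - y, n), gcd(x + y, n))  # 257 65537
```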
It may not be completely obvious why we required that n be a quadratic residue of each pi in
the factor base B. One might intuitively think that we actually want the pi to be quadratic
residues of n instead. But, that is not the case.
(x + y)(x − y) = x² − y² = n
where
y = ∏_{i=1}^{k} pi^{vi}
Because we end up squaring y, there is no reason that the pi would need to be quadratic residues of n.
So, why do we require that n be a quadratic residue of each pi? We can rewrite x² − y² = n as:
x² − ∏_{i=1}^{k} pi^{2vi} = n
If we take that expression modulo pi, for any pi for which the corresponding vi is non-zero, we are left with:
x² ≡ n (mod pi)
Thus, in order for pi to show up in a useful solution, n must be a quadratic residue of
pi . We would be wasting time and space to employ other primes in our factoring and
linear combinations.
Chapter 154
• k0 = 1.¹
It is conjectured that the density of 1’s in the sequence is 0.5. It is not known whether the
1’s have a density; however, it is known that were this true, that density would be 0.5. It is
also not known whether the sequence is a strongly recurrent sequence; this too would imply
density 0.5.
To generate rapidly a large number of elements of the sequence, it is most efficient to build a hierarchy of generators for the sequence. If the conjecture is correct, then the depth of this hierarchy is only O(log n) to generate the first n elements.
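For modest n a direct, single-level generator is already fast. The sketch below produces the sequence run by run, using the convention k0 = 1 (the function name `kolakoski` is ours):

```python
# Kolakoski sequence (OEIS A000002): each term s[k] is the length of the
# k-th run, and the run values alternate 1, 2, 1, 2, ...
def kolakoski(n):
    s = [1, 2, 2]
    k = 2                                # index of the run-length being consumed
    while len(s) < n:
        s.extend([3 - s[-1]] * s[k])     # next run: value alternates, length s[k]
        k += 1
    return s[:n]

seq = kolakoski(20)
print(seq)  # [1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1]
print(seq.count(1) / len(seq))  # the density of 1's stays near 0.5
```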
¹ Some sources start the sequence at k0 = 2 instead. This only has the effect of shifting the sequence by one position.
This is sequence A000002 in the Online Encyclopedia of Integer Sequences.
Chapter 155
155.1 τ function
The τ function takes positive integers as its input and gives the number of positive divisors
of its input as its output. For example, since 1, 2, and 4 are all of the positive divisors of 4,
then τ (4) = 3. As another example, since 1, 2, 5, and 10 are all of the positive divisors of
10, then τ (10) = 4.
The τ function is determined by two rules: for any prime p and positive integer k, τ(pᵏ) = k + 1; and whenever gcd(m, n) = 1, τ(mn) = τ(m)τ(n). Because these two rules hold for the τ function, it is a multiplicative function.
Note that these rules work for the previous two examples. Since 2 is prime, then τ(4) = τ(2²) = 2 + 1 = 3. Since 2 and 5 are distinct primes, then τ(10) = τ(2 · 5) = τ(2)τ(5) = (1 + 1)(1 + 1) = 4.
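The two rules translate directly into a divisor-counting routine: factor the input and multiply (e + 1) over the prime exponents e. A short illustrative sketch:

```python
# tau(n) via the two multiplicative rules: tau(p^e) = e + 1, and tau is
# multiplicative across coprime factors.
def tau(n):
    assert n >= 1
    count = 1
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        count *= e + 1       # rule: tau(p^e) = e + 1
        p += 1
    if n > 1:                # one leftover prime factor with exponent 1
        count *= 2
    return count

print(tau(4), tau(10))  # 3 4
```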
• p′ = 1 for any prime p.
Note: One of the major contributors to the theory of the “arithmetic derivative” is E. J. Barbeau, who published “Remark on an arithmetic derivative” in 1961.
Consider the natural number 6. Using the rules of the arithmetic derivative we get:
6′ = (2 · 3)′ = 2′ · 3 + 2 · 3′ = 1 · 3 + 2 · 1 = 5
Below is a list of the first 10 natural numbers and their first and second arithmetic derivatives:
n    1  2  3  4  5  6  7  8   9  10
n′   0  1  1  4  1  5  1  12  6  7
n″   0  0  0  4  0  1  0  16  5  1
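These values are easy to reproduce. Since n′ = n · Σ eᵢ/pᵢ when n = ∏ pᵢ^eᵢ (a restatement of the Leibniz rule), each occurrence of a prime p in n contributes n/p to the derivative. A sketch (`arith_deriv` is our own name):

```python
# Arithmetic derivative: 1' = 0, p' = 1 for p prime, (ab)' = a'b + ab'.
# Equivalently n' = n * sum(e_i / p_i) over the factorization of n, so each
# prime-power occurrence p in n contributes n // p.
def arith_deriv(n):
    if n <= 1:
        return 0
    result, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            m //= p
            result += n // p   # one contribution per factor of p
        p += 1
    if m > 1:                  # leftover prime factor
        result += n // m
    return result

print([arith_deriv(n) for n in range(1, 11)])
# [0, 1, 1, 4, 1, 5, 1, 12, 6, 7]
print([arith_deriv(arith_deriv(n)) for n in range(1, 11)])
# [0, 0, 0, 4, 0, 1, 0, 16, 5, 1]
```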
We now show that τ counts the positive divisors of its input, which must be a positive integer.
Let p be a prime. Then p⁰ = 1. Since 1 is the only positive divisor of 1 and τ(1) = τ(p⁰) = 0 + 1 = 1, then τ(1) is equal to the number of positive divisors of 1.
Suppose that, for all positive integers k smaller than z ∈ Z with z > 1, the number of positive divisors of k is τ(k). Since z > 1, then z has a prime divisor. Let p be a prime that divides z. Let x ∈ Z+ such that p^x divides z and p^(x+1) does not divide z. Let a ∈ Z+ such that z = p^x a. Then gcd(a, p) = 1. Thus, gcd(a, p^x) = 1. Since a < z, then, by the induction hypothesis, there are τ(a) positive divisors of a.
Let d be a positive divisor of z. Let y be a nonnegative integer such that p^y divides d and p^(y+1) does not divide d. Thus, 0 ≤ y ≤ x, and there are x + 1 choices for y. Let c ∈ Z+ such that d = p^y c. Then gcd(c, p) = 1. Since c divides d and d divides z, then c divides z. Since c divides p^x a and gcd(c, p) = 1, then c divides a. Thus, there are τ(a) choices for c. Since there are x + 1 choices for y and there are τ(a) choices for c, then there are (x + 1)τ(a) choices for d. Hence, there are (x + 1)τ(a) positive divisors of z. Since τ(z) = τ(p^x a) = τ(p^x)τ(a) = (x + 1)τ(a), it follows that, for every n ∈ Z+, the number of positive divisors of n is τ(n).
Chapter 156
156.1 monomial
Some examples of monomials: 1, x, x²y, xyz, 3x⁴y²z³, −z.
If there are n variables from which a monomial may be formed, then a monomial may be
represented without its coefficient as a vector of n naturals. Each position in this vector
would correspond to a particular variable, and the value of the element at each position
would correspond to the power of that variable in the monomial. For instance, the monomial x²yz³ formed from the set of variables {w, x, y, z} would be represented as (0, 2, 1, 3)ᵀ. A constant would be a zero vector.
Given this representation, we may define a few more concepts. First, the degree of a
monomial is the sum of the elements of its vector representation. Thus, the degree of x2 yz 3
is 0 + 2 + 1 + 3 = 6, and the degree of a constant is 0. If a polynomial is represented as a
sum over a set of monomials, then the degree of a polynomial can be defined as the degree
of the monomial of largest degree belonging to that polynomial.
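The vector representation makes these definitions one-liners. A small illustrative sketch, with the variable ordering (w, x, y, z) assumed above:

```python
# Monomials over (w, x, y, z) as exponent tuples; the degree is the sum of
# the entries, and a polynomial's degree is the max over its monomials.
def degree(mono):
    return sum(mono)

x2yz3 = (0, 2, 1, 3)            # x^2 * y * z^3
constant = (0, 0, 0, 0)         # any constant
print(degree(x2yz3), degree(constant))  # 6 0

poly = {x2yz3, (4, 0, 0, 0), constant}  # e.g. x^2*y*z^3 + w^4 + 1
print(max(degree(m) for m in poly))     # 6
```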
Version: 2 Owner: bbukh Author(s): bbukh, Logan
Note the degree of the zero-polynomial is −∞, since sup ∅ (per definition) is −∞; thus deg f ∈ N ∪ {0} ∪ {−∞}.
Please note that the term order is not as common as degree. In fact, it is perhaps more
frequently associated with power series (a form of generalized polynomials) than with ordinary polynomials. Also be aware that the term order occasionally is used as a synonym for
degree.
¹ In order to simplify the notation, the definition is given in terms of a polynomial in two variables; however, the definition naturally scales to any number of variables.
Chapter 157
An equivalent definition is that all terms of the polynomial have the same degree (i.e. k).
As an important example of homogeneous polynomials one can mention the elementary symmetric polynomials.
157.2 subfield
Let F be a field and S a subset such that S with the inherited operations from F is also a
field. Then we say that S is a subfield of F .
Chapter 158
If f (x) is a polynomial, then x − a is a factor if and only if a is a root (that is, f (a) = 0).
This theorem is of great help for finding factorizations of higher order polynomials. As an example, suppose we need to factor the polynomial p(x) = x³ + 3x² − 33x − 35. With some help from the rational root theorem we can find that x = −1 is a root (that is, p(−1) = 0), so we know (x + 1) must be a factor of the polynomial. We can then write
p(x) = (x + 1)q(x)
where the polynomial q(x) can be found using long or synthetic division of p(x) by x + 1. Some calculations show us that for the example q(x) = x² + 2x − 35, which can be easily factored as (x − 5)(x + 7). We conclude that p(x) = (x + 1)(x − 5)(x + 7).
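The synthetic division step can be sketched in a few lines: bring down the leading coefficient, then repeatedly multiply by the root and add across. Illustrative code for the example above:

```python
# Synthetic division of p(x) by (x - a): the running values are the quotient
# coefficients, and the last value is the remainder (which is p(a)).
def synthetic_division(coeffs, a):
    """coeffs = [leading, ..., constant]; returns (quotient coeffs, remainder)."""
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + a * out[-1])
    return out[:-1], out[-1]

# divide x^3 + 3x^2 - 33x - 35 by (x + 1), i.e. by the root a = -1
q, r = synthetic_division([1, 3, -33, -35], -1)
print(q, r)  # [1, 2, -35] 0  -> q(x) = x^2 + 2x - 35 = (x - 5)(x + 7)
```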
Suppose that f (x) is a polynomial of degree n − 1. It is infinitely differentiable, and therefore has a Taylor series expansion about a. Since f⁽ⁿ⁾(x) = 0, the expansion terminates after the (n − 1)-st term. Also, the n-th remainder of the Taylor series vanishes:
Rn(x) = (f⁽ⁿ⁾(y)/n!) (x − a)ⁿ = 0
Thus the function is equal to its Taylor series:
f(x) = f(a) + (f′(a)/1!)(x − a) + (f″(a)/2!)(x − a)² + Σ_{k=3}^{n−1} (f⁽ᵏ⁾(a)/k!)(x − a)ᵏ
f(x) = f(a) + Σ_{k=1}^{n−1} (f⁽ᵏ⁾(a)/k!)(x − a)ᵏ
f(x) = f(a) + (x − a) Σ_{k=1}^{n−1} (f⁽ᵏ⁾(a)/k!)(x − a)^{k−1}
Now if x − a is a factor of f (x) then we can write f (x) = (x − a)g(x) for some polynomial g(x). If f (a) = 0 we have
f(x) = (x − a) Σ_{k=1}^{n−1} (f⁽ᵏ⁾(a)/k!)(x − a)^{k−1},
So q | an pⁿ and q | an.
158.4 rational root theorem
If p(x) = an xⁿ + · · · + a1 x + a0 has a rational root p/q where gcd(p, q) = 1, then p | a0 and q | an.
This theorem is a special case of a result about polynomials whose coefficients belong to a unique factorization domain. The theorem then states that any root in the fraction field is also in the base domain.
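The theorem immediately gives a finite search for rational roots: try every ±p/q with p | a0 and q | an. A brute-force sketch (assuming integer coefficients with a0 and an nonzero; `rational_roots` is a hypothetical helper, not an efficient routine):

```python
from fractions import Fraction
from itertools import product

# Enumerate the candidates allowed by the rational root theorem and keep
# the ones that actually evaluate to zero.
def rational_roots(coeffs):
    """coeffs = [an, ..., a1, a0], integers with an, a0 nonzero."""
    an, a0 = coeffs[0], coeffs[-1]
    def divisors(m):
        m = abs(m)
        return [d for d in range(1, m + 1) if m % d == 0]
    n = len(coeffs) - 1
    roots = set()
    for p, q in product(divisors(a0), divisors(an)):
        for cand in (Fraction(p, q), Fraction(-p, q)):
            if sum(c * cand ** (n - i) for i, c in enumerate(coeffs)) == 0:
                roots.add(cand)
    return roots

print(rational_roots([1, 3, -33, -35]))  # the roots -1, 5, -7, as Fractions
```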
Joubert showed in 1861 that this polynomial can be reduced, without any form of accessory irrationalities, to the 3-parameter resolvent:
x⁶ + ax⁴ + bx² + cx + c = 0.
This polynomial was studied in great detail by Felix Klein and Robert Fricke in the 19th century, and it is directly related to the algebraic aspect of Hilbert’s 13th Problem. Its solution
has been reduced (by Klein) to the solution of the so-called Valentiner Form problem, a
ternary form problem which seeks the ratios of the variables involved in the invariant system
of the Valentiner group of order 360. It can also be solved with a class of generalized
hypergeometric series, by Birkeland’s approach to algebraic equations. Scott Crass has
given an explicit solution to the Valentiner problem by purely iterational methods, see
Chapter 159
To solve the cubic polynomial equation x³ + ax² + bx + c = 0 for x, the first step is to apply the Tschirnhaus transformation x = y − a/3. This reduces the equation to y³ + py + q = 0, where
p = b − a²/3
q = c − ab/3 + 2a³/27
The next step is to substitute y = u − v, to obtain
From equation (159.1.2), we see that if u and v are chosen so that q = v 3 − u3 and p = 3uv,
then y = u − v will satisfy equation (159.1.1), and the cubic equation will be solved!
There remains the matter of solving q = v 3 − u3 and p = 3uv for u and v. From the second
equation, we get v = p/(3u), and substituting this v into the first equation yields
q = p³/(3u)³ − u³
which is a quadratic equation in u³. Solving for u³ using the quadratic formula, we get
u³ = (−27q + √(108p³ + 729q²)) / 54
v³ = (27q + √(108p³ + 729q²)) / 54
Using these values for u and v, you can back–substitute y = u − v, p = b − a2 /3, q =
c − ab/3 + 2a3 /27, and x = y − a/3 to get the expression for the first root r1 in the cubic
formula. The second and third roots r2 and r3 are obtained by performing synthetic division
using r1 , and using the quadratic formula on the remaining quadratic factor.
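The back-substitution can be carried out numerically. The sketch below follows the derivation above step by step (compute p and q, solve for u³, recover y = u − v); it is only an illustration, using principal complex roots throughout, and the function name is ours.

```python
import cmath

# One root of x^3 + ax^2 + bx + c via the derivation above; the remaining
# roots follow by synthetic division and the quadratic formula.
def cardano_one_root(a, b, c):
    p = b - a ** 2 / 3
    q = c - a * b / 3 + 2 * a ** 3 / 27
    u3 = (-27 * q + cmath.sqrt(108 * p ** 3 + 729 * q ** 2)) / 54
    if abs(u3) < 1e-12:
        # here p = 0 and q >= 0, so y^3 = -q and we may take the real root
        y = -(q ** (1 / 3))
    else:
        u = u3 ** (1 / 3)          # principal complex cube root
        v = p / (3 * u)            # from p = 3uv
        y = u - v
    return y - a / 3               # undo the Tschirnhaus substitution

r = cardano_one_root(3, -33, -35)  # x^3 + 3x^2 - 33x - 35 = (x+1)(x-5)(x+7)
print(abs(r - 5) < 1e-6)  # True: this branch lands on the root x = 5
```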
The goal is now to choose a value for z which makes the right hand side of Equation (159.2.2)
a perfect square. The right hand side is a quadratic polynomial in y whose discriminant is
Our goal will be achieved if we can find a value for z which makes this discriminant zero.
But the above polynomial is a cubic polynomial in z, so its roots can be found using the
cubic formula. Choosing then such a value for z, we may rewrite Equation (159.2.2) as
for some (complicated!) values s and t, and then taking the square root of both sides and
solving the resulting quadratic equation in y provides a root of Equation (159.2.1).
We are trying to find the roots r1 , r2 , r3 of the polynomial x3 + ax2 + bx + c = 0. From the
equation
(x − r1 )(x − r2 )(x − r3 ) = x3 + ax2 + bx + c
we see that
a = −(r1 + r2 + r3 )
b = r1 r2 + r1 r3 + r2 r3
c = −r1 r2 r3
The goal is to explicitly construct a radical tower over the field k = C(a, b, c) that contains
the three roots r1 , r2 , r3 .
L = C(r1, r2, r3)
 |  A3
K = ?
 |  S3/A3
k = C(a, b, c)
which we know from Galois theory is radical. We use Galois theory to find K and exhibit radical generators for these extensions.
The proof of Hilbert’s Theorem 90 provides a procedure for finding y, which is as follows:
choose any x ∈ L, form the quantity
ωx + ω²σ(x) + ω³σ²(x);
then this quantity automatically yields a suitable value for y provided that it is nonzero. In
particular, choosing x = r2 yields
y = r1 + ωr2 + ω 2 r3 .
and we have L = K(y) with y³ ∈ K. Moreover, since τ := (23) does not fix y³, it follows that y³ ∉ k, and this, combined with [K : k] = 2, shows that K = k(y³).
Set z := τ (y) = r1 + ω 2 r2 + ωr3 . Applying the same technique to the extension K/k, we find
that K = k(y 3 − z 3 ) with (y 3 − z 3 )2 ∈ k, and this exhibits K as a radical extension of k.
To get explicit formulas, start with y 3 +z 3 and y 3 z 3 , which are fixed by S3 and thus guaranteed
to be in k. Using the reduction algorithm for symmetric polynomials, we find
a = −(r1 + r2 + r3)
y = r1 + ωr2 + ω²r3
z = r1 + ω²r2 + ωr3
and we get
r1 = (1/3)(−a + y + z)
r2 = (1/3)(−a + ω²y + ωz)
r3 = (1/3)(−a + ωy + ω²z)
which expresses r1 , r2 , r3 as radical expressions of a, b, c by way of the previously obtained
expressions for y and z, and completes the derivation of the cubic formula.
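The resolvent identities are easy to sanity-check numerically. Taking the known roots −1, 5, −7 of x³ + 3x² − 33x − 35 and ω = e^(2πi/3), the formulas for r1, r2, r3 above hold to rounding error:

```python
import cmath

# Numerical check of the Lagrange-resolvent formulas with known roots.
w = cmath.exp(2j * cmath.pi / 3)   # primitive cube root of unity
r1, r2, r3 = -1, 5, -7
a = -(r1 + r2 + r3)

y = r1 + w * r2 + w ** 2 * r3
z = r1 + w ** 2 * r2 + w * r3

assert abs((-a + y + z) / 3 - r1) < 1e-9
assert abs((-a + w ** 2 * y + w * z) / 3 - r2) < 1e-9
assert abs((-a + w * y + w ** 2 * z) / 3 - r3) < 1e-9
print("resolvent formulas verified")
```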
159.4 Galois-theoretic derivation of the quartic formula
Write N for C(r1 , r2 , r3 , r4 ) and F for C(a, b, c, d). The Galois group Gal(N/F ) is the
symmetric group S4 , the permutation group on the four elements {r1 , r2 , r3 , r4 }, which has
a composition series
1 ⊲ Z/2 ⊲ V4 ⊲ A4 ⊲ S4 ,
where:
1 N
Z/2 M
V4 L
A4 K
S4 F
By Galois theory, or Kummer theory, each field in this diagram is a radical extension of the
one below it, and our job is done if we explicitly find what the radical extension is in each
case.
We start with K/F . The index of A4 in S4 is two, so K/F is a degree two extension. We
have to find an element of K that is not in F . The easiest such element to take is the element
∆ obtained by taking the product of the differences of the roots, namely,
∆ := ∏_{1≤i<j≤4} (ri − rj) = (r1 − r2)(r1 − r3)(r1 − r4)(r2 − r3)(r2 − r4)(r3 − r4).
Observe that ∆ is fixed by any even permutation of the roots ri , but that σ(∆) = −∆ for
any odd permutation σ. Accordingly, ∆2 is actually fixed by all of S4 , so:
• ∆ ∈ K, but ∆ ∉ F.
• ∆² ∈ F.
• K = F[∆] = F[√∆²], thus exhibiting K/F as a radical extension.
The element ∆2 ∈ F is called the discriminant of the polynomial. An explicit formula for
∆2 can be found using the reduction algorithm for symmetric polynomials, and, although it
is not needed for our purposes, we list it here for reference:
Next up is the extension L/K, which has degree 3 since [A4 : V4 ] = 3. We have to find an
element of N which is fixed by V4 but not by A4 . Luckily, the form of V4 almost cries out
that the following elements be used:
t1 := (r1 + r2 )(r3 + r4 )
t2 := (r1 + r3 )(r2 + r4 )
t3 := (r1 + r4 )(r2 + r3 )
These three elements of N are fixed by everything in V4 , but not by everything in A4 . They
are therefore elements of L that are not in K. Moreover, every permutation in S4 permutes
the set {t1 , t2 , t3 }, so the cubic polynomial
actually has coefficients in F ! In fancier language, the cubic polynomial Φ(x) defines a
cubic extension E of F which is linearly disjoint from K, with the composite extension EK
equal to L. The polynomial Φ(x) is called the resolvent cubic of the quartic polynomial
x4 + ax3 + bx2 + cx + d. The coefficients of Φ(x) can be found fairly easily using (again) the
reduction algorithm for symmetric polynomials, which yields
Using the cubic formula, one can find radical expressions for the three roots of this polynomial, which are t1 , t2 , and t3 , and henceforth we assume radical expressions for these three
quantities are known. We also have L = K[t1 ], which in light of what we just said, exhibits
L/K as an explicit radical extension.
The remaining extensions are easier and the reader who has followed to this point should
have no trouble with the rest. For the degree two extension M/L, we require an element of
M that is not in L; one convenient such element is r1 + r2 , which is a root of the quadratic
polynomial
(x − (r1 + r2 ))(x − (r3 + r4 )) = x2 + ax + t1 ∈ L[x] (159.4.2)
and therefore equals (−a + √(a² − 4t1))/2. Hence M = L[r1 + r2] = L[(−a + √(a² − 4t1))/2] is a radical extension of L.
Finally, for the extension N/M, an element of N that is not in M is of course r1 , which is a
root of the quadratic polynomial
Now, r1 + r2 is known from the previous paragraph, so it remains to find an expression for
r1 r2 . Note that r1 r2 is fixed by (12)(34), so it is in M but not in L. To find it, use the
equation (t2 + t3 − t1 )/2 = r1 r2 + r3 r4 , which gives
(x − r1 r2)(x − r3 r4) = x² − ((t2 + t3 − t1)/2) x + d
and, upon solving for r1 r2 with the quadratic formula, yields
r1 r2 = ((t2 + t3 − t1) + √((t2 + t3 − t1)² − 16d)) / 4        (159.4.4)
r3 r4 = ((t2 + t3 − t1) − √((t2 + t3 − t1)² − 16d)) / 4        (159.4.5)
We can then use this expression, combined with Equation (159.4.3), to solve for r1 using
the quadratic formula. Perhaps, at this point, our poor reader needs a summary of the
procedure, so we give one here:
1. Find t1 , t2 , and t3 by solving the resolvent cubic (Equation (159.4.1)) using the cubic
formula,
3. Using Equation (159.4.3), write
r1 = ((r1 + r2) + √((r1 + r2)² − 4 r1 r2)) / 2
r2 = ((r1 + r2) − √((r1 + r2)² − 4 r1 r2)) / 2
r3 = ((r3 + r4) + √((r3 + r4)² − 4 r3 r4)) / 2
r4 = ((r3 + r4) − √((r3 + r4)² − 4 r3 r4)) / 2
where the expressions r1 + r2 and r3 + r4 are derived in the previous step, and the
expressions r1 r2 and r3 r4 come from Equation (159.4.4) and (159.4.5).
4. Now the roots r1 , r2 , r3 , r4 of the quartic polynomial x4 + ax3 + bx2 + cx + d have been
found, and we are done!
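For a quartic whose resolvent roots are easy to state, the summarized procedure can be traced numerically. Below, the quartic x⁴ − 10x³ + 35x² − 50x + 24 (roots 1, 2, 3, 4) is used, so t1 = (1+2)(3+4) = 21, t2 = 24, t3 = 25 can be written down directly instead of invoking the cubic formula; note that in general the pairing of the sum-pairs with the product-pairs must be checked against the original quartic (here the obvious pairing is the right one).

```python
from math import sqrt

a, b, c, d = -10, 35, -50, 24
t1, t2, t3 = 21, 24, 25

# step 2: r1+r2 and r3+r4 are the roots of x^2 + a x + t1 (Equation 159.4.2)
s = sqrt(a * a - 4 * t1)
s12, s34 = (-a - s) / 2, (-a + s) / 2

# r1*r2 and r3*r4 are the roots of x^2 - ((t2 + t3 - t1)/2) x + d
m = (t2 + t3 - t1) / 2
u = sqrt(m * m - 4 * d)
p12, p34 = (m - u) / 2, (m + u) / 2

# step 3: split each (sum, product) pair with the quadratic formula
def split(pair_sum, pair_prod):
    disc = sqrt(pair_sum ** 2 - 4 * pair_prod)
    return (pair_sum + disc) / 2, (pair_sum - disc) / 2

roots = sorted(split(s12, p12) + split(s34, p34))
print(roots)  # [1.0, 2.0, 3.0, 4.0]
```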
159.5 cubic formula
Ax² + Bx + C = 0.
x² + bx + c = 0,
where b = B/A and c = C/A. This equation can be written as
x² + bx + b²/4 − b²/4 + c = 0,
so completing the square, i.e., applying the identity (p + q)² = p² + 2pq + q², yields
(x + b/2)² = b²/4 − c.
Then, taking the square root of both sides, and solving for x, we obtain the solution formula
x = −b/2 ± √(b²/4 − c)
  = −B/(2A) ± √(B²/(4A²) − C/A)
  = (−B ± √(B² − 4AC)) / (2A).
The number ∆ = b² − 4ac is called the discriminant of the equation. If ∆ > 0, there are two different real roots, if ∆ = 0 there is a single real root (counted twice), and if ∆ < 0 there are no real roots (but two different complex roots).
First, consider 2x² − 14x + 24 = 0. Here a = 2, b = −14, c = 24. Substituting in the formula gives us
x = (14 ± √((−14)² − 4 · 2 · 24)) / (2 · 2) = (14 ± √4)/4 = (14 ± 2)/4
So we have two solutions (depending on whether you take the sign + or −): x = 16/4 = 4 and x = 12/4 = 3.
159.8 quartic formula
Write ∆0 = b² − 3ac + 12d and ∆1 = 2b³ − 9abc + 27c² + 27a²d − 72bd, and set
Q = ((∆1 + √(∆1² − 4∆0³)) / 2)^(1/3)
S = (1/2) √(a²/4 − 2b/3 + (Q + ∆0/Q)/3)
Then the four roots of x⁴ + ax³ + bx² + cx + d are
r1 = −a/4 − S + (1/2)√(−4S² − 2p + q/S)
r2 = −a/4 − S − (1/2)√(−4S² − 2p + q/S)
r3 = −a/4 + S + (1/2)√(−4S² − 2p − q/S)
r4 = −a/4 + S − (1/2)√(−4S² − 2p − q/S)
where p = b − 3a²/8 and q = c − ab/2 + a³/8 are the coefficients of the corresponding depressed quartic.
Definition [1] Let p : C → C be a polynomial of degree n with complex (or real) coefficients.
Then p is a reciprocal polynomial if
p(z) = ±zⁿ p(1/z)
for all z ∈ C.
It is clear that if z ≠ 0 is a zero of a reciprocal polynomial, then 1/z is also a zero. This property motivates the name.
1. orthogonal matrices,
2. involution matrices,
REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. N.J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, 2002.
159.10 root
Suppose you are given a function f(x), where x is an independent variable. Then a root of f is a number a that is a solution of the equation f(x) = 0 (that is, substituting a for x in f gives 0 as the result).
Graphically, a root of f is a value where the graph of the function intersects the x-axis.
Of course, the definition can be generalized to other kinds of functions. Neither the domain nor the codomain needs to be R. As long as the codomain has some kind of 0 element, a root will be an element of the domain belonging to the preimage of the 0. The function f : R → R given by f(x) = x² + 1 has no roots, but the function f : C → C given by f(x) = x² + 1 has i as a root.
In the special case of polynomials, there are general formulas for finding roots of polynomials
with degree up to 4: the quadratic formula, the cubic formula and the quartic formula.
If we have a root a of a polynomial f(x), we can divide f(x) by x − a (either by polynomial long division or by synthetic division) and be left with a polynomial of smaller degree whose roots are the other roots of f. We can use this result together with the rational root theorem to find a rational root, if one exists, and thus obtain a polynomial of smaller degree whose remaining roots we may be able to find more easily.
For general functions y = f(x) (not necessarily polynomials), there are several numerical methods (such as Newton's method) for approximating roots. These methods are also handy for polynomials whose roots are not rational numbers.
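As an illustration of such a numerical method, here is a minimal Newton iteration in Python (a sketch; the function and its derivative are supplied by the caller):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=100):
    """Approximate a root of f by Newton's method starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)   # Newton step: f(x)/f'(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("did not converge")

# Approximate the irrational root sqrt(2) of f(x) = x^2 - 2
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
assert abs(root - 2 ** 0.5) < 1e-10
```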
159.11 variant of Cardano’s derivation
By a linear change of variable, a cubic polynomial over C can be given the form x3 + 3bx + c.
To find the zeros of this cubic in the form of surds in b and c, make the substitution x = y^{1/3} + z^{1/3}, thus replacing one unknown with two, and then write down identities which are suggested by the resulting equation in two unknowns. Specifically, we get
y + z + c = 0,    (159.11.2)

3y^{1/3} z^{1/3} + 3b = 0,    (159.11.3)

yz = −b³.    (159.11.4)
The pair of equations (159.11.2) and (159.11.4) is a quadratic system in y and z, readily solved. But notice that (159.11.3) puts a restriction on a certain choice of cube roots.
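The derivation can be turned into a small numerical sketch (ours, not from the original entry) for the case where the quadratic system has real solutions; the real-cube-root helper enforces the branch restriction y^{1/3} z^{1/3} = −b:

```python
import math

def cardano_real_root(b, c):
    """One real root of x^3 + 3bx + c via x = y^(1/3) + z^(1/3),
    where y, z solve t^2 + c*t - b^3 = 0 (real-discriminant case)."""
    disc = c * c + 4 * b**3           # discriminant of the quadratic in t
    if disc < 0:
        raise ValueError("needs complex cube roots (casus irreducibilis)")
    y = (-c + math.sqrt(disc)) / 2
    z = (-c - math.sqrt(disc)) / 2
    cbrt = lambda t: math.copysign(abs(t) ** (1 / 3), t)  # real cube root
    return cbrt(y) + cbrt(z)          # branch choice: cbrt(y)*cbrt(z) == -b

# x^3 + 3x - 4 = 0 has the root x = 1
root = cardano_real_root(1, -4)
assert abs(root - 1) < 1e-9
```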
Chapter 160
12D99 – Miscellaneous
Let x be any real number. Then there exists a natural number n such that n > x.
This theorem is known as the Archimedean property of real numbers. It is also some-
times called the axiom of Archimedes, although this name is doubly deceptive: it is neither
an axiom (it is rather a consequence of the least upper bound property) nor attributed to
Archimedes (in fact, Archimedes credits it to Eudoxus).
Proof. Let S = {m ∈ N : m ≤ x}. If S is empty, then n = 1 already satisfies n > x, so assume S is nonempty. Since S has an upper bound (namely x), S must have a least upper bound; call it b. Now consider b − 1. Since b is the least upper bound, b − 1 cannot be an upper bound of S; therefore, there exists some y ∈ S such that y > b − 1. Let n = y + 1; then n > b. But y is a natural number, so n must also be a natural number. Since n > b, we know n ∉ S; since n ∉ S, we know n > x. Thus we have a natural number greater than x.
Corollary 4. If x and y are real numbers with x > 0, there exists a natural number n such that nx > y.

Since x and y are real and x ≠ 0, y/x is real. By the Archimedean property, we can choose an n ∈ N such that n > y/x. Then nx > y.
Corollary 5. If w is a real number greater than 0, there exists a natural number n such that 0 < 1/n < w.

Using Corollary 4, choose n ∈ N satisfying nw > 1. Then 0 < 1/n < w. QED
Corollary 6. If x and y are real numbers with x < y, there exists a rational number a such
that x < a < y.
First examine the case where 0 ≤ x. Using Corollary 5, find a natural number n satisfying 0 < 1/n < (y − x). Let S = {m ∈ N : m/n > y}. By Corollary 4, S is non-empty, so let m₀ be the least element of S and let a = (m₀ − 1)/n. Then a < y. Furthermore, since y ≤ m₀/n, we have y − 1/n ≤ a; and x < y − 1/n ≤ a. Thus a satisfies x < a < y.
Finally, consider the case where x < y ≤ 0. Using the first case, let b be a rational number satisfying −y < b < −x. Then let a = −b.
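The construction in the proof of Corollary 6 is effective. A Python sketch for the case 0 ≤ x < y (the function name is ours, not from the text):

```python
from fractions import Fraction
import math

def rational_between(x, y):
    """A rational strictly between x and y (assumes 0 <= x < y, and that
    the floor computations below do not land exactly on y)."""
    n = math.floor(1 / (y - x)) + 1   # Corollary 5: 0 < 1/n < y - x
    m0 = math.floor(y * n) + 1        # least m in N with m/n > y
    return Fraction(m0 - 1, n)        # a = (m0 - 1)/n satisfies x < a < y

x, y = 2 ** 0.5, 3 ** 0.5
a = rational_between(x, y)            # here a == Fraction(3, 2)
assert x < a < y
```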
160.2 complex
There are some polynomial equations that don't have (real) solutions. Examples of these are x² + 5 = 0 and x² + x + 1 = 0. Mathematically we express this by saying that R is not an algebraically closed field.
In order to solve that kind of equation, we have to "extend" our number system by adding a number i that has the property that i² = −1. In this way we extend the field of real numbers R to a field C whose elements are called complex numbers. The formal construction can be seen in the entry on complex numbers. The field C is algebraically closed: every polynomial with
complex coefficients, and therefore every polynomial with real coefficients, has at least one
complex root (which might be real as well).
Any complex number can be written as z = x + iy (with x, y ∈ R). Here we call x the real
part of z and y the imaginary part of z. We write this as
x = Re(z),    y = Im(z).
Real numbers are a subset of complex numbers, and a real number r can be written also as
r + i0. Thus, a complex number is real if and only if its imaginary part is equal to zero.
By writing x + iy as (x, y) we can also look at complex numbers as ordered pairs. With this
notation, real numbers are the pairs of the form (r, 0).
Seeing complex numbers as ordered pairs also lets us give C the structure of a vector space (over R). The norm of z = x + iy is defined as

|z| = √(x² + y²).

Then we have |z|² = z\overline{z}, where \overline{z}, the conjugate of z = x + iy, is defined as \overline{z} = x − iy. Thus we can also characterize real numbers as those complex numbers z such that z = \overline{z}. Conjugation satisfies

\overline{z₁ + z₂} = \overline{z₁} + \overline{z₂}

\overline{z₁ z₂} = \overline{z₁} \overline{z₂}

\overline{\overline{z}} = z
The ordered-pair notation lets us visualize complex numbers as points in the plane, but then
we can also describe complex numbers with polar coordinates.
If z = a + ib has polar coordinates (r, t), then a = r cos t and b = r sin t. So we have the following expression, called the polar form of z:

z = a + ib = r(cos t + i sin t).

Multiplication of complex numbers can be done in a very neat way using polar coordinates: the moduli multiply and the angles add, that is, r(cos t + i sin t) · s(cos u + i sin u) = rs(cos(t + u) + i sin(t + u)).
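A quick numerical illustration of this multiplication rule (a sketch using Python's cmath.polar and cmath.rect):

```python
import cmath

z1 = complex(1, 1)                 # r = sqrt(2), t = pi/4
z2 = complex(0, 2)                 # r = 2,       t = pi/2
r1, t1 = cmath.polar(z1)
r2, t2 = cmath.polar(z2)

# moduli multiply and angles add:
product = cmath.rect(r1 * r2, t1 + t2)
assert abs(product - z1 * z2) < 1e-12
```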
160.3.1 Definition
z = a + bi
z̄ = a − bi
Complex conjugation represents a reflection about the real axis on the Argand diagram
representing a complex number.
Sometimes a star (∗) is used instead of an overline, e.g. in physics you might see

∫_{−∞}^{∞} Ψ∗ Ψ dx = 1.
Let A = (a_{ij}) be an n × m matrix with complex entries. Then the complex conjugate of A is the matrix \overline{A} = (\overline{a_{ij}}). In particular, if v = (v₁, …, vₙ) is a complex row/column vector, then \overline{v} = (\overline{v₁}, …, \overline{vₙ}).
Hence, the matrix complex conjugate is what we would expect: the same matrix with all of
its scalar components conjugated.
Scalar Properties
1. \overline{uv} = \overline{u} \overline{v}
2. \overline{u + v} = \overline{u} + \overline{v}
3. \overline{u^{−1}} = \overline{u}^{−1}
4. \overline{\overline{u}} = u
5. If v ≠ 0, then \overline{u/v} = \overline{u}/\overline{v}
6. Let u = a + bi. Then u\overline{u} = \overline{u}u = a² + b² ≥ 0 (the square of the complex modulus).
Let A be a matrix with complex entries, and let v be a complex row/column vector.
Then
1. \overline{A^T} = (\overline{A})^T
2. \overline{Av} = \overline{A}\,\overline{v}, and \overline{vA} = \overline{v}\,\overline{A}. (Here we assume that A and v are of compatible size.)

For a square matrix A:

1. trace \overline{A} = \overline{(trace A)}
2. det \overline{A} = \overline{(det A)}
3. \overline{A^{−1}} = (\overline{A})^{−1}
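Several of the scalar properties can be verified directly with Python's built-in complex type (sample values are arbitrary):

```python
u, v = 3 + 4j, 1 - 2j
conj = lambda z: z.conjugate()

assert conj(u * v) == conj(u) * conj(v)            # conjugate of a product
assert conj(u + v) == conj(u) + conj(v)            # conjugate of a sum
assert conj(conj(u)) == u                          # conjugation is an involution
assert conj(u) * u == abs(u) ** 2                  # u * conj(u) = |u|^2
assert abs(conj(u / v) - conj(u) / conj(v)) < 1e-12  # conjugate of a quotient
```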
The ring of complex numbers C is defined to be the quotient ring of the polynomial ring
R[X] in one variable over the reals by the principal ideal (X 2 + 1). For a, b ∈ R, the
equivalence class of a + bX in C is usually denoted a + bi, and one has i2 = −1.
The complex numbers form an algebraically closed field. There is a standard metric on the
complex numbers, defined by

d(a₁ + b₁i, a₂ + b₂i) := √((a₂ − a₁)² + (b₂ − b₁)²).
160.5 examples of totally real fields
Here we present examples of totally real fields, totally imaginary fields and CM-fields.
Examples:
1. Let K = Q(√d) with d a square-free positive integer. Then
ΣK = {Id_K, σ}
where Id_K : K ↪ C is the identity map (Id_K(k) = k, for all k ∈ K), whereas
σ : K ↪ C,    σ(a + b√d) = a − b√d.
Since √d ∈ R, it follows that K is a totally real field.
2. Similarly, let K = Q(√d) with d a square-free negative integer. Then
ΣK = {Id_K, σ}
where Id_K : K ↪ C is the identity map (Id_K(k) = k, for all k ∈ K), whereas
σ : K ↪ C,    σ(a + b√d) = a − b√d.
Since √d ∈ C but not in R, it follows that K is a totally imaginary field.
3. Let ζₙ, n ≥ 3, be a primitive n-th root of unity and let L = Q(ζₙ), a cyclotomic extension.
Note that the only roots of unity that are real are ±1. If ψ : L ,→ C is an embedding,
then ψ(ζn ) must be a conjugate of ζn , i.e. one of
{ζna | a ∈ (Z/nZ)× }
but those are all imaginary. Thus ψ(L) * R. Hence L is a totally imaginary field.
4. In fact, L as in (3) is a CM-field. Indeed, the maximal real subfield of L is
F = Q(ζₙ + ζₙ⁻¹).
Notice that the minimal polynomial of ζₙ over F is
X² − (ζₙ + ζₙ⁻¹)X + 1,
so we obtain L from F by adjoining the square root of the discriminant of this polynomial, which is
ζₙ² + ζₙ⁻² − 2 = 2 cos(4π/n) − 2 < 0,
and any other conjugate is
ζₙ^{2a} + ζₙ^{−2a} − 2 = 2 cos(4aπ/n) − 2 < 0,    a ∈ (Z/nZ)×.
Hence, L is a CM-field.
5. Notice that any quadratic imaginary number field is obviously a CM-field.
160.6 fundamental theorem of algebra
160.7 imaginary
The imaginary numbers are closed under addition but not under multiplication.
Note that there are two complex square roots of −1 (i.e. the two solutions to the equation
x² + 1 = 0 in C), so there is always some ambiguity in which of these we choose to call "i" and which we call "−i", though this has little bearing on any applications of complex numbers.
The expression

0/0

is known as the indeterminate form. The motivation for this name is that there are no rules for comparing the value of 0/0 to the other real numbers. Note that, for example, 1/0 is not indeterminate, since we can justifiably associate it with +∞, which does compare with the rest of the real numbers (in particular, it is defined to be greater than all of them).

Although 0/0 is called "the" indeterminate form, another indeterminate form is

∞/∞.
Suppose a and b are real numbers. Then we have four types of inequalities between a and b: a < b, a ≤ b, a > b, and a ≥ b.
Properties
Examples
2. Jordan’s Inequality
3. Young’s inequality
4. Bernoulli’s inequality
5. Nesbitt’s inequality
6. Shapiro inequality
1. Chebyshev’s inequality
2. MacLaurin’s Inequality
3. Carleman’s inequality
5. Jensen’s inequality
6. Minkowski inequality
7. rearrangement inequality
Geometric inequalities
1. Hadwiger-Finsler inequality
2. Weitzenböck's inequality
3. Brunn-Minkowski inequality
Matrix inequalities
1. Schur's inequality
160.11 interval
Loosely speaking, an interval is a part of the real numbers that starts at one number and stops at another number. For instance, all numbers greater than 1 and smaller than 2 form an interval. Another interval is formed by the numbers greater than or equal to 1 and smaller than 2. Thus, when talking about intervals, it is necessary to specify whether the endpoints are part of the interval or not. There are then four types of intervals with three different names: open, closed, and half-open. Let us next define these precisely.
1. The open interval contains neither of the endpoints. If a < b are real numbers, then
the open interval of numbers between a and b is written as (a, b) and
(a, b) = {x ∈ R | a < x < b}.
2. The closed interval contains both endpoints. If a < b are real numbers, then the closed
interval is written as [a, b] and
[a, b] = {x ∈ R | a ≤ x ≤ b}.
3. A half-open interval contains only one of the endpoints. If a < b are real numbers, the
half-open intervals (a, b] and [a, b) are defined as
(a, b] = {x ∈ R | a < x ≤ b},
[a, b) = {x ∈ R | a ≤ x < b}.
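A small membership test covering all four interval types (a sketch; the function and flag names are ours):

```python
def in_interval(x, a, b, left_closed=False, right_closed=False):
    """Membership test for the four interval types between a and b:
    (a,b), [a,b), (a,b], and [a,b]."""
    lo = (a <= x) if left_closed else (a < x)
    hi = (x <= b) if right_closed else (x < b)
    return lo and hi

assert in_interval(1, 1, 2, left_closed=True)    # 1 is in [1, 2)
assert not in_interval(1, 1, 2)                  # 1 is not in (1, 2)
assert in_interval(2, 1, 2, right_closed=True)   # 2 is in (1, 2]
```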
Infinite intervals
In [1, 2], an open interval is always called a segment, and a closed interval is called simply
an interval. However, the above naming with open, closed, and half-open interval seems to
be more widely adopted. See e.g. [3, 4, 2]. To distinguish between [a, b) and (a, b], the former
is sometimes called a right half-open interval and the latter a left half-open interval
[6]. The notation (a, b), [a, b), (a, b], [a, b] seems to be standard. However, some authors
(especially from the French school) use the notation ]a, b[, [a, b[, ]a, b], [a, b] as opposed to
(a, b), [a, b), (a, b], [a, b].
REFERENCES
1. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., 1976.
2. W. Rudin, Real and complex analysis, 3rd ed., McGraw-Hill Inc., 1987.
3. R. Adams, Calculus, a complete course, Addison-Wesley Publishers Ltd., 3rd ed., 1995.
4. L. Råde, B. Westergren, Mathematics Handbook for Science and Engineering, Studentlitteratur, 1995.
5. R.A. Silverman, Introductory Complex Analysis, Dover Publications, 1972.
6. S. Igari, Real analysis - With an introduction to Wavelet Theory, American Mathe-
matical Society, 1998.
Definition Let z be a complex number, and let z be the complex conjugate of z. Then the
modulus, or absolute value, of z is defined as [1]
|z| = √(z\overline{z}).
If we write z in polar form as z = re^{iφ} with r ≥ 0, φ ∈ [0, 2π), then |z| = r. It follows that the modulus is a positive real number or zero. Alternatively, if a and b are the real and imaginary parts of z, respectively, then
|z| = √(a² + b²),    (160.12.1)
which is simply the Euclidean norm of the point (a, b) ∈ R2 . It follows that the modulus
satisfies the triangle inequality, i.e., property 2 below. Other properties of the modulus are
as follows [1, 2]: If u, v ∈ C, then
Property 3 and 6 follows by writing u and v in polar form (see e.g. [2]). Property 5 follows
from property 3 and the identity 1/u = u/|u|2. Indeed,
|u/v| = |uv/|v|2| = |u||v/|v|2| = |u|/|v|.
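The listed properties are easy to spot-check numerically with Python's abs on complex numbers (sample values are arbitrary):

```python
import math

u, v = 3 + 4j, 1 - 2j

assert math.isclose(abs(u * v), abs(u) * abs(v))    # |uv| = |u||v|
assert math.isclose(abs(u / v), abs(u) / abs(v))    # |u/v| = |u|/|v|
assert abs(u + v) <= abs(u) + abs(v)                # triangle inequality
assert math.isclose(abs(u), math.sqrt((u * u.conjugate()).real))  # |u| = sqrt(u conj(u))
```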
REFERENCES
1. E. Kreyszig, Advanced Engineering Mathematics, John Wiley & Sons, 1993, 7th ed.
2. E. Weisstein, Eric W. Weisstein’s world of mathematics, entry on Complex Modulus.
If f (x) ∈ C[x] let a be a root of f (x) in some extension of C. Let K be a Galois closure of
C(a) over R and set G = Gal(K/R). Let H be a Sylow 2-subgroup of G and let L = K H (the
fixed field of H in K). By the Fundamental Theorem of Galois Theory we have [L : R] =
[G : H], an odd number. We may write L = R(b) for some b ∈ L, so the minimal polynomial
mb,R (x) is irreducible over R and of odd degree. That degree must be 1, and hence L = R,
which means that G = H, a 2-group. Thus G₁ = Gal(K/C) is also a 2-group. If G₁ ≠ 1
choose G2 ≤ G1 such that [G1 : G2 ] = 2, and set M = K G2 , so that [M : C] = [G1 : G2 ] = 2.
But any polynomial of degree 2 over C has roots in C by the quadratic formula, so such a
field M cannot exist. This contradiction shows that G1 = 1. Hence K = C and a ∈ C,
completing the proof.
Let L be a subfield of C.
Definition 9.
Note that if σ is a real embedding then σ̄ = σ, where · denotes the complex conjugation
automorphism:
\overline{\cdot} : C → C,    \overline{a + bi} = a − bi
On the other hand, if τ is a complex embedding, then τ̄ is another complex embedding, so
the complex embeddings always come in pairs {τ, τ̄ }.
Hence, by the theorem, we know that the order of ΣL is [L : Q]. The number [L : Q] is
usually decomposed as
[L : Q] = r1 + 2r2
where r1 is the number of embeddings which are real, and 2r2 is the number of embeddings
which are complex (non-real). Notice that by the remark above this number is always even,
so r2 is an integer.
160.16 real number
There are several equivalent definitions of real number, all in common use. We give one
definition in detail and mention the other ones.
The set R of real numbers is the set of equivalence classes of Cauchy sequences of rational
numbers, under the equivalence relation {xi } ∼ {yi} if the interleave sequence of the two
sequences is itself a Cauchy sequence. The real numbers form a ring, with addition and multiplication defined by

{xᵢ} + {yᵢ} = {xᵢ + yᵢ},    {xᵢ} · {yᵢ} = {xᵢ · yᵢ}.
There is an ordering relation on R, defined by {xi } 6 {yi} if either {xi } ∼ {yi} or there
exists a natural number N such that xn < yn for all n > N.
One can prove that the real numbers form an ordered field and that they satisfy the least upper bound property: for every nonempty subset S ⊂ R, if S has an upper bound then S has a least upper bound.
It is also true that every ordered field with the least upper bound property is isomorphic to
R.
2. Dedekind cuts of rational numbers (that is, subsets S of Q with the property that, if
a ∈ S and b < a, then b ∈ S).
3. The real numbers can also be defined as the unique (up to isomorphism) ordered field
satisfying the least upper bound property.
The real numbers are often described as the unique (up to isomorphism) complete ordered field. While this fact is true, care must be taken when using it for the definition, because the standard definition of "complete" is logically dependent on the notion of real number.
160.17 totally real and imaginary fields
For this entry, we follow the notation of the entry real and complex embeddings.
Let K be a subfield of the complex numbers, C, and let ΣK be the set of all embeddings of
K in C.
Note: A complex number ω is real if and only if ω̄, the complex conjugate of ω, equals ω:
ω ∈ R ⇔ ω = ω̄
Thus, a field K which is fixed pointwise by complex conjugation is totally real. Given a
field L, the subfield of L fixed pointwise by complex conjugation is called the maximal real
subfield of L.
For examples (of (1), (2) and (3)), see examples of totally real fields.
Chapter 161
There are a few different things that are sometimes called "Gauss' lemma". See also Gauss's Lemma II.
Gauss’s Lemma I: If R is a UFD and f (x) and g(x) are both primitive in R[x], so is
f (x)g(x).
Proof: Suppose f(x)g(x) is not primitive. We will show that either f(x) or g(x) is not primitive either. That f(x)g(x) is not primitive means the gcd of the coefficients of f(x)g(x) is not a unit. Let p be a prime factor of that gcd. We consider the image of R mod p (i.e., under the natural ring homomorphism θ : R → R/pR) and extend θ to the polynomial ring. Since p divides every coefficient of f(x)g(x),

\overline{f(x)} \overline{g(x)} = \overline{f(x)g(x)} = 0,

where \overline{f(x)} is the image of f(x) in (R/pR)[x], and similarly \overline{g(x)}. Since p is prime, R/pR is an integral domain, and hence so is (R/pR)[x]; therefore \overline{f(x)} = 0 or \overline{g(x)} = 0. So f(x) or g(x) has all its coefficients divisible by p, and hence one of them is not primitive.
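A small integer-coefficient sketch of the statement, for R = Z: the product of two primitive polynomials is again primitive (coefficient lists are lowest-degree first):

```python
from math import gcd
from functools import reduce

def content(coeffs):
    """gcd of the integer coefficients (the content of the polynomial)."""
    return reduce(gcd, coeffs)

def poly_mul(f, g):
    """Coefficient list of f*g, lowest degree first."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

f = [2, 3, 1]   # 2 + 3x + x^2, primitive (content 1)
g = [5, 1]      # 5 + x, primitive
assert content(f) == content(g) == 1
assert content(poly_mul(f, g)) == 1   # product is primitive (Gauss's lemma)
```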
161.2 Gauss’s Lemma II
proposition (Gauss): Let R be a UFD and F its field of fractions. If a polynomial p ∈ R[x] is reducible in F[x], then it is reducible in R[x].
Proof: We may assume that p is primitive. Suppose p = qr with q, r ∈ F[x]. There are unique elements a, b ∈ F such that q/a and r/b are in R[x] and are primitive. But p/(ab) = (q/a)(r/b). Since p is primitive, it follows from Gauss's Lemma I that ab is a unit, and therefore so are a and b. This completes the proof.
Remark: Another result with the same name is Gauss' lemma on quadratic residues.
161.3 discriminant
Summary. The discriminant of a given polynomial is a number, calculated from the co-
efficients of that polynomial, that vanishes if and only if that polynomial has one or more
multiple roots. Using the discriminant we can test for the presence of multiple roots, without
having to actually calculate the roots of the polynomial in question.
δ^(n)(s₁, …, sₙ) = ∏_{i<j} (xᵢ − xⱼ)²,    (1)

where

sₖ = sₖ(x₁, …, xₙ),    k = 1, …, n,

is the k-th elementary symmetric polynomial.
The above relation is a defining one, because the right-hand side of (1) is, evidently, a symmetric polynomial, and because the algebra of symmetric polynomials is freely generated by the basic symmetric polynomials, i.e. every symmetric polynomial arises in a unique fashion as a polynomial in s₁, …, sₙ.
columns n to 2n − 1 formed by shifting the sequence n, (n − 1)a₁, …, a_{n−1}, i.e.

δ^(n) = | 1    0    ⋯  0        n         0         ⋯  0        |
        | a₁   1    ⋯  0        (n−1)a₁   n         ⋯  0        |
        | a₂   a₁   ⋯  0        (n−2)a₂   (n−1)a₁   ⋯  0        |
        | ⋮    ⋮    ⋱  ⋮        ⋮         ⋮         ⋱  ⋮        |
        | 0    0    ⋯  a_{n−1}  0         0         ⋯  2a_{n−2} |
        | 0    0    ⋯  aₙ       0         0         ⋯  a_{n−1}  |    (161.3.2)
Multiple root test. Let K be a field, let x denote an indeterminate, and let
p = xn + a1 xn−1 + . . . + an−1 x + an , ai ∈ K
Proposition 2. The discriminant vanishes if and only if p has multiple roots in its splitting field.
Proof. It isn’t hard to show that a polynomial has multiple roots if and only if that polynomial
and its derivative share a common root. The desired conclusion now follows by observing
that the determinant formula in equation (161.3.2) gives the resolvent of a polynomial and
its derivative. This resolvent vanishes if and only if the polynomial in question has a multiple
root. Q.E.D.
δ^(1) = 1
δ^(2) = a₁² − 4a₂
δ^(3) = 18a₁a₂a₃ + a₁²a₂² − 4a₂³ − 4a₁³a₃ − 27a₃²
δ^(4) = a₁²a₂²a₃² − 4a₂³a₃² − 4a₁³a₃³ + 18a₁a₂a₃³ − 27a₃⁴
        − 4a₁²a₂³a₄ + 16a₂⁴a₄ + 18a₁³a₂a₃a₄ − 80a₁a₂²a₃a₄
        − 6a₁²a₃²a₄ + 144a₂a₃²a₄ − 27a₁⁴a₄² + 144a₁²a₂a₄²
        − 128a₂²a₄² − 192a₁a₃a₄² + 256a₄³
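The formula for δ^(3) can be exercised directly: it vanishes exactly when the cubic has a multiple root. A Python sketch, treating the cubic as x³ + a₁x² + a₂x + a₃:

```python
def disc3(a1, a2, a3):
    """Discriminant of x^3 + a1 x^2 + a2 x + a3 (formula above)."""
    return (18 * a1 * a2 * a3 + a1**2 * a2**2 - 4 * a2**3
            - 4 * a1**3 * a3 - 27 * a3**2)

# (x - 1)^2 (x - 2) = x^3 - 4x^2 + 5x - 2 has the double root 1
assert disc3(-4, 5, -2) == 0
# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6 has distinct roots
assert disc3(-6, 11, -6) != 0
```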
Here is the matrix used to calculate δ^(4):

          | 1    0    0    4     0     0     0    |
          | a₁   1    0    3a₁   4     0     0    |
          | a₂   a₁   1    2a₂   3a₁   4     0    |
δ^(4) =   | a₃   a₂   a₁   a₃    2a₂   3a₁   4    |
          | a₄   a₃   a₂   0     a₃    2a₂   3a₁  |
          | 0    a₄   a₃   0     0     a₃    2a₂  |
          | 0    0    a₄   0     0     0     a₃   |
Let R be a ring. The polynomial ring over R in one variable X is the set R[X] of all sequences
in R with only finitely many nonzero terms. If (a0 , a1 , a2 , a3 , . . . ) is an element in R[X], with
aₙ = 0 for all n > N, then we usually write this element as

∑_{n=0}^{N} aₙXⁿ = a₀ + a₁X + a₂X² + a₃X³ + ⋯ + a_N X^N.
161.5 resolvent
Summary. The resolvent of two polynomials is a number, calculated from the coefficients
of those polynomials, that vanishes if and only if the two polynomials share a common root.
Conversely, the resolvent is non-zero if and only if the two polynomials are mutually prime.
Definition. Let K be a field and let

p(x) = a₀xⁿ + a₁xⁿ⁻¹ + ⋯ + aₙ,
q(x) = b₀xᵐ + b₁xᵐ⁻¹ + ⋯ + bₘ

be two polynomials over K. Their resolvent is the determinant

Res[p, q] = | a₀   0    ⋯  0        b₀   0    ⋯  0        |
            | a₁   a₀   ⋯  0        b₁   b₀   ⋯  0        |
            | a₂   a₁   ⋯  0        b₂   b₁   ⋯  0        |
            | ⋮    ⋮    ⋱  ⋮        ⋮    ⋮    ⋱  ⋮        |
            | 0    0    ⋯  a_{n−1}  0    0    ⋯  b_{m−1}  |
            | 0    0    ⋯  aₙ       0    0    ⋯  bₘ       |
Proposition 3. The resolvent of two polynomials is non-zero if and only if the polynomials
are relatively prime.
Proof. Let p(x), q(x) ∈ K[x] be two arbitrary polynomials of degree n and m, respectively.
The polynomials are relatively prime if and only if every polynomial — including the unit
polynomial 1 — can be formed as a linear combination of p(x) and q(x). Let
the matrix of the resolvent act on a column vector of unknowns:

| a₀   0    ⋯  0        b₀   0    ⋯  0        |   | c₀      |
| a₁   a₀   ⋯  0        b₁   b₀   ⋯  0        |   | c₁      |
| a₂   a₁   ⋯  0        b₂   b₁   ⋯  0        |   | ⋮       |
| ⋮    ⋮    ⋱  ⋮        ⋮    ⋮    ⋱  ⋮        | · | c_{m−1} |
| 0    0    ⋯  a_{n−1}  0    0    ⋯  b_{m−1}  |   | d₀      |
| 0    0    ⋯  aₙ       0    0    ⋯  bₘ       |   | ⋮       |
                                                   | d_{n−1} |

where c₀, …, c_{m−1} and d₀, …, d_{n−1} are the unknown coefficients of polynomials c(x) and d(x) for which c(x)p(x) + d(x)q(x) = 1.
In consequence of the preceding remarks, p(x) and q(x) are relatively prime if and only if
the matrix above is non-singular, i.e. the resolvent is non-vanishing. Q.E.D.
Alternative Characterization. The following proposition describes the resolvent of two
polynomials in terms of the polynomials’ roots. Indeed this property uniquely characterizes
the resolvent, as can be seen by carefully studying the appended proof.
Proposition 4. Let p(x), q(x) be as above and let x1 , . . . , xn and y1 , . . . , ym be their respective
roots in the algebraic closure of K. Then,
Res[p, q] = a₀ᵐ b₀ⁿ ∏_{i=1}^{n} ∏_{j=1}^{m} (xᵢ − yⱼ),

where

Aᵢ = aᵢ/a₀,    i = 1, …, n,
Bⱼ = bⱼ/b₀,    j = 1, …, m.
It therefore suffices to prove the proposition for monic polynomials. Without loss of gener-
ality we can also assume that the roots in question are algebraically independent.
P(x) = (x − X₁) ⋯ (x − Xₙ),
Q(x) = (x − Y₁) ⋯ (x − Yₘ),
F(X₁, …, Xₙ, Y₁, …, Yₘ) = ∏_{i=1}^{n} ∏_{j=1}^{m} (Xᵢ − Yⱼ),
G(X₁, …, Xₙ, Y₁, …, Yₘ) = Res[P, Q].
Next, consider the main diagonal of the matrix whose determinant gives Res[P, Q]. The first
m entries of the diagonal are equal to 1, and the next n entries are equal to (−1)m Y1 . . . Ym .
It follows that the expansion of G contains a term of the form (−1)mn Y1n . . . Ymn . However,
the expansion of F contains exactly the same term, and therefore F = G. Q.E.D.
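Proposition 4 can be spot-checked on a small example with a hand-built 3×3 Sylvester-type matrix (p of degree 2, q of degree 1, so one column of p's coefficients and two of q's; the helper function is ours):

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# p(x) = x^2 - 3x + 2 (roots 1, 2), q(x) = x - 3 (root 3)
M = [[1, 1, 0],
     [-3, -3, 1],
     [2, 0, -3]]

# Both sides of Proposition 4 equal 2 here (a0 = b0 = 1):
assert det3(M) == (1 - 3) * (2 - 3)
```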
161.6 de Moivre identity
it follows that

(cos θ + i sin θ)ⁿ = cos nθ + i sin nθ.

This is called de Moivre's formula, and besides being generally useful, it's a convenient
way to remember double- (and higher-multiple-) angle formulas. For example,
cos 2θ + i sin 2θ = (cos θ + i sin θ)² = cos²θ + 2i sin θ cos θ − sin²θ.    (161.6.4)
Since the imaginary parts and real parts on each side must be equal, we must have

cos 2θ = cos²θ − sin²θ    and    (161.6.5)
sin 2θ = 2 sin θ cos θ.    (161.6.6)
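A quick numerical check of de Moivre's formula for n = 5:

```python
import math

theta, n = 0.7, 5
lhs = (math.cos(theta) + 1j * math.sin(theta)) ** n
rhs = math.cos(n * theta) + 1j * math.sin(n * theta)
assert abs(lhs - rhs) < 1e-12   # (cos t + i sin t)^n = cos nt + i sin nt
```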
161.7 monic
One of the many consequences of this theorem is that for a finite projective plane, Desargue’s
Theorem implies Pappus’ theorem.
Version: 2 Owner: saforres Author(s): saforres
We want to show that the multiplication operation in a finite division ring D is abelian. For x ∈ D, let C_D(x) = {y ∈ D : xy = yx} denote the centralizer of x.
Then 0 and 1 are obviously elements of C_D(x), and if y and z are, then x(−y) = −(xy) = −(yx) =
(−y)x, x(y + z) = xy + xz = yx + zx = (y + z)x and x(yz) = (xy)z = (yx)z = y(xz) =
y(zx) = (yz)x, so −y, y + z, and yz are also elements of CD (x). Moreover, for y 6= 0, xy = yx
implies y −1 x = xy −1 , so y −1 is also an element of CD (x).
Now we consider the center of D which we’ll call Z(D). This is also a subring and is in fact
the intersection of all centralizers.
\
Z(D) = CD (x)
x∈D
Z(D) is an abelian subring of D and is thus a field. We can consider D and every CD (x)
as vector spaces over Z(D) of dimension n and nx respectively. Since D can be viewed as a
module over C_D(x), we find that n_x divides n. If we put q := |Z(D)|, we see that q ≥ 2 since {0, 1} ⊂ Z(D), and that |C_D(x)| = q^{n_x} and |D| = qⁿ.
It suffices to show that n = 1 to prove that multiplication is abelian, since then |Z(D)| = |D|
and so Z(D) = D.
|D*| = |Z(D*)| + ∑ₓ [D* : C_{D*}(x)],

which gives

qⁿ − 1 = q − 1 + ∑ₓ (qⁿ − 1)/(q^{n_x} − 1).
By Zsigmondy's theorem, there exists a prime p that divides qⁿ − 1 but doesn't divide any of the qᵐ − 1 for 0 < m < n, except in 2 pathological cases which will be dealt with separately.
Such a prime p will divide qⁿ − 1 and each of the (qⁿ − 1)/(q^{n_x} − 1). So it will also divide q − 1, which can only happen if n = 1.
We now deal with the 2 exceptional cases. In the first case n equals 2, which would mean D is a vector space of dimension 2 over Z(D), with elements of the form a + bα where a, b ∈ Z(D). Such elements clearly commute, so D = Z(D), which contradicts our assumption that n = 2.
In the second case, n = 6 and q = 2. The class equation reduces to

2⁶ − 1 = 2 − 1 + ∑ₓ (2⁶ − 1)/(2^{n_x} − 1),

where n_x divides 6. This gives 62 = 63x + 21y + 9z with x, y and z integers, which is impossible since the right hand side is divisible by 3 and the left hand side isn't.
We can also prove Wedderburn's theorem without using Zsigmondy's theorem, relying on the conjugacy class formula of the first proof. Let Gₙ be the set of n-th roots of unity, Pₙ the set of primitive n-th roots of unity, and Φ_d(q) the d-th cyclotomic polynomial.
It results that

• Φₙ(q) = ∏_{ξ∈Pₙ} (q − ξ)

• p(q) = qⁿ − 1 = ∏_{ξ∈Gₙ} (q − ξ) = ∏_{d|n} Φ_d(q)

If we can show that |Φₙ(q)| > q − 1 whenever n > 1, then n = 1 follows and the theorem is proved.
We know that

|Φₙ(q)| = ∏_{ξ∈Pₙ} |q − ξ|,    with q − ξ ∈ C.

Since |ξ| = 1 and q ≥ 2, we have |q − ξ| ≥ q − 1 with

|q − ξ| = |q − 1| ⇔ ξ = 1,

but

n > 1 ⇒ ξ ≠ 1.

Therefore we have

|q − ξ| > |q − 1| = q − 1  ⇒  |Φₙ(q)| > q − 1.
A finite field is a field F which has finitely many elements. We will present some basic facts
about finite fields.
Now that we know every finite field has pⁿ elements, it is natural to ask which of these actually arise as cardinalities of finite fields. It turns out that for each prime p and each natural number n, there is essentially exactly one finite field of size pⁿ.
Lemma 2. In any field F with m elements, the equation xᵐ = x is satisfied by all elements x of F.
Theorem 10. For each prime p > 0 and each natural number n ∈ N, there exists a finite field of cardinality pⁿ, and any two such are isomorphic.
For n = 1, the finite field F_p := Z/pZ has p elements, and any two such are isomorphic by the map sending 1 to 1.
In general, the polynomial f(X) := X^{pⁿ} − X ∈ F_p[X] has derivative −1 and thus is separable over F_p. We claim that the splitting field F of this polynomial is a finite field of size pⁿ.
The field F certainly contains the set S of roots of f (X). However, the set S is closed under
the field operations, so S is itself a field. Since splitting fields are minimal by definition, the
containment S ⊂ F means that S = F. Finally, S has pⁿ elements since f(X) is separable, so F is a field of size pⁿ.
For the uniqueness part, any other field F′ of size pⁿ contains a subfield isomorphic to F_p. Moreover, F′ equals the splitting field of the polynomial X^{pⁿ} − X over F_p, since by Lemma 2 every element of F′ is a root of this polynomial, and all pⁿ possible roots of the polynomial are accounted for in this way. By the uniqueness of splitting fields up to isomorphism, the two fields F and F′ are isomorphic.
Note: The proof of Theorem 10 given here, while standard because of its efficiency, relies
on more abstract algebra than is strictly necessary. The reader may find a more concrete
presentation of this and many other results about finite fields in [1, Ch. 7].
This follows from the fact that field extensions obtained from splitting fields are normal extensions.
Henceforth, in light of Theorem 10, we will write F_q for the unique (up to isomorphism) finite field of cardinality q = pⁿ. A fundamental step in the investigation of finite fields is
the observation that their multiplicative groups are cyclic:
Theorem 11. Let Fq∗ denote the multiplicative group of nonzero elements of the finite field
Fq . Then Fq∗ is a cyclic group.
We begin with the formula

∑_{d|k} φ(d) = k,    (161.11.1)

where φ denotes the Euler totient function. It is proved as follows. For every divisor d of
k, the cyclic group Ck of size k has exactly one cyclic subgroup Cd of size d. Let Gd be the
subset of Cd consisting of elements of Cd which have the maximum possible order of d. Since
every element of Ck has maximal order in the subgroup of Ck that it generates, we see that
the sets G_d partition the set C_k, so that

∑_{d|k} |G_d| = |C_k| = k.
The identity (161.11.1) then follows from the observation that the cyclic subgroup Cd has
exactly φ(d) elements of maximal order d.
We now prove the theorem. Let k = q − 1, and for each divisor d of k, let ψ(d) be the
number of elements of Fq∗ of order d. We claim that ψ(d) is either zero or φ(d). Indeed, if
it is nonzero, then let x ∈ Fq∗ be an element of order d, and let Gx be the subgroup of Fq∗
generated by x. Then G_x has size d and every element of G_x is a root of the polynomial X^d − 1. But this polynomial cannot have more than d roots in a field, so every root of X^d − 1 must be an element of G_x. In particular, every element of order d must be in G_x already,
and we see that Gx only has φ(d) elements of order d.
We have proved that ψ(d) ≤ φ(d) for all d | q − 1. If ψ(q − 1) were 0, then we would have

∑_{d|q−1} ψ(d) < ∑_{d|q−1} φ(d) = q − 1,
which is impossible since the first sum must equal q − 1 (because every element of Fq∗ has
order equal to some divisor d of q − 1).
A more constructive proof of Theorem 11, which actually exhibits a generator for the cyclic
group, may be found in [2, Ch. 16].
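Theorem 11 can be illustrated in the prime field F_p = Z/pZ with a brute-force generator test (a sketch; the function name is ours):

```python
def is_generator(g, p):
    """True if g generates the multiplicative group of Z/pZ (p prime)."""
    seen, x = set(), 1
    for _ in range(p - 1):
        x = x * g % p
        seen.add(x)
    return len(seen) == p - 1   # all p-1 nonzero residues were hit

# The multiplicative group of F_7 is cyclic; 3 generates it, 2 does not.
assert is_generator(3, 7)       # powers of 3 mod 7: 3, 2, 6, 4, 5, 1
assert not is_generator(2, 7)   # powers of 2 mod 7: 2, 4, 1 (order 3)
```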
The fact that Frob_q is an element of Gal(F_{qᵐ}/F_q), and that (Frob_q)ᵐ = Frob_{qᵐ} is the identity on F_{qᵐ}, is obvious. Since the extension F_{qᵐ}/F_q is normal and of degree m, the group Gal(F_{qᵐ}/F_q) must have size m, and we will be done if we can show that (Frob_q)ᵏ, for k = 0, 1, …, m − 1, are distinct elements of Gal(F_{qᵐ}/F_q).
We can now use the Galois correspondence between subgroups of the Galois group and
intermediate fields of a field extension to immediately classify all the intermediate fields in
the extension Fqm /Fq .
Theorem 13. The field extension Fqm /Fq contains exactly one intermediate field isomorphic
to Fqd , for each divisor d of m, and no others. In particular, the subfields of Fq are precisely
the fields Fpd for d | n.
By the fundamental theorem of Galois theory, each intermediate field of F_{qᵐ}/F_q corresponds to a subgroup of Gal(F_{qᵐ}/F_q). The latter is a cyclic group of order m, so its subgroups are exactly the cyclic groups generated by (Frob_q)ᵈ, one for each d | m. The fixed field of (Frob_q)ᵈ is the set of roots of X^{qᵈ} − X, which forms a subfield of F_{qᵐ} isomorphic to F_{qᵈ}, so the result follows.
The subfields of F_q can be obtained by applying the above considerations to the extension F_{pⁿ}/F_p.
REFERENCES
1. Kenneth Ireland & Michael Rosen, A Classical Introduction to Modern Number Theory, Second
Edition, Springer–Verlag, 1990 (GTM 84).
2. Ian Stewart, Galois Theory, Second Edition, Chapman & Hall, 1989.
On a field F of characteristic p, the map a ↦ a^p is an injective field homomorphism (an
automorphism when F is finite), called the Frobenius map on F.
Note: This morphism is sometimes also called the “small Frobenius” to distinguish it from
the map a ↦ a^q, with q = p^n. The latter map is then also referred to as the “big Frobenius” or
the “power Frobenius map”.
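As a concrete sketch (with GF(9) modeled here, purely for illustration, as F_3[x]/(x² + 1), which is valid since x² + 1 has no root mod 3), the following Python code checks that the small Frobenius a ↦ a³ is a field automorphism of GF(9) of order 2 whose fixed field is F_3:

```python
P = 3  # elements of GF(9) are pairs (a, b) representing a + b*i with i^2 = -1

def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def mul(u, v):
    # (a + b i)(c + d i) = (ac - bd) + (ad + bc) i, coefficients mod 3
    a, b = u; c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def frob(u):
    # small Frobenius: cube the element
    return mul(u, mul(u, u))

F9 = [(a, b) for a in range(P) for b in range(P)]
for u in F9:
    for v in F9:
        assert frob(add(u, v)) == add(frob(u), frob(v))   # additive
        assert frob(mul(u, v)) == mul(frob(u), frob(v))   # multiplicative

assert all(frob(frob(u)) == u for u in F9)                # Frob^2 = identity
fixed = [u for u in F9 if frob(u) == u]
assert fixed == [(0, 0), (1, 0), (2, 0)]                  # fixed field is F_3
```

So Gal(GF(9)/F_3) = {id, Frob}, matching the general statement above with m = 2.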
161.13 characteristic
Let (F, +, ·) be a field. The characteristic Char(F) of F is commonly given by one of three
equivalent definitions:
• If there is some positive integer n for which the result of adding 1 to itself n times
yields 0, then the characteristic of the field is the least such n. Otherwise, Char(F) is
defined to be 0.
• Char(F) is the unique nonnegative generator of the kernel of the canonical ring
homomorphism Z → F.
• If K is the prime subfield of F, then Char(F) is the size of K if this is finite, and 0
otherwise.
Note that the first two definitions also apply to arbitrary rings, and not just to fields.
The characteristic of a field (or more generally an integral domain) is always 0 or prime. For if
the characteristic of F were composite, say mn for m, n > 1, then in particular the sum of mn
copies of 1 would equal zero. But that sum factors as the product of the sum of m copies of 1
and the sum of n copies of 1, and since an integral domain has no zero divisors, one of these
two factors would already be zero. The characteristic of F would then be at most m or n,
contradicting the minimality condition.
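One can see the same phenomenon computationally in Z/nZ (a field exactly when n is prime). The short Python sketch below, offered only as an illustration, computes the characteristic as the least k with k · 1 = 0 and checks it against primality:

```python
def characteristic(n):
    # characteristic of the ring Z/nZ: least k > 0 with k * 1 = 0 (mod n)
    k, s = 1, 1 % n
    while s != 0:
        s = (s + 1) % n
        k += 1
    return k

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

assert characteristic(7) == 7 and is_prime(7)   # Z/7Z is a field
assert characteristic(12) == 12                 # composite characteristic:
assert not is_prime(12)                         # Z/12Z is not an integral domain
```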
Proposition 6. The ring R (as above) is a field if and only if R has exactly two ideals:
(0), R.
(⇒) Suppose R is a field and let A be a non-zero ideal of R. Then there exists r ∈ A ⊆ R
with r ≠ 0. Since R is a field and r is a non-zero element, there exists s ∈ R such that

s · r = 1 ∈ R.

Since A is an ideal and r ∈ A, it follows that 1 = s · r ∈ A, and hence A = R. Thus the only
ideals of R are (0) and R.
(⇐) Suppose the ring R has only two ideals, namely (0), R. Let a ∈ R be a non-zero
element; we would like to prove the existence of a multiplicative inverse for a in R. Define
the following set:
A = (a) = {r ∈ R | r = s · a, for some s ∈ R}
This is clearly an ideal, the ideal generated by the element a. Moreover, this ideal is not the
zero ideal because a ∈ A and a was assumed to be non-zero. Thus, since there are only two
ideals, we conclude A = R. Therefore 1 ∈ A = R so there exists an element s ∈ R such that
s·a = 1∈R
Hence for all non-zero a ∈ R, a has a multiplicative inverse in R, so R is, in fact, a field.
Let K be a field of finite characteristic, such as Fp . Then the ring of polynomials, K[X], is an
integral domain. We may therefore construct its quotient field, namely the field of rational
polynomials. This is an example of an infinite field with finite characteristic.
Fields are typically sets of “numbers” in which the arithmetic operations of addition, sub-
traction, multiplication and division are defined. Another important class of fields are the
function fields defined on geometric objects such as algebraic varieties or Riemann surfaces.
The following is a list of examples of fields.
• The rational numbers Q, the real numbers R and the complex numbers C are the most
familiar examples of fields.
• Slightly more exotic, the hyperreal numbers and the surreal numbers are fields con-
taining infinitesimal and infinitely large numbers. (The surreal numbers aren’t a field
in the strict sense since they form a proper class and not a set.)
• The algebraic numbers form a field; this is the algebraic closure of Q. In general, every
field has an (essentially unique) algebraic closure.
• The computable complex numbers (those whose digit sequence can be produced by a
Turing machine) form a field. The definable complex numbers (those which can be pre-
cisely specified using a logical formula) form a field containing the computable numbers;
arguably, this field contains all the numbers we can ever talk about. It is countable.
• If p is a prime number, then the integers modulo p form a finite field with p elements,
typically denoted by F_p. More generally, for every prime power p^n there is one and
only one finite field F_{p^n} with p^n elements, up to isomorphism.
• If K is a field, we can form the field of rational functions over K, denoted by K(X).
It consists of quotients of polynomials in X with coefficients in K.
• If V is a variety over the field K, then the function field of V , denoted by K(V ),
consists of all quotients of polynomial functions defined on V .
• If U is a domain (= connected open set) in C, then the set of all meromorphic functions
on U is a field. More generally, the meromorphic functions on any Riemann surface
form a field.
• The field of formal Laurent series over the field K in the variable X consists of all
expressions of the form

∑_{j=−M}^{∞} a_j X^j,

where M is an integer and the coefficients a_j lie in K.
• More generally, whenever R is an integral domain, we can form its field of fractions, a
field whose elements are the fractions of elements of R.
161.17 field
• 1 ≠ 0
A field homomorphism is a map ψ : F → K between fields satisfying:
1. ψ(a + b) = ψ(a) + ψ(b) for all a, b ∈ F
2. ψ(a · b) = ψ(a) · ψ(b) for all a, b ∈ F
3. ψ(1) = 1, ψ(0) = 0
Every field homomorphism is injective. Remark: For this reason the terms “field homomorphism” and “field monomorphism” are
synonymous. Also note that if ψ is a field monomorphism, then

ψ(F) ≅ F, ψ(F) ⊆ K.
In other words, every field homomorphism ψ : F → K factors as an isomorphism
χ : F → H onto its image H := ψ(F), followed by the inclusion homomorphism
ι : H ↪ K, so that ψ = ι ∘ χ.
The prime subfield of a field F is the intersection of all subfields of F , or equivalently the
smallest subfield of F . It can also be constructed by taking the quotient field of the additive
subgroup of F generated by the multiplicative identity 1.
If F has characteristic p where p > 0 is a prime, then the prime subfield of F is isomorphic
to the field Z/pZ of integers mod p. When F has characteristic zero, the prime subfield of
F is isomorphic to the field Q of rational numbers.
Chapter 162
Theorem 16. Let L/K be a finite field extension. Then L/K is an algebraic extension.
In order to prove that L/K is an algebraic extension, we need to show that any element
α ∈ L is algebraic, i.e., there exists a non-zero polynomial p(x) ∈ K[x] such that p(α) = 0.
Recall that, by definition, L/K being a finite extension means that L is a finite-dimensional
vector space over K. Let the dimension be

[L : K] = n

for some n ∈ N. Then the n + 1 elements 1, α, α², . . . , α^n ∈ L must be linearly dependent
over K, so there exist c_0, c_1, . . . , c_n ∈ K, not all zero, with c_0 + c_1 α + · · · + c_n α^n = 0.
The polynomial p(x) = c_0 + c_1 x + · · · + c_n x^n is then the required non-zero polynomial.
NOTE: The converse is not true. See the entry “algebraic extension” for details.
162.2 algebraic closure
Any two algebraic closures of K are isomorphic as fields, but not necessarily canonically.
Definition 12. Let L/K be an extension of fields. L/K is said to be an algebraic exten-
sion of fields if every element of L is algebraic over K.
Examples:
1. Let L = Q(√2). The extension L/Q is an algebraic extension. Indeed, any element
α ∈ L is of the form

α = q + t√2 ∈ L

for some q, t ∈ Q. Then α ∈ L is a root of

X² − 2qX + q² − 2t² = 0.
3. Let K be a field and denote by K̄ the algebraic closure of K. Then the extension K̄/K
is algebraic.
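The quadratic in Example 1 can be verified numerically. The Python sketch below (an illustration; the particular values of q and t are arbitrary) confirms that α = q + t√2 is a root of X² − 2qX + q² − 2t²:

```python
import math
from fractions import Fraction

q, t = Fraction(3, 5), Fraction(-2, 7)   # any rationals will do
alpha = float(q) + float(t) * math.sqrt(2)

# evaluate X^2 - 2qX + (q^2 - 2t^2) at X = alpha
value = alpha ** 2 - 2 * float(q) * alpha + float(q ** 2 - 2 * t ** 2)
assert abs(value) < 1e-12                # alpha is (numerically) a root
```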
162.5 algebraically dependent
Proposition 7. Let K/L be a finite extension of fields and let k ∈ K. There exists a unique
polynomial mk(x) ∈ L[x] such that:
1. mk(x) is monic;
2. mk(k) = 0;
3. If p(x) ∈ L[x] is another polynomial such that p(k) = 0, then mk(x) divides p(x).
Define the evaluation map

ψ : L[x] → K, ψ(p(x)) = p(k).

Note that this map is clearly a ring homomorphism: for all p(x), q(x) ∈ L[x],

ψ(p(x) + q(x)) = p(k) + q(k) = ψ(p(x)) + ψ(q(x)), ψ(p(x) · q(x)) = p(k) · q(k) = ψ(p(x)) · ψ(q(x)).

Note that the kernel of ψ is a non-zero ideal. This relies on the fact that K/L is a finite
extension of fields, and therefore an algebraic extension, so every element of K is a root
of a non-zero polynomial p(x) with coefficients in L, that is, p(x) ∈ Ker(ψ).
Moreover, the ring of polynomials L[x] is a principal ideal domain (see example of PID).
Therefore, the kernel of ψ is a principal ideal, generated by some polynomial m(x):
Ker(ψ) = (m(x))
Note that the only units in L[x] are the non-zero constant polynomials, hence if m′(x) is another
generator of Ker(ψ), then m′(x) = c · m(x) for some non-zero constant c ∈ L.
Let α be the leading coefficient of m(x). We define mk(x) = α^{−1} m(x), so that the leading
coefficient of mk is 1. Also note that, by the previous remark, mk is the unique generator of
Ker(ψ) which is monic.
Finally, if p(x) is any polynomial such that p(k) = 0, then p(x) ∈ Ker(ψ). Since mk generates
this ideal, we know that mk must divide p(x) (this is property (3)).
For the uniqueness, note that any polynomial satisfying (2) and (3) must be a generator of
Ker(ψ), and, as we pointed out, there is a unique monic generator, namely mk (x).
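As a worked illustration (the choice k = √2 + √3 over L = Q, and the candidate m(x) = x⁴ − 10x² + 1, are assumptions of this example), the sketch below checks property (2) numerically and illustrates property (3) with a multiple of m:

```python
import math

alpha = math.sqrt(2) + math.sqrt(3)
m = [1, 0, -10, 0, 1]            # x^4 - 10 x^2 + 1, monic, coefficients high to low

def horner(coeffs, x):
    # evaluate a polynomial given by its coefficient list
    r = 0.0
    for c in coeffs:
        r = r * x + c
    return r

assert abs(horner(m, alpha)) < 1e-9      # property (2): m(alpha) = 0

def polymul(a, b):
    # multiply two coefficient lists
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# property (3), illustrated: p = m * (x^2 + 1) also annihilates alpha
p = polymul(m, [1, 0, 1])
assert abs(horner(p, alpha)) < 1e-6
```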
An important result on finite extensions establishes that any finite extension is also an
algebraic extension.
Let K|L be a finite field extension. If κ ∈ K, then the minimal polynomial m(x) ∈ L[x] of κ
is the unique monic non-zero polynomial such that m(κ) = 0, and any other polynomial
f ∈ L[x] with f(κ) = 0 is divisible by m.
162.9 norm
Let K/F be a Galois extension, and let x ∈ K. The norm N^K_F(x) of x is defined to be
the product of all the elements of the orbit of x under the group action of the Galois group
Gal(K/F) on K, taken with multiplicities if K/F is a finite extension.
In the case where K/F is a finite extension, the norm of x can be defined to be the
determinant of the linear transformation [x] : K → K given by [x](k) := xk, where K
is regarded as a vector space over F . This definition does not require that K/F be Galois,
or even that K be a field—for instance, it remains valid when K is a division ring (although
F does have to be a field, in order for determinant to be defined). Of course, for finite Galois
extensions K/F , this definition agrees with the previous one, and moreover the formula
N^K_F(x) = ∏_{σ ∈ Gal(K/F)} σ(x)
holds.
The norm of x is always an element of F, since any element of Gal(K/F) permutes the orbit
of x and thus fixes N^K_F(x).
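The agreement of the two definitions can be seen concretely in K = Q(√2), F = Q, where Gal(K/F) = {id, √2 ↦ −√2}. The sketch below (an illustration; the basis (1, √2) and sample elements are assumptions of the example) compares the Galois product with the determinant of the multiplication map:

```python
import math

def norm_by_galois(a, b):
    # product over Gal(Q(sqrt2)/Q): (a + b*sqrt2)(a - b*sqrt2)
    r = math.sqrt(2)
    return (a + b * r) * (a - b * r)

def norm_by_det(a, b):
    # multiplication by a + b*sqrt2 in the basis (1, sqrt2) has matrix
    # [[a, 2b], [b, a]]; its determinant is a^2 - 2 b^2
    return a * a - 2 * b * b

for a, b in [(1, 1), (3, -2), (0, 5)]:
    assert abs(norm_by_galois(a, b) - norm_by_det(a, b)) < 1e-9
```

Both computations give a² − 2b², an element of Q, as the general statement requires.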
If F is a field of characteristic 0, and a and b are algebraic over F , then there is an element
c in F (a, b) such that F (a, b) = F (c).
Let f ∈ F [x] be a polynomial over a field F . A splitting field for f is a field extension K of
F such that
Theorem: Any polynomial over any field has a splitting field, and any two such splitting
fields are isomorphic. A splitting field is always a normal extension of the ground field.
Version: 3 Owner: djao Author(s): djao
Theorem 17. Let L/K be a finite field extension. Then L/K is an algebraic extension.
Corollary. The extension R/Q is infinite.
[Proof of the Corollary] If the extension were finite, it would be an algebraic extension.
However, the extension R/Q is clearly not algebraic. For example, π ∈ R is transcendental
over Q (see pi).
162.13 trace
Let K/F be a Galois extension, and let x ∈ K. The trace Tr^K_F(x) of x is defined to be
the sum of all the elements of the orbit of x under the group action of the Galois group
Gal(K/F) on K, taken with multiplicities if K/F is a finite extension.
The trace of x is always an element of F, since any element of Gal(K/F) permutes the orbit
of x and thus fixes Tr^K_F(x).
The name “trace” derives from the fact that, when K/F is finite, the trace of x is simply
the trace of the linear transformation T : K → K of vector spaces over F defined by
T(v) := xv.
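Parallel to the norm, the two descriptions of the trace can be compared in Q(√2)/Q (again an illustrative choice, with the basis (1, √2) assumed):

```python
import math

def trace_by_galois(a, b):
    # sum over the Galois orbit: (a + b*sqrt2) + (a - b*sqrt2)
    r = math.sqrt(2)
    return (a + b * r) + (a - b * r)

def trace_by_matrix(a, b):
    # trace of the multiplication-by-(a + b*sqrt2) matrix [[a, 2b], [b, a]]
    return a + a

for a, b in [(1, 1), (3, -2), (0, 5)]:
    assert abs(trace_by_galois(a, b) - trace_by_matrix(a, b)) < 1e-9
```

Both give 2a, which indeed lies in Q.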
Chapter 163
Let ζ_n be a primitive n-th root of unity. Then Q(ζ_n)/Q has Galois group (Z/nZ)^* (the group
of units of Z/nZ), so Q(ζ_n)/Q is abelian.
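The unit groups in question are easy to enumerate. The following Python sketch (illustrative only) computes (Z/nZ)^* for a few n, so that |Gal(Q(ζ_n)/Q)| = φ(n) can be read off; it also shows that abelian does not mean cyclic, since every element of (Z/8Z)^* has order dividing 2:

```python
from math import gcd

def unit_group(n):
    # the units of Z/nZ: residues coprime to n
    return [a for a in range(1, n) if gcd(a, n) == 1]

assert len(unit_group(5)) == 4          # Gal(Q(zeta_5)/Q) has order phi(5) = 4
assert unit_group(8) == [1, 3, 5, 7]    # phi(8) = 4
assert all(pow(a, 2, 8) == 1 for a in unit_group(8))   # (Z/8Z)^* is not cyclic
```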
Let L/F be a finite dimensional Galois extension of fields, with Galois group G := Gal(L/F).
There is a bijective, inclusion-reversing correspondence between subgroups of G and extensions
of F contained in L, given by

H ↦ L^H (the fixed field of H), with inverse K ↦ Gal(L/K).

The extension L^H/F is normal if and only if H is a normal subgroup of G, and in this case the
homomorphism G → Gal(L^H/F) given by σ ↦ σ|_{L^H} induces (via the first isomorphism theorem)
a natural identification Gal(L^H/F) = G/H between the Galois group of L^H/F and the
quotient group G/H.
Let K be a field, and let L be a separable closure of K. For any x ∈ K, the Galois conjugates
of x are the elements of L which are in the orbit of x under the group action of the
absolute Galois group G_K on L.
The Galois group Gal(K/F ) of a field extension K/F is the group of all field automorphisms
σ : K −→ K of K which fix F (i.e., σ(x) = x for all x ∈ F ).
163.7 absolute Galois group
Let k be a field. The absolute Galois group Gk of k is the Galois group Gal(k sep /k) of the
field extension k sep /k, where k sep is the separable closure of k.
A Galois extension K/F is said to be a cyclic extension if the Galois group Gal(K/F ) is
cyclic.
Let F = F_p(t), where F_p is the field with p elements. The splitting field E of the irreducible
polynomial f = x^p − t is not separable over F. Indeed, if α is an element of E such that
α^p = t, we have

x^p − t = x^p − α^p = (x − α)^p,

which shows that f has one root of multiplicity p.
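The factorization x^p − α^p = (x − α)^p rests on the fact that the intermediate binomial coefficients of (x − α)^p vanish mod p. This can be checked directly (the sketch below is illustrative, using p = 5):

```python
from math import comb

p = 5
# the "freshman's dream": p divides C(p, k) for 0 < k < p
assert all(comb(p, k) % p == 0 for k in range(1, p))

# consequence for integers: (x + y)^p = x^p + y^p mod p
for x in range(p):
    for y in range(p):
        assert pow(x + y, p, p) == (pow(x, p, p) + pow(y, p, p)) % p
```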
Let K/F be a field extension with Galois group G = Gal(K/F ), and let H be a subgroup
of G. The fixed field of H in K is the set

K^H := {x ∈ K | σ(x) = x for all σ ∈ H}.

The set K^H is always a field, and F ⊂ K^H ⊂ K.
163.11.1 Topology on the Galois group
Recall that the Galois group G := Gal(L/F) of L/F is the group of all field automorphisms
σ : L → L that restrict to the identity map on F, under the group operation of composition.
In the case where the extension L/F is infinite dimensional, the group G comes equipped
with a natural topology, which plays a key role in the statement of the Galois correspondence.
We define a subset U of G to be open if, for each σ ∈ U, there exists an intermediate field
K ⊂ L such that
• the degree [K : F] is finite, and
• if σ′ is another element of G, and the restrictions σ|_K and σ′|_K are equal, then σ′ ∈ U.
The resulting collection of open sets forms a topology on G, called the Krull topology, and
G is a topological group under the Krull topology.
163.11.2 The Galois group as a profinite group
In this section we exhibit the group G as a projective limit of an inverse system of finite groups.
This construction shows that the Galois group G is actually a profinite group.
Let A denote the set of finite normal extensions K of F which are contained in L. The set
A is a partially ordered set under the inclusion relation. Form the inverse limit

Γ := lim← Gal(K/F) ⊂ ∏_{K∈A} Gal(K/F),

consisting, as usual, of the set of all (σ_K) ∈ ∏_K Gal(K/F) such that σ_{K′}|_K = σ_K
for all K, K′ ∈ A with K ⊂ K′. We make Γ into a topological space by putting the
discrete topology on each finite set Gal(K/F) and giving Γ the subspace topology induced
by the product topology on ∏_K Gal(K/F). The group Γ is a closed subset of the compact
group ∏_K Gal(K/F), and is therefore compact.
Let

φ : G → ∏_{K∈A} Gal(K/F)

be the group homomorphism which sends an element σ ∈ G to the element (σ_K) of ∏_K Gal(K/F)
whose K-th coordinate is the automorphism σ|_K ∈ Gal(K/F). Then the function φ has
image equal to Γ and is in fact a homeomorphism between G and Γ. Since Γ is profinite, it
follows that G is profinite as well.
163.11.3 The Galois correspondence
A field extension K/F is normal if every irreducible polynomial f ∈ F [x] which has at least
one root in K splits (factors into a product of linear factors) in K[x].
An extension K/F of finite degree is normal if and only if there exists a polynomial p ∈ F [x]
such that K is the splitting field for p over F .
A perfect field is a field K such that any algebraic extension field L/K is separable over K.
All fields of characteristic 0 are perfect, so in particular the fields R, C and Q are perfect.
A radical tower over F is a sequence of field extensions

F = L_0 ⊂ L_1 ⊂ · · · ⊂ L_n = L

where for each i, 0 ≤ i < n, there exists an element α_i ∈ L_{i+1} and a natural number n_i such
that L_{i+1} = L_i(α_i) and α_i^{n_i} ∈ L_i.
A radical extension is a field extension K/F for which there exists a radical tower L/F with
L ⊃ K. The notion of radical extension coincides with the informal concept of solving for
the roots of a polynomial by radicals, in the sense that a polynomial over F is solvable by
radicals if and only if its splitting field is a radical extension of F.
163.16 separable
A field extension K/F is separable if, for each a ∈ K, the minimal polynomial of a over
F is separable. When F has characteristic zero, every extension is separable; examples
of inseparable extensions include the extension K(u)[t]/(t^p − u) of the field K(u) of
rational functions in one variable, where K has characteristic p > 0.
Version: 5 Owner: djao Author(s): djao
Let K be a field and let L be an algebraic closure of K. The separable closure of K inside L
is the compositum of all finite separable extensions of K contained in L (that is to say, the
smallest subfield of L that contains every finite separable extension of K).
Chapter 164
The transcendence degree of a set S over a field K, denoted T_S, is the size of a maximal
subset S′ of S such that all the elements of S′ are algebraically independent over K.
Heuristically speaking, the transcendence degree of a finite set S is obtained by taking the
number of elements in the set, subtracting the number of algebraic elements in that set, and
then subtracting the number of algebraic relations between distinct pairs of elements in S.
Example 4 (Computing the Transcendence Degree). The set S = {√7, π, π², e} has T_S ≤ 2,
since there are four elements, √7 is algebraic, and the polynomial f(x, y) = x² − y gives an
algebraic dependence between π and π² (i.e., (π, π²) is a root of f), giving T_S ≤ 4 − 1 − 1 = 2.
If we assume the conjecture that e and π are algebraically independent, then no more
dependencies can exist, and we can conclude that, in fact, T_S = 2.
Chapter 165
12F99 – Miscellaneous
Let {Kα }, α ∈ J, be a collection of subfields of a field L. The composite field of the collection
is the smallest subfield of L that contains all the fields Kα .
The notation K1 K2 (resp., K1 K2 . . . Kn ) is often used to denote the composite field of two
(resp., finitely many) fields.
One of the classic theorems on extensions states that if F ⊂ K ⊂ L, then
[L : F ] = [L : K][K : F ]
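A worked instance of the tower law (using the standard facts that x² − 2 is the minimal polynomial of √2 over Q and x² − √2 that of ⁴√2 over Q(√2)):

```latex
[\mathbb{Q}(\sqrt[4]{2}) : \mathbb{Q}]
  = [\mathbb{Q}(\sqrt[4]{2}) : \mathbb{Q}(\sqrt{2})]\,
    [\mathbb{Q}(\sqrt{2}) : \mathbb{Q}]
  = 2 \cdot 2 = 4.
```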
Chapter 166
Chapter 167
Let R be an ordered ring and let a ∈ R. The absolute value of a is defined to be the function
| · | : R → R given by

|a| := a if a ≥ 0, and |a| := −a otherwise.
In particular, the usual absolute value | | on the field R of real numbers is defined in this
manner.
All absolute value functions satisfy the defining properties of a valuation: positive
definiteness, multiplicativity, and the triangle inequality |x + y| ≤ |x| + |y|.
However, in general they are not literally valuations, because valuations are required to be
real valued. In the case of R and C, the absolute value is a valuation, and it induces a metric
in the usual way, with distance function defined by d(x, y) := |x − y|.
Version: 4 Owner: djao Author(s): djao, rmilson
167.2 associates
Let a, b be elements of a ring such that a = bu where u is a unit. Then we say that a and b
are associates.
In an integral domain, a and b are associates if and only if (a) = (b), where (x) denotes the
principal ideal generated by x.
167.4 comaximal
Also, recall that

Rad(P) = {r ∈ R | ∃ n ∈ N such that r^n ∈ P}.

Obviously, we have P ⊆ Rad(P) (just take n = 1), so it remains to show the reverse
inclusion.
Suppose r ∈ Rad(P), so there exists some n ∈ N such that r^n ∈ P. We want to prove that
r must be an element of the prime ideal P. For this, we use induction on n to prove the
following proposition: for all n ∈ N and all r ∈ R, if r^n ∈ P then r ∈ P.
Case n = 1: This is immediate, since r^1 = r ∈ P.
Case n ⇒ Case n + 1: Suppose we have proved the proposition for the case n, so our
induction hypothesis is

∀ r ∈ R, r^n ∈ P ⇒ r ∈ P,

and suppose r^{n+1} ∈ P. Then

r · r^n = r^{n+1} ∈ P,

and since P is a prime ideal we have

r ∈ P or r^n ∈ P.

Thus we conclude, either directly or using the induction hypothesis, that r ∈ P, as desired.
167.6 module
Let R be a ring with identity. A left module M over R is a set with two binary operations,
+ : M × M → M and · : R × M → M, such that
1. (a + b) + c = a + (b + c) for all a, b, c ∈ M
2. a + b = b + a for all a, b ∈ M
3. There exists an element 0 ∈ M such that m + 0 = m for all m ∈ M
4. For each m ∈ M there exists −m ∈ M such that m + (−m) = 0
5. r · (a + b) = (r · a) + (r · b) for all r ∈ R and a, b ∈ M
6. 1 · m = m for all m ∈ M
7. (r1 · r2) · m = r1 · (r2 · m) for all r1, r2 ∈ R and m ∈ M
8. (r1 + r2) · m = (r1 · m) + (r2 · m) for all r1, r2 ∈ R and m ∈ M
A right module is defined analogously, except that the function · goes from M × R to M. If R
is commutative, there is an equivalence of categories between the category of left R-modules
and the category of right R-modules.
Let R be a commutative ring. For any ideal I of R, the radical of I, written Rad(I), is the
set

{a ∈ R : a^n ∈ I for some integer n > 0}.

Every prime ideal is a radical ideal. If I is a radical ideal, the quotient ring R/I is a ring
with no nonzero nilpotent elements.
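This can be seen concretely in Z: (12) is not radical, and Z/12Z has the nonzero nilpotent 6, while the radical ideal (6) yields a quotient with no nonzero nilpotents. A small illustrative Python check (using the fact that a ∈ Z/nZ is nilpotent iff a^n = 0, since n exceeds any needed exponent):

```python
def nilpotents(n):
    # nilpotent elements of Z/nZ
    return [a for a in range(n) if pow(a, n, n) == 0]

assert nilpotents(12) == [0, 6]   # (12) is not a radical ideal of Z
assert nilpotents(6) == [0]       # (6) = Rad((12)) is radical: no nonzero nilpotents
```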
167.8 ring
Equivalently, a ring is an abelian group (R, +) together with a second binary operation ·
such that · is associative and distributes over +.
We say R has a multiplicative identity if there exists an element 1 ∈ R such that a·1 = 1·a = a
for all a ∈ R. We say R is commutative if a · b = b · a for all a, b ∈ R.
Every element a in a ring has a unique additive inverse, denoted −a. The subtraction
operator in a ring is defined by the equation a − b := a + (−b).
167.9 subring
Let (A, +, ∗) be a ring. A subring is a subset S of A, with the operations + and ∗ of A restricted
to S, such that S is a ring in its own right.
Since the restricted operations inherit the associativity of ∗, the commutativity of +, etc.,
usually only closure has to be checked.
Example:
Consider the ring (Z, +, ·). Then (2Z, +, ·) is a subring since the sum or product of two
even numbers is again an even number.
⊗ : A × B → A ⊗ B,

such that every R-bilinear map

φ : A × B → C,

lifts to a unique R-module homomorphism

φ̃ : A ⊗ B → C,

such that

φ(a, b) = φ̃(a ⊗ b)

for all a ∈ A, b ∈ B. Diagrammatically: the triangle formed by ⊗ : A × B → A ⊗ B,
φ : A × B → C, and the unique φ̃ : A ⊗ B → C commutes.
The tensor product A ⊗ B can be constructed by taking the free R-module generated by all
formal symbols
a ⊗ b, a ∈ A, b ∈ B,
and quotienting by the obvious bilinear relations:
(a1 + a2 ) ⊗ b = a1 ⊗ b + a2 ⊗ b, a1 , a2 ∈ A, b ∈ B
a ⊗ (b1 + b2 ) = a ⊗ b1 + a ⊗ b2 , a ∈ A, b1 , b2 ∈ B
r(a ⊗ b) = (ra) ⊗ b = a ⊗ (rb) a ∈ A, b ∈ B, r ∈ R
Definition (Categorical). Using the language of categories, all of the above can be ex-
pressed quite simply by stating that for all R-modules M, the functor (−) ⊗ M is left-adjoint
to the functor Hom(M, −).
Chapter 168
Let (X, +, ∗) be a ring. Since (X, +) is required to be an abelian group, the operation +
is necessarily commutative. This need not happen for ∗. Rings where ∗ is commutative,
that is, x ∗ y = y ∗ x for all x, y ∈ X, are called commutative rings.
Chapter 169
Let G be an abelian group. A G-graded ring R is a ring given as a direct sum of abelian groups
R = ⊕_{g∈G} R_g, indexed by G, with the property that R_g R_h ⊆ R_{gh}.
Chapter 170
13A05 – Divisibility
theorem:
Let f be a primitive polynomial over a unique factorization domain R, say

f(x) = a_0 + a_1 x + a_2 x² + . . . + a_n x^n.

If there is an irreducible element p of R such that

p | a_m for 0 ≤ m ≤ n − 1,   p ∤ a_n,   p² ∤ a_0,

then f is irreducible.
proof:
Suppose

f = (b_0 + . . . + b_s x^s)(c_0 + . . . + c_t x^t)

where s > 0 and t > 0. Since a_0 = b_0 c_0, with p | a_0 but p² ∤ a_0, we know that p divides one
but not both of b_0 and c_0; suppose p | c_0. Since p ∤ a_n = b_s c_t, not all the c_m are divisible
by p; let k be the smallest index such that p ∤ c_k. We have a_k = b_0 c_k + b_1 c_{k−1} + . . . + b_k c_0.
Since k ≤ t < n, we also have p | a_k, and p divides every summand on the right side except
b_0 c_k (because p | c_j for j < k and p ∤ b_0), which yields a contradiction. QED
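For R = Z the hypotheses are mechanical to test. The sketch below is an illustrative checker for the stated (sufficient, not necessary) condition; coefficients are listed from a_0 up to a_n:

```python
def eisenstein(coeffs, p):
    # coeffs = [a0, a1, ..., an]; check: p | a_m for m < n,
    # p does not divide a_n, and p^2 does not divide a0
    a0, an = coeffs[0], coeffs[-1]
    return (all(a % p == 0 for a in coeffs[:-1])
            and an % p != 0
            and a0 % (p * p) != 0)

assert eisenstein([2, 2, 0, 1], 2)        # x^3 + 2x + 2 is irreducible over Q
assert not eisenstein([4, 2, 0, 1], 2)    # fails: 2^2 divides the constant term
assert eisenstein([3, 0, 0, 0, 1], 3)     # x^4 + 3: irreducible by Eisenstein at 3
```

A failed check is inconclusive; the criterion only certifies irreducibility when it succeeds.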
Chapter 171
Let K be an algebraically closed field, and let I be an ideal in K[x1 , . . . , xn ], the polynomial ring
in n indeterminates.
Weak Nullstellensatz:
If V (I) = ∅, then I = K[x1 , . . . , xn ]. In other words, the zero set of any proper ideal of
K[x1 , . . . , xn ] is nonempty.
In the language of algebraic geometry, the latter result is equivalent to the statement that
Rad(I) = I(V (I)).
171.2 nilradical
The nilradical of R equals the prime radical of R, although proving that the two are equivalent
requires the axiom of choice.
Given a natural number n, let n = p_1^{α_1} · · · p_k^{α_k} be its unique factorization as a product of
prime powers. Define the radical of n, denoted rad(n), to be the product p_1 · · · p_k. This is
the square-free part of the integer, and thus the radical of a square-free number is itself.
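The definition translates directly into code; the following illustrative sketch computes rad(n) by trial division:

```python
def rad(n):
    # product of the distinct primes dividing n
    r, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            r *= d
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:          # leftover prime factor
        r *= n
    return r

assert rad(12) == 6            # 12 = 2^2 * 3  ->  2 * 3
assert rad(30) == 30           # square-free: rad(n) = n
assert rad(rad(72)) == rad(72) # rad is idempotent
```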
Chapter 172
Let R 6= 0 be a commutative ring with identity. Is there a maximal ideal in R? This simple
property turns out to be dependent on the axiom of choice. Assuming Zorn’s lemma, which
is equivalent to the axiom of choice, we are able to prove the following:
Proposition 9. Every ring R (as above) has a maximal ideal.
Let Σ denote the set of all proper ideals of R, partially ordered by inclusion; Σ is non-empty,
since (0) ∈ Σ. In order to apply Zorn's lemma we need to prove that every chain in Σ has an
upper bound that belongs to Σ. Let {A_α} be a chain of ideals in Σ, so for all indices α, β we have

A_α ⊆ A_β or A_β ⊆ A_α.
We claim that B, defined by

B = ∪_α A_α,

belongs to Σ. Indeed, B is an ideal: any a, b ∈ B lie in some common A_α of the chain,
so a + b and r · a (for any r ∈ R) lie in A_α ⊆ B. Moreover B is proper, for if 1 ∈ B then
1 would belong to some A_α, contradicting A_α ∈ Σ.
Therefore B ∈ Σ. Hence every chain in Σ has an upper bound in Σ and we can apply Zorn's
lemma to deduce the existence of M, a maximal element (with respect to inclusion) in Σ.
By definition of the set Σ, M must be a maximal ideal in R.
NOTE: Assuming that the axiom of choice does NOT hold, mathematicians have shown
the existence of commutative rings (with 1) that have no maximal ideals.
Let f : A → B be a ring map and let a be an ideal of A. We can look at the ideal generated
by the image f(a), which is called an extended ideal and is denoted by a^e.
It is not true in general that if a is an ideal in A, the image of a under f will be an ideal in
B. (For example, consider the embedding f : Z → Q. The image of the ideal (2) ⊂ Z is not
an ideal in Q, since the only ideals in Q are {0} and all of Q.)
172.4 fractional ideal
172.4.1 Basics
We say that a fractional ideal a is invertible if there exists a fractional ideal a′ such that
a · a′ = A. It can be shown that if a is invertible, then its inverse must be a′ = (A : a), the
annihilator of a in A.
We now suppose that A is a Dedekind domain. In this case, every nonzero fractional ideal
is invertible, and consequently the nonzero fractional ideals in A form a group under ideal
multiplication, called the ideal group of A.
The unique factorization of ideals theorem states that every fractional ideal in A factors
uniquely into a finite product of prime ideals of A and their (fractional ideal) inverses. It
follows that the ideal group of A is freely generated as an abelian group by the nonzero prime
ideals of A.
172.5 homogeneous ideal
172.6 ideal
Let R be a ring. A left ideal (resp., right ideal) I of R is a nonempty subset I ⊂ R such that:
• a + b ∈ I for all a, b ∈ I
• r · a ∈ I (resp., a · r ∈ I) for all r ∈ R and a ∈ I
A 2-sided ideal is a left ideal I which is also a right ideal. If R is a commutative ring, then
these three notions of ideal are equivalent.
Let R be a ring with identity. A proper left (right, two-sided) ideal m ⊊ R is said to be
maximal if m is not a proper subset of any other proper left (right, two-sided) ideal of R.
All maximal ideals are prime ideals. If R is commutative, an ideal m ⊂ R is maximal if and
only if the quotient ring R/m is a field.
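In Z this criterion is easy to test: (m) is maximal exactly when Z/mZ is a field, i.e. when m is prime. An illustrative Python check:

```python
from math import gcd

def is_field(n):
    # Z/nZ is a field iff every nonzero element is a unit,
    # i.e. every 1 <= a < n is coprime to n
    return n > 1 and all(gcd(a, n) == 1 for a in range(1, n))

assert is_field(7)        # (7) is a maximal ideal of Z
assert not is_field(6)    # (6) is contained in the proper ideal (2): not maximal
```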
172.8 principal ideal
Let R be a ring and let a ∈ R. The principal left (resp. right, 2-sided) ideal of a is the
smallest left (resp. right, 2-sided) ideal of R containing the element a.
notation
1: spectrum of ring
2: Spec(R)
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
Chapter 173
theorem:
proof:
proof:
REFERENCES
[GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation
Theory (Volume II). Springer-Verlag, New York, 1988.
[SG] Schwarz, Gerald W.: Smooth Functions Invariant Under the Action of a Compact Lie Group,
Topology Vol. 14, pp. 63-68, 1975.
800
Chapter 174
13A99 – Miscellaneous
Proof:
So we get

(∑_{i=1}^n x_i y_i)² + (1/2) ∑_{i,j=1, i≠j}^n (x_i y_j − x_j y_i)²
    = ∑_{i=1}^n x_i² y_i² + ∑_{i,j=1, i≠j}^n x_i² y_j²        (174.1.2)
    = (∑_{i=1}^n x_i²) (∑_{i=1}^n y_i²).                      (174.1.3)
Note that changing the roles of i and j in x_i y_j − x_j y_i, we get

x_j y_i − x_i y_j = −(x_i y_j − x_j y_i).

But this doesn't matter when we square: the sum over ordered pairs i ≠ j is twice the sum
over pairs i < j. So we can rewrite the last equation as

(∑_{i=1}^n x_i y_i)² + ∑_{1≤i<j≤n} (x_i y_j − x_j y_i)² = (∑_{i=1}^n x_i²) (∑_{i=1}^n y_i²).   (174.1.4)
174.2 characteristic
The concept of characteristic that exists for integral domains can be generalized to cyclic rings.
By extending the existing definition in this manner, though, characteristics need no longer
be 0 or prime.
The characteristic of an infinite cyclic ring is 0. Let R be an infinite cyclic ring and r be
a generator of the additive group of R. If z ∈ Z such that zr = 0R , then z = 0. Since no
positive integer c exists such that cr = 0R , it follows that R has characteristic 0.
A finite ring is cyclic if and only if its order and characteristic are equal. If R is a finite cyclic
ring and r is a generator of the additive group of R, then |r| = |R|. Since |s| divides |r| = |R|
for every s ∈ R, the characteristic of R, being the exponent of the additive group, equals |R|.
Conversely, if R is a finite ring such that char R = |R|, then the exponent of the additive group
of R is also equal to |R|. Thus, there exists t ∈ R such that |t| = |R|. Since ⟨t⟩ is a subgroup
of the additive group of R and |⟨t⟩| = |t| = |R|, it follows that R is a cyclic ring.
A result of the fundamental theorem of finite abelian groups is that every ring with square-free
order is a cyclic ring.
If n is a positive integer, then, up to isomorphism, there are exactly τ (n) cyclic rings of order
n, where τ refers to the tau function. Also, if a cyclic ring has order n, then it has exactly
τ (n) subrings. This result mainly follows from Lagrange’s theorem and its converse. Note
that the converse of Lagrange’s theorem does not hold in general, but it does hold for finite
cyclic groups.
Every subring of a cyclic ring is a cyclic ring. Moreover, every subring of a cyclic ring is an
ideal.
R is a finite cyclic ring of order n if and only if there exists a positive divisor k of n such
that R is isomorphic to kZ_{kn}. R is an infinite cyclic ring that has no zero divisors if and
only if there exists a positive integer k such that R is isomorphic to kZ. Finally, R is an
infinite cyclic ring that has zero divisors if and only if it is isomorphic to the following subset
of M_{2×2}(Z):

{ [ c  −c ]
  [ c  −c ]  :  c ∈ Z }.

Thus, any infinite cyclic ring that has zero divisors is a zero ring.
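That the matrix ring above really is a zero ring can be verified directly: any two matrices of this shape multiply to the zero matrix. An illustrative Python check:

```python
def mat(c):
    # the matrix [[c, -c], [c, -c]]
    return [[c, -c], [c, -c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# each column of mat(d) is (d, d) or (-d, -d), and each row of mat(c)
# is (c, -c), so every entry of the product is c*d - c*d = 0
for c in range(-3, 4):
    for d in range(-3, 4):
        assert matmul(mat(c), mat(d)) == [[0, 0], [0, 0]]
```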
We group the six squares into 3 groups of two squares and rewrite. Adding equations
174.4.2-174.4.4 (one for each group), we get

∑_{1≤k<i≤4} (x_k y_i − x_i y_k)² = ((x_1 y_2 − x_2 y_1) + (x_3 y_4 − x_4 y_3))²
    + ((x_1 y_3 − x_3 y_1) + (x_4 y_2 − x_2 y_4))²
    + ((x_1 y_4 − x_4 y_1) + (x_2 y_3 − x_3 y_2))².   (174.4.7)

We put the result of equation 174.4.7 into 174.4.1 and get

(∑_{k=1}^4 x_k y_k)² = (∑_{k=1}^4 x_k²) (∑_{k=1}^4 y_k²)
    − (x_1 y_2 − x_2 y_1 + x_3 y_4 − x_4 y_3)²
    − (x_1 y_3 − x_3 y_1 + x_4 y_2 − x_2 y_4)²
    − (x_1 y_4 − x_4 y_1 + x_2 y_3 − x_3 y_2)².   (174.4.9)
Let R be a cyclic ring and S be a subring of R. Then the additive group of S is a subgroup
of the additive group of R. By definition of cyclic ring, the additive group of R is cyclic,
and every subgroup of a cyclic group is cyclic. Thus, the additive group of S is cyclic, and
it follows that S is a cyclic ring.
Let R be a cyclic ring and S be a subring of R. Then R and S are both cyclic rings. Let r
be a generator of the additive group of R and s be a generator of the additive group of S.
Since s ∈ S and S is a subring of R, then s ∈ R. Thus, there exists z ∈ Z with s = zr.
Let t ∈ R and u ∈ S. Since u ∈ S and S is a subring of R, then u ∈ R. Since multiplication
is commutative in a cyclic ring, then tu = ut. Since t ∈ R, then there exists a ∈ Z with
t = ar. Since u ∈ S, then there exists b ∈ Z with u = bs.
Since R is a ring, r² ∈ R. Thus, there exists k ∈ Z with r² = kr. Since tu = (ar)(bs) =
(ar)[b(zr)] = (abz)r² = (abz)(kr) = (abkz)r = (abk)(zr) = (abk)s ∈ S, it follows that S is
an ideal of R.
A ring is a zero ring if the product of any two elements is the additive identity (or zero).
Zero rings are commutative under multiplication. For if Z is a zero ring, 0Z is its additive
identity, and x, y ∈ Z, then xy = 0Z = yx.
Every zero ring is a nilpotent ring. For if Z is a zero ring, then Z² = {0_Z}.
Since the product of any two elements of a zero ring is the zero element, which lies in every
subring, every subring of a zero ring is an ideal, and a zero ring has no proper prime ideals.
Zero rings exist in abundance. They can be constructed from any ring. If R is a ring, then

{ [ r  −r ]
  [ r  −r ]  :  r ∈ R }

is a zero ring under the usual matrix operations.
Every finite zero ring can be written as a direct product of cyclic rings, which must also be
zero rings themselves. This is proven from the fundamental theorem of finite abelian groups.
Thus, if p_1, . . . , p_m are distinct primes, a_1, . . . , a_m are positive integers, and n = ∏_{i=1}^m p_i^{a_i},
then the number of zero rings of order n is ∏_{i=1}^m P(a_i), where P denotes the partition function.
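The partition function in this count is easy to compute. The following illustrative Python sketch computes P(n) by recursion on the smallest allowed part and applies the product formula above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(n, k=1):
    # number of ways to write n as a sum of integers >= k
    if n == 0:
        return 1
    return sum(partitions(n - j, j) for j in range(k, n + 1))

assert [partitions(n) for n in range(1, 6)] == [1, 2, 3, 5, 7]

def zero_ring_count(exponents):
    # exponents = [a_1, ..., a_m] in n = p_1^{a_1} ... p_m^{a_m}
    out = 1
    for a in exponents:
        out *= partitions(a)
    return out

assert zero_ring_count([2]) == 2      # order p^2: two zero rings
assert zero_ring_count([3, 1]) == 3   # order p^3 * q: P(3) * P(1) = 3
```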
Chapter 175
175.1 algebraic
an xn + an−1 xn−1 + · · · + a1 x + a0 = 0.
175.2 module-finite
Chapter 176
176.1 algebraic
If there is a nonzero polynomial f ∈ F [x] such that f (a) = 0 (in K) we say that a is
algebraic over F .
For example, √2 ∈ R is algebraic over Q since there is a nonzero polynomial with rational
coefficients, namely f(x) = x^2 − 2, such that f(√2) = 0.
Chapter 177
177.1 integral
Let B be a ring with a subring A. An element x ∈ B is integral over A if there exist elements
a0 , . . . , an−1 ∈ A such that
x^n + a_{n−1} x^{n−1} + · · · + a1 x + a0 = 0.
Chapter 178
Let B be a ring with a subring A. The integral closure of A in B is the set A′ ⊆ B consisting
of all elements of B which are integral over A.
It is a theorem that the integral closure of A in B is itself a ring. In the special case where
A = Z, the integral closure A′ of Z in B is often called the ring of integers in B.
Chapter 179
Given an integral domain R, the fraction field of R is the localization S −1 R of R with respect
to the multiplicative set S = R \ {0}. It is always a field.
179.2 localization
Let R be a commutative ring and let S ⊆ R be a multiplicative set. The localization S^{−1}R
is constructed from pairs (a, s) with a ∈ R and s ∈ S, where (a, s) and (b, t) are identified
whenever u · (a · t − b · s) = 0 for some u ∈ S, with the operations
• (a, s) + (b, t) = (a · t + b · s, s · t)
• (a, s) · (b, t) = (a · b, s · t)
The equivalence class of (a, s) in S −1 R is usually denoted a/s. For a ∈ R, the localization of
R at the minimal multiplicative set containing a is written as Ra . When S is the complement
of a prime ideal p in R, the localization of R at S is written Rp.
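A sketch of the localization in the simplest case, R = Z and S = Z ∖ {0}, which yields the fraction field Q (the function names are ours, not from the entry):

```python
# Pairs (a, s) stand for a/s; since Z is a domain, the auxiliary factor u
# in the equivalence relation is not needed.

def equivalent(p, q):
    """(a, s) ~ (b, t)  iff  a*t - b*s == 0."""
    (a, s), (b, t) = p, q
    return a * t - b * s == 0

def add(p, q):
    """(a, s) + (b, t) = (a*t + b*s, s*t)"""
    (a, s), (b, t) = p, q
    return (a * t + b * s, s * t)

def mul(p, q):
    """(a, s) * (b, t) = (a*b, s*t)"""
    (a, s), (b, t) = p, q
    return (a * b, s * t)

# 1/2 + 1/3 = 5/6; 2/4 and 1/2 lie in the same class; 2/3 * 3/2 = 1.
assert equivalent(add((1, 2), (1, 3)), (5, 6))
assert equivalent((2, 4), (1, 2))
assert equivalent(mul((2, 3), (3, 2)), (1, 1))
```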
179.3 multiplicative set
Chapter 180
Clearly from the definition, Z^n is free as a Z-module for any positive integer n.
On the other hand, Q is not free as a Z-module. First note that any two elements in Q are
Z-linearly dependent: if x = p1/q1 and y = p2/q2, then q1 p2 x − q2 p1 y = 0. Since basis
elements must be linearly independent, this shows that any basis must consist of only one
element, say p/q, with p and q relatively prime and, without loss of generality, q > 0. The
Z-span of {p/q} is the set of rational numbers of the form np/q. We claim that 1/(q + 1) is
not in this set. If it were, then we would have 1/(q + 1) = np/q for some n, but this implies
that np = q/(q + 1), which has no solutions for n, p ∈ Z, q ∈ Z⁺, giving us a contradiction.
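The first step of the argument, that any two rationals are Z-linearly dependent, is easy to check with exact arithmetic (a sketch; the helper name is ours):

```python
from fractions import Fraction

# For x = p1/q1 and y = p2/q2 the relation q1*p2*x - q2*p1*y = 0 holds,
# so no two elements of Q are Z-linearly independent.
def dependence_coefficients(x, y):
    """Integer coefficients (c1, c2), not both zero, with c1*x + c2*y == 0."""
    return (x.denominator * y.numerator, -(y.denominator * x.numerator))

x, y = Fraction(3, 4), Fraction(5, 7)
c1, c2 = dependence_coefficients(x, y)
assert (c1, c2) != (0, 0)
assert c1 * x + c2 * y == 0
```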
Chapter 181
Let R be a principal ideal domain and M an R-module. We call an element m of M a torsion
element if there exists a non-zero α ∈ R such that α · m = 0. The set of torsion elements is
denoted by tor(M).
tor(M) is not empty, since 0 ∈ tor(M). Let m, n ∈ tor(M), so there exist non-zero α, β ∈ R
such that α · m = β · n = 0. Since αβ · (m − n) = β · (α · m) − α · (β · n) = 0 and αβ ≠ 0
(R is an integral domain), this implies that m − n ∈ tor(M). So tor(M) is a subgroup of M.
Clearly τ · m ∈ tor(M) for any m ∈ tor(M) and any τ ∈ R. This shows that tor(M) is a
submodule of M, the torsion submodule of M.
Chapter 182
Let R be a Noetherian ring, and P a prime ideal minimal over a principal ideal (x). Then
the height of P , that is, the Krull dimension of RP , is at most 1. More generally, if P is a
minimal prime of an ideal generated by n elements, the height of P is at most n.
Chapter 183
13C99 – Miscellaneous
Let R be a commutative ring with 1. Let M be a finitely generated R-module. If there exists
an ideal a of R contained in the Jacobson radical and such that aM = M, then M = 0.
Let R be a ring. A two-sided proper ideal p of a ring R is called a prime ideal if the following
equivalent conditions are met:
1. If I and J are left ideals and the product of ideals IJ satisfies IJ ⊂ p, then I ⊂ p or
J ⊂ p.
2. If I and J are right ideals with IJ ⊂ p, then I ⊂ p or J ⊂ p.
When R is commutative with identity, an ideal p of R is prime if and only if for any a, b ∈ R,
if a · b ∈ p then either a ∈ p or b ∈ p.
One also has in this case that an ideal p ⊂ R is prime if and only if the quotient ring R/p
is an integral domain.
Let X = {x1 , x2 , . . . , xn } be a minimal set of generators for M, in the sense that M is not
generated by any proper subset of X.
Elements of aM can be written as linear combinations ∑ᵢ aᵢ xᵢ , where aᵢ ∈ a.
Suppose that |X| > 0. Since M = aM, we can express x1 as such a linear combination:
x1 = ∑ᵢ aᵢ xᵢ .
183.5 proof of Nakayama’s lemma
If M were not zero, it would have a simple quotient, isomorphic to R/m for some maximal
ideal m of R. Then we would have mM ≠ M, so that aM ≠ M as a ⊆ m.
REFERENCES
1. Serre, J.-P. Local Algebra. Springer-Verlag, 2000.
183.6 support
The support Supp(M) of a module M over a ring R is the set of all prime ideals p ⊂ R such
that the localization Mp is nonzero.
The maximal support Suppm (M) of a module M over a ring R is the set of all maximal ideals
m ⊂ R such that Mm is nonzero.
Chapter 184
Let R be a right (left) Noetherian ring. Then the polynomial ring R[x] is also right (left) Noetherian.
184.3 proof of Hilbert basis theorem
Let R be a Noetherian ring and let f(x) = an x^n + a_{n−1} x^{n−1} + . . . + a1 x + a0 ∈ R[x] with
an ≠ 0. Then call an the initial coefficient of f .
Let I be an ideal in R[x]. We will show I is finitely generated, so that R[x] is Noetherian. Now
let f0 be a polynomial of least degree in I, and if f0 , f1 , . . . , fk have been chosen then choose
fk+1 of minimal degree from I ∖ (f0 , f1 , . . . , fk ). Continuing inductively gives a sequence (fk )
of elements of I.
Let ak be the initial coefficient of fk , and consider the ideal J = (a0 , a1 , a2 , . . .) of initial
coefficients. Since R is Noetherian, J = (a0 , . . . , aN ) for some N.
Then I = (f0 , f1 , . . . , fN ). For if not, then fN+1 ∈ I ∖ (f0 , f1 , . . . , fN ), and aN+1 = ∑_{k=0}^{N} uk ak
for some u0 , . . . , uN ∈ R. Let g(x) = ∑_{k=0}^{N} uk fk x^{νk} where νk = deg(fN+1 ) − deg(fk ).
Lemma 1. Let M be a submodule of the R-module R^n . Then M is free and finitely
generated by s ≤ n elements.
Now let x ∈ M such that f(x) = gh and y ∈ M with f(y) = g. Then f(x − hy) =
f(x) − f(hy) = 0, which is equivalent to x − hy ∈ ker(f) ∩ R^n =: N, which is isomorphic to
a submodule of R^{n−1} . This shows that Rx + N = M.
Let {g1 , . . . , gs } be a basis of N. By assumption, s ≤ n − 1. We’ll show that {x, g1 , . . . , gs }
is linearly independent. So let rx + ∑_{i=1}^{s} ri gi = 0. The first components of the gi are 0, so
the first component of rx must also be 0. Since f(x) is a multiple of g ≠ 0 and 0 = r · f(x),
then r = 0. Since {g1 , . . . , gs } are linearly independent, {x, g1 , . . . , gs } is a generating set of
M with s + 1 ≤ n elements.
Let {g1 , . . . , gs } be a generating set of M and f : R^s → M, (r1 , . . . , rs ) ↦ ∑_{i=1}^{s} ri gi . Then
the inverse image N′ of N is a submodule of R^s and, according to Lemma 1, can be generated
by s or fewer elements. Let n1 , . . . , nt be a generating set of N′ ; then t ≤ s, and since f is
surjective, f(n1 ), . . . , f(nt ) is a generating set of N.
(II) Let tor(M) be a proper submodule of M. Then there exists a finitely generated free
submodule F of M such that M = F ⊕ tor(M).
Proof of (I): For short, set T := tor(M). For m ∈ M, let m̄ denote the coset of m modulo T.
Suppose m̄ is a torsion element of M/T , so there exists α ∈ R ∖ {0} such that α · m̄ = 0̄,
which means α · m ∈ T . Then there is a non-zero β ∈ R with β · (α · m) = 0, so m itself is a
member of T and m̄ = 0̄. This implies that M/T has no non-zero torsion elements (which is
obvious if M = tor(M)).
Now let N be a finitely generated torsion-free R-module with generating set {g1 , . . . , gn }.
The homomorphism f : R^n → N, (r1 , . . . , rn ) ↦ ∑_{i=1}^{n} ri gi is injective since N is torsion-free.
Let {e1 , . . . , en } be the standard basis of R^n . Then the elements f(e1 ), . . . , f(en )
are linearly independent in N. Now let N = M/tor(M), which is torsion-free by (I). Then the
cosets can be identified with the elements of M, and the statement follows.
Chapter 185
Any Euclidean domain is also a principal ideal domain and therefore also a unique factorization
domain. More importantly, on a Euclidean domain we can define greatest common divisors
and use Euclid’s algorithm. Examples of Euclidean domains are the ring Z and the polynomial
ring F [x] in one variable, where F is a field.
• For any a, b ∈ D, b 6= 0, there exist q, r ∈ D such that a = bq + r with ν(r) < ν(b) or
r = 0.
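The division property above is exactly what Euclid's algorithm needs. A Python sketch for the two examples mentioned, Z and F[x] with F = Q (polynomials as lists of Fraction coefficients, lowest degree first; the helper names are ours):

```python
from fractions import Fraction

def euclid_int(a, b):
    """Euclid's algorithm in Z (valuation: absolute value)."""
    while b != 0:
        a, b = b, a % b
    return abs(a)

def poly_divmod(a, b):
    """Divide a by b in Q[x]; return (quotient, remainder) with deg r < deg b."""
    r, q = a[:], [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(r) >= len(b) and any(r):
        shift = len(r) - len(b)
        c = r[-1] / b[-1]               # eliminate the leading coefficient
        q[shift] = c
        for i, bc in enumerate(b):
            r[shift + i] -= c * bc
        while len(r) > 1 and r[-1] == 0:
            r.pop()
    return q, r

def euclid_poly(a, b):
    """Euclid's algorithm in Q[x] (valuation: degree)."""
    while any(c != 0 for c in b):
        _, r = poly_divmod(a, b)
        a, b = b, r
    return a

assert euclid_int(252, 105) == 21
# gcd(x^2 - 1, x^2 + 2x + 1) is a scalar multiple of x + 1:
g = euclid_poly([Fraction(-1), Fraction(0), Fraction(1)],
                [Fraction(1), Fraction(2), Fraction(1)])
assert [c / g[-1] for c in g] == [Fraction(1), Fraction(1)]
```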
Euclidean valuations are important because they let us define greatest common divisors and
use Euclid’s algorithm. Some facts about Euclidean valuations:
• The value ν(1) is minimal. That is, ν(1) ≤ ν(a) for any nonzero element a of D.
Let D be an integral domain with a Euclidean valuation ν. Let a, b ∈ D, not both 0, and let
(a, b) = {ax + by | x, y ∈ D}. Then (a, b) is a non-zero ideal in D. We choose d ∈ (a, b) such
that ν(d) is smallest. Then (a, b) is generated by d, and d has the property that d|a
and d|b. Two elements x and y in D are associates if and only if ν(x) = ν(y), so d is unique
up to a unit in D. Hence d is the greatest common divisor of a and b.
(To see that d generates (a, b): any x ∈ (a, b) can be written as x = yd + r with r = 0 or
ν(r) < ν(d); since r = x − yd ∈ (a, b), minimality of ν(d) forces r = 0.)
Chapter 186
Let A ≠ 0 be an m × n-matrix with entries from a principal ideal domain R. For a ∈ R ∖ {0},
δ(a) denotes the number of prime factors of a. Start with t = 1 and choose jt to be the
smallest column index of A with a non-zero entry.
(II) If there is an entry at position (k, jt ) such that a_{t,jt} ∤ a_{k,jt} , then set β = gcd(a_{t,jt} , a_{k,jt} )
and choose σ, τ ∈ R such that
σ · a_{t,jt} − τ · a_{k,jt} = β.
(III) Finally, by adding appropriate multiples of row t to the other rows (i.e. by left-
multiplication with an appropriate matrix), all entries in column jt except the one at
position (t, jt ) can be made zero.
Applying the steps described above to the remaining non-zero columns of the resulting matrix
(if any), we get an m × n-matrix with column indices j1 , . . . , jr where r ≤ min(m, n), each
of which satisfies the following:
2. all entries below and above position (l, jl ) as well as entries left of (l, jl ) are zero.
This is a version of the Gauss algorithm for principal ideal domains which is usually described
only for commutative fields.
Now we can re-order the columns of this matrix so that the elements at positions (i, i) for
1 ≤ i ≤ r are non-zero and δ(a_{ii}) ≤ δ(a_{i+1,i+1}) for 1 ≤ i < r, and all columns to the right of
the r-th column (if present) are zero. For short, write αi for the element at position (i, i). δ
takes non-negative integer values, so δ(α1 ) = 0 is equivalent to α1 being a unit of R. δ(αi ) = δ(αi+1 )
can happen either if αi and αi+1 differ by a unit factor, or if they are relatively prime. In the
latter case one can add column i + 1 to column i (which doesn’t change αi ) and then apply
appropriate row manipulations to get αi = 1. And for δ(αi ) < δ(αi+1 ) and αi ∤ αi+1 one can
apply step (II) after adding column i + 1 to column i. This diminishes the minimum of the
δ-values for non-zero entries of the matrix, and by reordering columns etc. we end up with a
matrix whose diagonal elements αi satisfy αi | αi+1 for all 1 ≤ i < r.
Since all row and column manipulations involved in the process are invertible, this shows
that there exist invertible m × m and n × n matrices S, T so that S · A · T is the m × n matrix

diag(α1 , α2 , . . . , αr , 0, . . . , 0)          (186.1.1)

with α1 , . . . , αr on the diagonal and all other entries zero.
This is the Smith normal form of the matrix. The elements αi are unique up to associates
and are called elementary divisors.
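For small integer matrices, the elementary divisors can be computed without tracking any row or column operations, via the standard determinantal-divisor characterization: d_k = gcd of all k × k minors, and α_k = d_k / d_{k−1}. A Python sketch under that assumption (helper names are ours):

```python
from itertools import combinations
from math import gcd

def det(m):
    """Determinant by cofactor expansion (fine for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

def elementary_divisors(a):
    """Elementary divisors alpha_k = d_k / d_{k-1} of an integer matrix."""
    rows, cols = len(a), len(a[0])
    divisors, d_prev = [], 1
    for k in range(1, min(rows, cols) + 1):
        minors = [det([[a[i][j] for j in js] for i in rows_sel])
                  for rows_sel in combinations(range(rows), k)
                  for js in combinations(range(cols), k)]
        d_k = 0
        for m in minors:
            d_k = gcd(d_k, m)       # gcd of all k x k minors
        if d_k == 0:
            break                   # rank reached
        divisors.append(d_k // d_prev)
        d_prev = d_k
    return divisors

# diag(2, 6) is already in Smith normal form:
assert elementary_divisors([[2, 0], [0, 6]]) == [2, 6]
# A 3x3 example whose Smith normal form is diag(2, 6, 12):
assert elementary_divisors([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]) == [2, 6, 12]
```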
Chapter 187
Formal power series allow one to employ much of the analytical machinery of power series
in settings which don’t have natural notions of convergence. They are also useful in order to
compactly describe sequences and to find closed formulas for recursively described sequences;
this is known as the method of generating functions and will be illustrated below.
We start with a commutative ring R. We want to define the ring of formal power series over
R in the variable X, denoted by R[[X]]; each element of this ring can be written in a unique
way as an infinite sum of the form ∑_{n=0}^{∞} an X^n , where the coefficients an are elements of R;
any choice of coefficients an is allowed. R[[X]] is actually a topological ring so that these
infinite sums are well-defined and convergent. The addition and multiplication of such sums
follows the usual laws of power series.
Formal construction Start with the set RN of all infinite sequences in R. Define addition
of two such sequences by
(an ) + (bn ) = (an + bn )
and multiplication by
(an )(bn ) = (∑_{k=0}^{n} ak b_{n−k} ).
This turns RN into a commutative ring with multiplicative identity (1,0,0,. . . ). We identify
the element a of R with the sequence (a,0,0,. . . ) and define X := (0, 1, 0, 0, . . .). Then every
element of R^N of the form (a0 , a1 , a2 , . . . , aN , 0, 0, . . .) can be written as the finite sum
∑_{n=0}^{N} an X^n .
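The formal construction, truncated to finitely many coefficients, is easy to model directly (a sketch over Q using exact arithmetic; helper names are ours):

```python
from fractions import Fraction

def ps_add(a, b):
    """Coefficientwise addition of truncated power series."""
    return [x + y for x, y in zip(a, b)]

def ps_mul(a, b):
    """Cauchy product: c_n = sum_{k=0}^{n} a_k * b_{n-k}."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[i - k] for k in range(i + 1)) for i in range(n)]

# X := (0, 1, 0, 0, ...) satisfies X * X = (0, 0, 1, 0, ...):
X = [Fraction(0), Fraction(1), Fraction(0), Fraction(0)]
assert ps_mul(X, X) == [Fraction(0), Fraction(0), Fraction(1), Fraction(0)]

# (1 - X) * (1 + X + X^2 + X^3) = 1 modulo X^4:
one_minus_X = [Fraction(1), Fraction(-1), Fraction(0), Fraction(0)]
geom = [Fraction(1)] * 4
assert ps_mul(one_minus_X, geom) == [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
```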
In order to extend this equation to infinite series, we need a metric on R^N . We define
d((an ), (bn )) = 2^{−k} , where k is the smallest natural number such that ak ≠ bk (if there is no
such k, then the two sequences are equal and we define their distance to be zero). This is a
metric which turns R^N into a topological ring, and the equation
(an ) = ∑_{n=0}^{∞} an X^n
can now be rigorously proven using the notion of convergence arising from d; in fact, any
rearrangement of the series converges to the same limit.
This topological ring is the ring of formal power series over R and is denoted by R[[X]].
Properties R[[X]] is an associative algebra over R which contains the ring R[X] of polynomials
over R; the polynomials correspond to the sequences which end in zeros.
The metric space (R[[X]], d) is complete. The topology on R[[X]] is equal to the product topology
on RN where R is equipped with the discrete topology. It follows from Tychonoff’s theorem
that R[[X]] is compact if and only if R is finite. The topology on R[[X]] can also be seen as
the I-adic topology, where I = (X) is the ideal generated by X (whose elements are precisely
the formal power series with zero constant coefficient).
If R = K is a field, we can consider the quotient field of the integral domain K[[X]]; it is
denoted by K((X)). It is a topological field whose elements are called formal Laurent
series; they can be uniquely written in the form
f = ∑_{n=−M}^{∞} an X^n
for some integer M depending on f.
Formal power series as functions In analysis, every convergent power series defines
a function with values in the real or complex numbers. Formal power series can also be
interpreted as functions, but one has to be careful with the domain and codomain. If
f = ∑ an X^n is an element of R[[X]], if S is a commutative associative algebra over R, if I is
an ideal in S such that the I-adic topology on S is complete, and if x is an element of I,
then we can define
f(x) := ∑_{n=0}^{∞} an x^n .
This latter series is guaranteed to converge in S given the above assumptions. Furthermore,
we have
(f + g)(x) = f (x) + g(x)
and
(f g)(x) = f (x)g(x)
(unlike in the case of bona fide functions, these formulas are not definitions but have to be
proved).
Since the topology on R[[X]] is the (X)-adic topology and R[[X]] is complete, we can in
particular apply power series to other power series, provided that the arguments don’t have
constant coefficients: f (0), f (X 2 − X) and f ((1 − X)−1 − 1) are all well-defined for any
formal power series f ∈ R[[X]].
With this formalism, we can give an explicit formula for the multiplicative inverse of a power
series f whose constant coefficient a = f(0) is invertible in R:
f^{−1} = ∑_{n=0}^{∞} a^{−n−1} (a − f)^n
Differentiating formal power series If f = ∑_{n=0}^{∞} an X^n ∈ R[[X]], we define the formal
derivative of f as
Df = ∑_{n=1}^{∞} n an X^{n−1} .
D(f · g) = (D f ) · g + f · (D g)
(Dk f )(0) = k! ak
(here k! denotes the element 1 × (1 + 1) × (1 + 1 + 1) × · · · ∈ R, the product having k factors).
One can also define differentiation for formal Laurent series in a natural way, and then the
quotient rule, in addition to the rules listed above, will also be valid.
Power series in several variables The fastest way to define the ring R[[X1 , . . . , Xr ]]
of formal power series over R in r variables starts with the ring S = R[X1 , . . . , Xr ] of
polynomials over R. Let I be the ideal in S generated by X1 , . . . , Xr , consider the I-adic
topology on S, and form its completion. This results in a complete topological ring containing
S which is denoted by R[[X1 , . . . , Xr ]].
Every element of R[[X1 , . . . , Xr ]] can be written in a unique way as a sum ∑ₙ an X1^{n1} · · · Xr^{nr},
where the sum extends over all multi-indices n = (n1 , . . . , nr ) ∈ N^r . These sums converge for
any choice of the coefficients an ∈ R, and the order in which the summation is carried out
does not matter.
Since R[[X1 ]] is a commutative ring, we can define its power series ring, say R[[X1 ]][[X2 ]].
This ring is naturally isomorphic to the ring R[[X1 , X2 ]] just defined, but as topological rings
the two are different.
Similar to the situation described above, we can “apply” power series in several variables to
other power series with zero constant coefficients. It is also possible to define partial derivatives
for formal power series in a straightforward way. Partial derivatives commute, as they do
for continuously differentiable functions.
Uses One can use formal power series to prove several relations familiar from analysis in a
purely algebraic setting. Consider for instance the following elements of Q[[X]]:
sin(X) := ∑_{n=0}^{∞} ((−1)^n / (2n + 1)!) X^{2n+1}
cos(X) := ∑_{n=0}^{∞} ((−1)^n / (2n)!) X^{2n}
Then one can easily show that
sin2 + cos2 = 1
and
D sin = cos
as well as
sin(X + Y ) = sin(X) cos(Y ) + cos(X) sin(Y )
(the latter being valid in the ring Q[[X, Y ]]).
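The first two identities can be verified coefficient by coefficient with exact arithmetic, truncated to order N (a sketch; helper names are ours):

```python
from fractions import Fraction
from math import factorial

N = 10

def ps_mul(a, b):
    """Cauchy product truncated to order N."""
    return [sum(a[k] * b[i - k] for k in range(i + 1)) for i in range(N)]

# coefficient lists of sin(X) and cos(X) in Q[[X]]
sin = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 1 else Fraction(0)
       for n in range(N)]
cos = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 0 else Fraction(0)
       for n in range(N)]

# sin^2 + cos^2 = 1:
one = [Fraction(1)] + [Fraction(0)] * (N - 1)
s2, c2 = ps_mul(sin, sin), ps_mul(cos, cos)
assert [x + y for x, y in zip(s2, c2)] == one

# D sin = cos: the formal derivative has coefficients (n+1) * a_{n+1}
d_sin = [(n + 1) * sin[n + 1] for n in range(N - 1)]
assert d_sin == cos[:N - 1]
```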
As an example of the method of generating functions, consider the problem of finding a closed
formula for the Fibonacci numbers fn defined by fn+2 = fn+1 + fn , f0 = 0, and f1 = 1. We
work in the ring R[[X]] and define the power series
f = ∑_{n=0}^{∞} fn X^n ;
f is called the generating function for the sequence (fn ). The generating function for
the sequence (fn−1 ) is Xf while that for (fn−2 ) is X 2 f . From the recurrence relation, we
therefore see that the power series Xf +X 2f agrees with f except for the first two coefficients.
Taking these into account, we find that
f = Xf + X 2 f + X
(this is the crucial step; recurrence relations can almost always be translated into equations
for the generating functions). Solving this equation for f , we get
f = X / (1 − X − X^2) .
Using the golden ratio φ1 = (1 + √5)/2 and φ2 = (1 − √5)/2, we can write the latter
expression as
(1/√5) (1/(1 − φ1 X) − 1/(1 − φ2 X)) .
These two power series are known explicitly because they are geometric series; comparing
coefficients, we find the explicit formula
fn = (1/√5) (φ1^n − φ2^n) .
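Both steps, extracting the coefficients of X/(1 − X − X²) and checking the closed formula, can be done with exact arithmetic (a sketch; helper names are ours; elements of Q(√5) are represented as pairs (a, b) meaning a + b√5):

```python
from fractions import Fraction

N = 15

# Solve (1 - X - X^2) * f = X coefficient by coefficient, i.e. use the
# recurrence f_n = f_{n-1} + f_{n-2} with f_0 = 0, f_1 = 1.
f = [0] * N
for n in range(N):
    f[n] = ((1 if n == 1 else 0)
            + (f[n - 1] if n >= 1 else 0)
            + (f[n - 2] if n >= 2 else 0))
assert f[:8] == [0, 1, 1, 2, 3, 5, 8, 13]

def mul5(x, y):
    """(a + b*sqrt5) * (c + d*sqrt5) in Q(sqrt5)."""
    (a, b), (c, d) = x, y
    return (a * c + 5 * b * d, a * d + b * c)

def power5(x, n):
    result = (Fraction(1), Fraction(0))
    for _ in range(n):
        result = mul5(result, x)
    return result

phi1 = (Fraction(1, 2), Fraction(1, 2))    # (1 + sqrt5)/2
phi2 = (Fraction(1, 2), Fraction(-1, 2))   # (1 - sqrt5)/2

for n in range(N):
    p, q = power5(phi1, n), power5(phi2, n)
    # phi1^n - phi2^n is a pure multiple of sqrt5; dividing by sqrt5
    # leaves its b-component, which should equal f_n.
    assert p[0] - q[0] == 0
    assert p[1] - q[1] == f[n]
```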
In algebra, the ring K[[X1 , . . . , Xr ]] (where K is a field) is often used as the “standard, most
general” complete local ring over K.
Universal property The power series ring R[[X1 , . . . , Xr ]] can be characterized by the
following universal property: if S is a commutative associative algebra over R, if I is an ideal
in S such that the I-adic topology on S is complete, and if x1 , . . . , xr ∈ I are given, then
there exists a unique Φ : R[[X1 , . . . , Xr ]] → S with the following properties:
• Φ is an R-algebra homomorphism
• Φ is continuous
• Φ(Xi ) = xi for i = 1, . . . , r.
Chapter 188
Note: Discrete valuations are often written additively instead of multiplicatively; under this
alternate viewpoint, the element x maps to logc |x| (in the above notation) instead of just
|x|. This transformation reverses the order of the absolute values (since c < 1), and sends
the element 0 ∈ K to ∞. It has the advantage that every valuation can be normalized by a
suitable scalar multiple to take values in the integers.
A discrete valuation ring R is a principal ideal domain with exactly one maximal ideal M.
Any generator t of M is called a uniformizer or uniformizing element of R; in other words,
a uniformizer of R is an element t ∈ R such that t ∈ M but t ∉ M^2 .
Chapter 189
For all non-zero a, b ∈ D, either
• a ∈ (b),
or
• there exist α ∈ (a) and β ∈ (b) such that α + β ≠ 0 and ν(α + β) < ν(b).
Proof: First, let ν be a Dedekind-Hasse valuation on an integral domain D, and let I be a
non-zero ideal of D. Take some non-zero b ∈ I with ν(b) minimal (this exists because the
positive integers are well-ordered) and some non-zero a ∈ I. I must contain both (a) and (b),
and since it is closed under addition, α + β ∈ I for any α ∈ (a), β ∈ (b).
Since ν(b) is minimal, the second possibility above is ruled out, so it follows that a ∈ (b).
But this holds for any a ∈ I, so I = (b), and therefore every ideal is principal.
For the converse, let D be a PID. Define ν(u) = 1 for any unit u. Any non-zero non-unit
can be factored into a finite product of irreducibles (since every PID is a UFD), and every
such factorization of a has the same length, r. So for a non-zero non-unit a ∈ D, let
ν(a) = r + 1. Obviously r ∈ Z⁺.
Then take any a, b ∈ D − {0} and suppose a ∉ (b). Consider the ideal of elements of
the form {α + β | α ∈ (a), β ∈ (b)}. Since D is a PID, this is a principal ideal (c) for some
c ∈ D − {0}, and since 0 + b = b ∈ (c), there is some x ∈ D such that xc = b. Note that x is
not a unit, for otherwise (c) = (b) and a ∈ (b). Then ν(b) = ν(xc), and since x is not a unit,
the factorization of b must be longer than the factorization of c, so ν(b) > ν(c). Hence ν is a
Dedekind-Hasse valuation.
Version: 1 Owner: Henry Author(s): Henry
189.2 PID
A principal ideal domain D is an integral domain where every ideal is a principal ideal.
In a PID, an ideal (p) is maximal if and only if p is irreducible (and prime since any PID is
also a UFD).
189.3 UFD
• Every nonzero element of D that is not a unit can be factored into a product of a
finite number of irreducibles.
Since R[x, y] ≅ R[x][y], these results can be extended to rings of polynomials with a finite
number of variables.
The converse, however, is not true. Let F be a field and consider the UFD F [x, y]. Let I be
the ideal consisting of all elements of F [x, y] whose constant term is 0. Then it can be proved
that I is not a principal ideal. Therefore not every UFD is a PID.
189.4 a finite integral domain is a field
Let a be a non-zero element of R and define ϕ : R → R by ϕ(r) = ar. Suppose ϕ(r) = ϕ(s)
for some r, s ∈ R. Then ar = as, which implies a(r − s) = 0. Since a ≠ 0 and R is a
cancellation ring, we have r − s = 0. So r = s, and hence ϕ is injective.
Since R is finite and ϕ is injective, by the pigeonhole principle we see that ϕ is also surjective.
Thus there exists some b ∈ R such that ϕ(b) = ab = 1R , and thus a is a unit.
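The pigeonhole argument can be watched in action in the finite integral domain Z_p for a prime p (a sketch; the choice p = 7 is ours):

```python
# For each nonzero a, the map phi(r) = a*r mod p is injective, hence
# surjective on the finite set Z_p, so some b satisfies a*b = 1.
p = 7
for a in range(1, p):
    images = {(a * r) % p for r in range(p)}
    assert len(images) == p                        # phi is injective and surjective
    b = next(r for r in range(p) if (a * r) % p == 1)
    assert (a * b) % p == 1                        # a is a unit
```

For composite moduli such as 6 the map fails to be injective (2·0 ≡ 2·3), which is exactly where the integral-domain hypothesis is used.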
If an R = an+1 R, then there exists r ∈ R such that an = an+1 r, therefore since an 6= 0 (since
R is an integral domain) then we must have 1 = ar. Hence a is a unit.
Both of these examples are actually examples of Euclidean rings, which are always PIDs.
There are, however, more complicated examples of PIDs which are not Euclidean rings.
Version: 2 Owner: sleske Author(s): sleske
field of quotients
1: R integral domain
2: A = {a/b : a, b ∈ R, b ≠ 0}
4: A/∼
1: An irreducible polynomial
2: in F [x]
Note: This is a “seed” entry written using a short-hand format described in this FAQ.
189.9 irreducible
UFDs, PIDs, and Euclidean domains are ways of building successively more of standard
number theory into a domain.
First, observe that the units are numbers that are 1-like. Obvious examples, besides 1 itself,
are −1 in Z or i and −i in the complex integers.
Ideals behave something like the set of multiples of an element; in Z those sets are precisely
the ideals (together with the zero ideal), and ideals in other rings have some similar behavior.
In commutative rings, prime ideals have one property similar to the prime numbers. Specif-
ically, a product of two elements is in the ideal exactly when one of those elements is already
in the ideal, the way that if a · b is even we know that either a or b is even, but if we know
it is a multiple of four, that could be because both are even but not divisible by four.
The other property most associated with prime numbers is their irreducibility: the only way
to factor an irreducible element is to use a unit, and since units are "1-like," that doesn't
really break the element into smaller pieces. (Specifically, the non-unit factor can always be
broken into another unit times the original irreducible element.)
In a UFD these two properties of prime numbers coincide for non-zero numbers. All
prime elements (generators of prime ideals) are irreducible and all irreducibles are prime
elements. In addition, all numbers can be factored into prime elements the same way inte-
gers can be factored into primes.
A principal ideal domain behaves even more like the integers by adding the concept of a
greatest common divisor. Formally this holds because for any two ideals, in any ring, we
can find a minimal ideal which contains both of them, and in a PID we have the guarantee
that the new ideal is generated by a particular element: the greatest common divisor. The
Dedekind-Hasse valuation on the ring encodes this property, by requiring that, if a is not a
multiple of b (that is, not in (b)), then there is a common divisor which is simpler than b (the
formal definition is that the element be of the form ax + by, but there is, in general, a
connection between linear combinations of elements and their greatest common divisor).
Being a Euclidean domain is an even stronger requirement, the most important effect of
which is to provide Euclid's algorithm for finding gcds. The key property is that division
with remainders can be performed akin to the way it is done in the integers. A
Euclidean valuation again encodes this property by ensuring that remainders are limited
(specifically, by requiring that the norm of the remainder be less than the norm of the
divisor). This forces the remainders to get successively smaller, guaranteeing that the process
eventually halts.
Let R be a ring. A nonzero element a ∈ R is called a zero divisor if there exists a nonzero
element b ∈ R such that a · b = 0.
Example: Let R = Z6 . Then the elements 2 and 3 are zero divisors, since 2 · 3 ≡ 6 ≡ 0
(mod 6).
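A quick computational check, a sketch extending the example: in Z_n the zero divisors are exactly the nonzero elements not coprime to n (the criterion via gcd is a standard fact, not stated in the entry):

```python
from math import gcd

n = 6
zero_divisors = {a for a in range(1, n)
                 if any((a * b) % n == 0 for b in range(1, n))}
assert zero_divisors == {2, 3, 4}
# equivalently: nonzero elements sharing a factor with n
assert zero_divisors == {a for a in range(1, n) if gcd(a, n) != 1}
```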
Chapter 190
A local ring R of dimension n is regular if and only if its maximal ideal m is generated by n
elements.
Equivalently, R is regular if dim_{R/m} (m/m^2) = dim R, where the first dimension is that of a
vector space, and the latter is the Krull dimension, since by Nakayama’s lemma, elements
generate m if and only if their images under the projection generate m/m2 .
By Krull’s principal ideal theorem, m cannot be generated by fewer than n elements, so the
maximal ideals of regular local rings have a minimal number of generators.
Chapter 191
13H99 – Miscellaneous
Commutative case
A commutative ring with multiplicative identity is called local if it has exactly one maximal ideal.
This is the case if and only if 1 ≠ 0 and the sum of any two non-units in the ring is again a
non-unit; the unique maximal ideal consists precisely of the non-units.
The name comes from the fact that these rings are important in the study of the local
behavior of varieties and manifolds: the ring of function germs at a point is always local.
(The reason is simple: a germ f is invertible in the ring of germs at x if and only if f (x) 6= 0,
which implies that the sum of two non-invertible elements is again non-invertible.) This is
also why schemes, the generalizations of varieties, are defined as certain locally ringed spaces.
Other examples of local rings include the formal power series ring K[[X]] over a field K,
and the rings Z/(p^n) for a prime p.
Every local ring (R, m) is a topological ring in a natural way, taking the powers of m as a
neighborhood base of 0.
Given two local rings (R, m) and (S, n), a local ring homomorphism from R to S is a
ring homomorphism f : R → S (respecting the multiplicative identities) with f (m) ⊆ n.
These are precisely the ring homomorphisms that are continuous with respect to the given
topologies on R and S.
The residue field of the local ring (R, m) is the field R/m.
General case
One also considers non-commutative local rings. A ring with multiplicative identity is called
local if it has a unique maximal left ideal. In that case, the ring also has a unique maximal
right ideal, and the two ideals coincide with the ring’s Jacobson radical, which in this case
consists precisely of the non-units in the ring.
A ring R is local if and only if the following condition holds: we have 1 ≠ 0, and whenever
x ∈ R is not invertible, then 1 − x is invertible.
All skew fields are local rings. More interesting examples are given by endomorphism rings:
a finite-length module over some ring is indecomposable if and only if its endomorphism ring
is local, a consequence of Fitting’s lemma.
Chapter 192
192.1 completion
Let (X, d) be a metric space. Let X̄ be the set of all Cauchy sequences {xn }n∈N in X.
Define an equivalence relation ∼ on X̄ by setting {xn } ∼ {yn } if the interleave sequence of
the sequences {xn } and {yn } is also a Cauchy sequence. The completion of X is defined to
be the set X̂ of equivalence classes of X̄ modulo ∼.
The metric d on X extends to a metric on X̂ by
d̂({xn }, {yn }) := lim_{n→∞} d(xn , yn ),
where {xn } and {yn } are representative Cauchy sequences of elements in X̂. The definition
of ∼ is tailored so that the limit in the above definition is well defined, and the fact that these
sequences are Cauchy, together with the fact that R is complete, ensures that the limit exists.
The space X̂ with this metric is of course a complete metric space.
Note the similarity between the construction of X̂ and the construction of R from Q. The
process used here is the same as that used to construct the real numbers R, except for the
minor detail that one can not use the terminology of metric spaces in the construction of R
itself because it is necessary to construct R in the first place before one can define metric
spaces.
If the metric space X has an algebraic structure, then in many cases this algebraic structure
carries through unchanged to X̂ simply by applying it one element at a time to sequences in
X. We will not attempt to state this principle precisely, but we will mention the following
important instances:
1. If (X, ·) is a topological group, then X̂ is also a topological group with multiplication
defined by
{xn } · {yn } = {xn · yn }.
2. If X is a topological ring, then addition and multiplication extend to X̂ and make the
completion into a topological ring.
3. If F is a field with a valuation v, then the completion of F with respect to the metric
imposed by v is a topological field, denoted Fv and called the completion of F at v.
The completion X̂ of X satisfies the following universal property: for every continuous map
f : X −→ Y of X into a complete metric space Y , there exists a unique lifting of f to a
continuous map fˆ : X̂ −→ Y satisfying f = fˆ ◦ ι, where ι : X −→ X̂ is the natural inclusion
of X into its completion.
Chapter 193
An ordered ring is a commutative ring R with an ordering relation ≤ such that, for every
a, b, c ∈ R:
1. If a ≤ b, then a + c ≤ b + c
2. If a ≤ b and 0 ≤ c, then c · a ≤ c · b
Chapter 194
13J99 – Miscellaneous
A ring R which is a topological space is called a topological ring if the addition and multi-
plication functions are continuous functions from R × R to R.
Chapter 195
13N15 – Derivations
195.1 derivation
• d(x + y) = dx + dy
• d(x · y) = x · dy + dx · y
Chapter 196
Henceforth, assume that we have fixed a monomial ordering. Take a ∈ F [x1 , . . . , xn ]. Define
the support of a, denoted supp(a), to be the set of monomials of a with nonzero coefficients.
Then define M(a) = max(supp(a)).
Let (f1 , . . . , fs ) be an ordered s-tuple of polynomials, with fi ∈ F [x1 , . . . , xn ]. Then for each
f ∈ F [x1 , . . . , xn ], there exist a1 , . . . , as , r ∈ F [x1 , . . . , xn ] with r unique, such that
1. f = a1 f1 + · · · + as fs + r
Chapter 197
The Picard group of a variety, scheme, or more generally locally ringed space (X, OX ) is the
group of locally free OX -modules of rank 1, with tensor product over OX as the group operation.
Affine space of dimension n over a field k is simply the set of ordered n-tuples k n .
An affine variety over an algebraically closed field k is a subset of some affine space k n over
k which can be described as the vanishing set of finitely many polynomials in n variables
with coefficients in k, and which cannot be written as the union of two smaller such sets.
For example, the locus described by Y − X 2 = 0 as a subset of C2 is an affine variety over
the complex numbers. But the locus described by Y X = 0 is not (as it is the union of the
loci X = 0 and Y = 0).
One can define a subset of affine space k n or an affine variety in k n to be closed if it is a subset
defined by the vanishing set of finitely many polynomials in n variables with coefficients in k.
The closed subsets then actually satisfy the requirements for closed sets in a topology, so this
defines a topology on the affine variety known as the Zariski topology. The definition above
has the extra condition that an affine variety not be the union of two closed subsets, i.e. it is
required to be an irreducible topological space in the Zariski topology. Anything then satisfying
the definition without possibly the irreducibility is known as an (affine) algebraic set.
A quasi-affine variety is then an open set (in the Zariski topology) of an affine variety.
Note that some geometers do not require what they call a variety to be irreducible, that is
they call algebraic sets varieties. The most prevalent definition however requires varieties to
be irreducible.
Let E and E′ be elliptic curves over a field K of characteristic ≠ 2, 3, let f : E → E′ be an
isogeny of degree m, and let [m] denote the multiplication-by-m isogeny on E. Then there
exists a unique isogeny f̂ : E′ → E, called the dual isogeny to f , such that f̂ ◦ f = [m].
Often only the existence of a dual isogeny is needed, but the construction is explicit via the
exact sequence
E′ → Div0 (E′) −−φ∗−→ Div0 (E) → E,
A finite morphism of affine schemes f : Spec A → Spec B is a morphism with the property
that the associated homomorphism of rings f ∗ : B → A makes A into a finite B-algebra.
Likewise, a finite morphism of affine varieties f : V → W is a morphism with the property
that the associated homomorphism of rings f ∗ : A(W ) → A(V ) makes A(V ) into a finite
A(W )-module, where A(V ) denotes the coordinate ring of V .
197.6 isogeny
Let E and E 0 be elliptic curves over a field k. An isogeny between E and E 0 is a finite morphism
f : E → E 0 that preserves basepoints.
The two curves are called isogenous if there is an isogeny between them. This is an
equivalence relation, symmetry being due to the existence of the dual isogeny. Every isogeny
is an algebraic homomorphism and thus induces homomorphisms of the groups of the elliptic
curves for k-valued points.
In algebraic geometry, the term line bundle refers to a locally free coherent sheaf of rank
1, also called an invertible sheaf. In manifold theory, it refers to a real or complex one
dimensional vector bundle. These notions are equivalent on a non-singular complex algebraic
variety X: given a one dimensional vector bundle, its sheaf of holomorphic sections is locally
free and of rank 1. Similarly, given a locally free sheaf F of rank one, the space
L = ⋃_{x∈X} F_x /m_x F_x ,
given the coarsest topology for which sections of F define continuous functions, is a vector
bundle of complex dimension 1 over X, with the obvious map taking the stalk over a point
to that point.
Version: 2 Owner: bwebste Author(s): bwebste
A variety over an algebraically closed field k is nonsingular at a point x if the local ring Ox is
a regular local ring. Equivalently, if around the point, one has an open affine neighborhood
wherein the variety is cut out by certain polynomials F1 , . . . Fn of m variables x1 , . . . , xm ,
then it is nonsingular at x if the Jacobian has maximal rank at that point. Otherwise, x is
a singular point.
Every x = (x0 , . . . , xn ) ∈ Kn+1 \{0} determines an element of projective space, namely the
line passing through x. Formally, this line is the equivalence class [x], or [x0 : x1 : . . . : xn ], as
it is commonly denoted. The numbers x0 , . . . , xn are referred to as homogeneous coordinates
of the line. Homogeneous coordinates differ from ordinary coordinate systems in that a given
element of projective space is labelled by multiple homogeneous “coordinates”.
Affine coordinates. Projective space also admits a more conventional type of coordinate
system, called affine coordinates. Let A0 ⊂ KPn be the subset of all elements p = [x0 : x1 :
. . . : xn ] ∈ KPn such that x0 6= 0. We then define the functions
Xi : A0 → K, i = 1, . . . , n,
according to
Xi (p) = xi / x0 ,
where (x0 , x1 , . . . , xn ) is any element of the equivalence class representing p. This definition
makes sense because other elements of the same equivalence class have the form
(λx0 , λx1 , . . . , λxn ) for some non-zero λ ∈ K, and the ratios xi /x0 are unchanged. Consider
the hyperplane
H0 = {x0 = 1} ⊂ Kn+1 .
Geometrically, affine coordinates can be described by saying that the elements of A0 are
lines in Kn+1 that are not parallel to H0 , and that every such line intersects H0 in one and
exactly one point. Conversely points of H0 are represented by tuples (1, x1 , . . . , xn ) with
(x1 , . . . , xn ) ∈ Kn , and each such point uniquely labels a line [1 : x1 : . . . : xn ] in A0 .
It must be noted that a single system of affine coordinates does not cover all of projective
space. However, it is possible to define a system of affine coordinates relative to every
hyperplane in Kn+1 that does not contain the origin. In particular, we get n + 1 different
systems of affine coordinates corresponding to the hyperplanes {xi = 1}, i = 0, 1, . . . , n.
Every element of projective space is covered by at least one of these n + 1 systems of
coordinates.
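The passage to affine coordinates, and its independence of the chosen representative, can be made concrete. The following sketch (the helper `affine_chart` is hypothetical, introduced only for illustration) works with rational homogeneous coordinates:

```python
# Homogeneous coordinates and the n+1 affine charts {x_i != 0} on P^n.
from fractions import Fraction

def affine_chart(coords, i):
    """Affine coordinates of [x0 : ... : xn] in the chart x_i != 0.

    Returns the n ratios x_j / x_i (j != i), or None if the point
    lies outside this chart, i.e. x_i = 0."""
    if coords[i] == 0:
        return None
    return tuple(Fraction(xj, coords[i]) for j, xj in enumerate(coords) if j != i)

p = (2, 4, 6)          # the point [2 : 4 : 6] in P^2
q = (1, 2, 3)          # another representative of the same point

# Affine coordinates do not depend on the chosen representative:
assert affine_chart(p, 0) == affine_chart(q, 0) == (Fraction(2), Fraction(3))

# Every point lies in at least one of the n+1 charts:
assert any(affine_chart((0, 5, 0), i) is not None for i in range(3))
```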
It is evident that for every non-zero λ ∈ K the transformation λA gives the same projective
automorphism as A. For this reason, it is convenient to identify the group of projective
automorphisms with
PSLn+1 (K) = SLn+1 (K)/Ωn+1 .
Here SLn+1 denotes the “special” group of unimodular linear transformations, that is trans-
formations of Kn+1 having determinant 1. The symbol Ωn+1 denotes the subgroup generated
by elements ωI, where ω is an (n + 1)-st root of unity. The unimodular condition is almost
sufficient to uniquely specify a linear transformation to represent a given projective action.
However, note that
det(ωA) = ω^{n+1} det(A) = det(A),
and hence if the field K admits non-trivial roots of unity ω, then multiplication by such an
ω preserves the determinant. Hence, the projective action of A ∈ SLn+1 coincides with the
projective action of ωA, making it necessary to quotient SLn+1 by the normal subgroup Ωn+1
in order to obtain the group of projective automorphisms.
A projective variety over an algebraically closed field k is a subset of some projective space Pnk
over k which can be described as the common vanishing locus of finitely many homogeneous
polynomials with coefficients in k, and which is not the union of two such smaller loci.
Chapter 198
Let A^n_k denote the affine space k^n over a field k. The Zariski topology on A^n_k is defined to be
the topology whose closed sets are the sets
V(I) := {x ∈ A^n_k : f(x) = 0 for all f ∈ I},
where I ⊂ k[X1 , . . . , Xn ] is any ideal in the polynomial ring k[X1 , . . . , Xn ]. For any affine variety
V ⊂ Ank , the Zariski topology on V is defined to be the subspace topology induced on V as
a subset of Ank .
Let P^n_k denote n-dimensional projective space over k. The Zariski topology on P^n_k is defined
to be the topology whose closed sets are the sets
V(I) := {x ∈ P^n_k : f(x) = 0 for all homogeneous f ∈ I},
where I ⊂ k[X0 , . . . , Xn ] is any homogeneous ideal.
The Zariski topology is the predominant topology used in the study of algebraic geometry.
Every regular morphism of varieties is continuous in the Zariski topology (but not every
continuous map in the Zariski topology is a regular morphism). In fact, the Zariski topology
is the weakest topology on varieties making points in A1k closed and regular morphisms
continuous.
198.2 algebraic map
A map f : X → Y between varieties over a field k is called algebraic (or regular) if it is given
locally by polynomial functions in the coordinates.
Alternatively, f is algebraic if the pullback map f ∗ : C(Y ) → C(X) takes the coordinate
ring of Y , k[Y ], to the coordinate ring of X, k[X].
Suppose k is an algebraically closed field. Let Ank denote affine n-space over k.
We say that Y ⊆ A^n_k is an algebraic set if there exists T ⊆ k[x1 , . . . , xn ] such that Y = V (T ).
The algebraic subsets of A^n_k are precisely the closed sets of the Zariski topology on A^n_k.
Thus we have defined a function V mapping from subsets of k[x1 , . . . , xn ] to algebraic sets
in A^n_k, and a function I mapping from subsets of A^n_k to ideals of k[x1 , . . . , xn ].
From the above, we see that there is a 1-1 correspondence between algebraic sets in Ank and
radical ideals of k[x1 , . . . , xn ]. Furthermore, an algebraic set Y ⊆ Ank is an affine variety if
and only if I(Y ) is a prime ideal.
A topological space X is called noetherian if it satisfies the descending chain condition for
closed subsets: for any sequence
Y1 ⊇ Y2 ⊇ · · ·
of closed subsets Yi of X, there is an integer m such that Ym = Ym+1 = · · · .
Example:
The space Ank (affine n-space over a field k) under the Zariski topology is an example of a
noetherian topological space. By properties of the ideal of a subset of Ank , we know that if
Y1 ⊇ Y2 ⊇ · · · is a descending chain of Zariski-closed subsets, then I(Y1) ⊆ I(Y2 ) ⊆ · · · is
an ascending chain of ideals of k[x1 , . . . , xn ].
Since k[x1 , . . . , xn ] is a Noetherian ring, there exists an integer m such that I(Ym ) = I(Ym+1 ) =
· · · . But because we have a one-to-one correspondence between radical ideals of k[x1 , . . . , xn ]
and Zariski-closed sets in Ank , we have V (I(Yi )) = Yi for all i. Hence Ym = Ym+1 = · · · as
required.
A regular map φ : k n → k m between affine spaces over an algebraically closed field is merely
one given by polynomials. That is, there are m polynomials F1 , . . . , Fm in n variables such
that the map is given by φ(x1 , . . . , xn ) = (F1 (x), . . . , Fm (x)) where x stands for the many
components xi .
A regular map φ : V → W between affine varieties is one which is the restriction of a regular
map between affine spaces. That is, if V ⊂ k n and W ⊂ k m , then there is a regular map
ψ : k n → k m with ψ(V ) ⊂ W and φ = ψ|V . So, this is a map given by polynomials, whose
image lies in the intended target.
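Whether a polynomial map lands in the intended target variety is a purely algebraic check: the defining equations of W must vanish identically after substitution. A sketch with sympy, for the illustrative map ψ(t) = (t, t²) into W = V(y − x²):

```python
# Checking that a polynomial map restricts to a regular map of varieties.
from sympy import symbols, expand

t, x, y = symbols('t x y')

F = (t, t**2)            # the polynomials defining psi : k^1 -> k^2
g = y - x**2             # defining equation of the target variety W

# psi maps k^1 into W iff g vanishes identically on the image of psi:
assert expand(g.subs({x: F[0], y: F[1]})) == 0
```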
Version: 2 Owner: nerdy2 Author(s): nerdy2
Let X be an irreducible algebraic variety over a field k, together with the Zariski topology.
Fix a point x ∈ X and let U ⊂ X be any affine open subset of X containing x. Define
o_x := {f /g ∈ k(U) : f, g ∈ k[U], g(x) ≠ 0},
where k[U] is the coordinate ring of U and k(U) is the fraction field of k[U]. The ring o_x is
independent of the choice of affine open neighborhood U of x.
The structure sheaf on the variety X is the sheaf of rings whose sections on any open subset
U ⊂ X are given by
O_X(U) := ⋂_{x∈U} o_x ,
and where the restriction map for V ⊂ U is the inclusion map O_X(U) ↪ O_X(V ).
There is an equivalence of categories under which an affine variety X with its structure sheaf
corresponds to the prime spectrum of the coordinate ring k[X]. In fact, the topological
embedding X ↪ Spec(k[X]) gives rise to a lattice-preserving bijection¹ between the open
sets of X and of Spec(k[X]), and the sections of the structure sheaf on X are isomorphic to
the sections of the structure sheaf of Spec(k[X]).
¹ Those who are fans of topos theory will recognize this map as an isomorphism of topoi.
Chapter 199
Let R be a ring, and let X = Spec R be its prime spectrum. Given an R-module M, one
can define a presheaf on X by defining its sections on an open set U to be O_X(U) ⊗R M. We
call the sheafification of this presheaf M̃; a sheaf of this form on X is called quasi-coherent. If M
is a finitely generated module, then M̃ is called coherent. A sheaf on an arbitrary scheme
X is called (quasi-)coherent if it is (quasi-)coherent on each open affine subset of X.
199.3 fibre product
Let i : X → S and j : Y → S be morphisms of schemes. A fibre product of X and Y over S
is a scheme X ×S Y together with projection morphisms p : X ×S Y → X and
q : X ×S Y → Y making the square
X ×S Y --p--> X
   |q          |i
   Y  --j-->   S
commute. In other words, a fibre product is an object X ×S Y , together with morphisms
p, q making the diagram commute, with the universal property that any other collection
(Z, x, y) forming such a commutative diagram maps into (X ×S Y, p, q).
Fibre products of schemes always exist and are unique up to canonical isomorphism.
Other notes Fibre products are also called pullbacks and can be defined in any category
using the same definition (but need not exist in general). For example, they always exist in
the category of modules over a fixed ring, as well as in the category of groups.
Let R be any commutative ring with identity. The prime spectrum Spec(R) of R is defined
to be the set
{P ⊊ R : P is a prime ideal of R}.
For any subset A of R, we define the variety of A to be the set
V (A) := {P ∈ Spec(R) : A ⊆ P }.
It is enough to restrict attention to subsets of R which are ideals, since, for any subset A of
R, we have V (A) = V (I) where I is the ideal generated by A. In fact, even more is true:
V (I) = V (√I), where √I denotes the radical of the ideal I.
We impose a topology on Spec(R) by defining the sets V (A) to be the collection of closed
subsets of Spec(R) (that is, a subset of Spec(R) is open if and only if it equals the complement
of V (A) for some subset A). The equations
⋂_α V (Iα ) = V ( ⋃_α Iα )
and
⋃_{i=1}^n V (Ii ) = V ( ⋂_{i=1}^n Ii ),
for any ideals Iα , Ii of R, establish that this collection does constitute a topology on Spec(R).
This topology is called the Zariski topology in light of its relationship to the Zariski topology
on an algebraic variety (see section 199.4.4 below). Note that a point P ∈ Spec(R) is closed
if and only if P ⊂ R is a maximal ideal.
For any element f ∈ R, define the distinguished open set
Spec(R)f := Spec(R) \ V ({f }) = {P ∈ Spec(R) : f ∉ P }.
The collection of distinguished open sets forms a topological basis
for the open sets of Spec(R). In fact, we have
Spec(R) \ V (A) = ⋃_{f∈A} Spec(R)f .
• A subset of Spec(R) is an irreducible closed set if and only if it equals V (P ) for some
prime ideal P of R.
• For P ∈ Spec(R), let RP denote the localization of R at the prime ideal P . Then the
topological spaces V (P ) ⊂ Spec(R) and Spec(RP ) are naturally homeomorphic, via
the correspondence sending a prime ideal of R contained in P to the induced prime
ideal in RP .
For convenience, we adopt the usual convention of writing X for Spec(R). For any f ∈ R
and P ∈ Xf , let ιf,P : Rf −→ RP be the natural inclusion map. Define a presheaf of rings
OX on X by setting
OX (U) := { (sP ) ∈ ∏_{P∈U} RP : U has an open cover {Xfα } with elements sα ∈ Rfα
such that sP = ιfα ,P (sα ) whenever P ∈ Xfα },
for each open set U ⊂ X. The restriction map resU,V : OX (U) −→ OX (V ) is the map induced
by the projection map
∏_{P∈U} RP −→ ∏_{P∈V} RP ,
for each open subset V ⊂ U. The presheaf OX satisfies the following properties:
1. OX is a sheaf.
2. OX (Xf ) = Rf for every f ∈ R; in particular, OX (X) = R.
3. The stalk (OX )P is equal to RP for every P ∈ X. (In particular, X is a locally ringed space.)
Spec(R) is sometimes called an affine scheme because of the close relationship between
affine varieties in A^n_k and the Spec of their corresponding coordinate rings. In fact, the
correspondence between the two is an equivalence of categories, although a complete state-
ment of this equivalence requires the notion of morphisms of schemes and will not be given
here. Nevertheless, we explain what we can of this correspondence below.
Let k be a field and write as usual Ank for the vector space k n . Recall that an affine variety
V in Ank is the set of common zeros of some prime ideal I ⊂ k[X1 , . . . , Xn ]. The coordinate
ring of V is defined to be the ring R := k[X1 , . . . , Xn ]/I, and there is an embedding i : V ↪
Spec(R) given by
i(x) := {f ∈ R : f (x) = 0},
the maximal ideal of functions vanishing at x.
The function i is not a homeomorphism, because it is not a bijection (its image is contained
inside the set of maximal ideals of R). However, the map i does define an order preserving
bijection between the open sets of V and the open sets of Spec(R) in the Zariski topology.
This isomorphism between these two lattices of open sets can be used to equate the structure
sheaf of Spec(R) with the structure sheaf of the variety V , showing that the two objects are
identical in every respect except for the minor detail of Spec(R) having more points than V .
The additional points of Spec(R) are valuable in many situations and a systematic study of
them leads to the general notion of schemes. As just one example, the classical Bézout's
theorem is only valid for algebraically closed fields, but admits a scheme-theoretic generalization
which holds over non–algebraically closed fields as well. We will not attempt to explain the
theory of schemes in detail, instead referring the interested reader to the references below.
REFERENCES
1. Robin Hartshorne, Algebraic Geometry, Springer–Verlag New York, Inc., 1977 (GTM 52).
2. David Mumford, The Red Book of Varieties and Schemes, Second Expanded Edition, Springer–
Verlag, 1999 (LNM 1358).
199.5 scheme
199.5.1 Definitions
An affine scheme is a locally ringed space (X, OX ) with the property that there exists a
ring R (commutative, with identity) whose prime spectrum Spec(R) is isomorphic to X as
a locally ringed space.
A scheme is a locally ringed space (X, OX ) which has an open cover {Uα }α∈I with the
property that each open set Uα , together with its restriction sheaf OX |Uα , is an affine scheme.
We define a morphism of schemes between two schemes (X, OX ) and (Y, OY ) to be a mor-
phism of locally ringed spaces f : (X, OX ) −→ (Y, OY ). A scheme over Y is defined to be a
scheme X together with a morphism of schemes X −→ Y .
Note: Some authors, notably Mumford and Grothendieck, require that a scheme be separated
as well (and use the term prescheme to describe a scheme that is not separated), but we will
not impose this requirement.
199.5.2 Examples
• Every affine scheme is clearly a scheme as well. In particular, Spec(R) is a scheme for
any commutative ring R.
A scheme X is separated if the diagonal morphism
d : X −→ X ×Spec Z X
into the fibre product X ×Spec Z X, which is induced by the identity maps i : X −→ X in
each coordinate, is a closed immersion.
Note the similarity to the definition of a Hausdorff topological space. In the situation of
topological spaces, a space X is Hausdorff if and only if the diagonal morphism X −→ X ×X
is a closed embedding of topological spaces. The definition of a separated scheme is very
similar, except that the topological product is replaced with the scheme fibre product.
The singular set of a variety X is the set of its singular points; it is a proper subvariety. A
subvariety Y of X is contained in the singular set if and only if its local ring OY is not regular.
Chapter 200
14A99 – Miscellaneous
On a scheme X, a Cartier divisor is a global section of the sheaf K∗ /O∗ , where K∗ is the
multiplicative sheaf of meromorphic functions, and O∗ the multiplicative sheaf of invertible
regular functions (the units of the structure sheaf).
Intuitively, the only information carried by a Cartier divisor is where it vanishes and to what
order. Thus, a Cartier divisor should give us a Weil divisor, and vice versa. On “nice”
schemes (for example, schemes nonsingular over an algebraically closed field), it does.
In the projective plane, 4 points are said to be in general position iff no three of them are
on the same line. Dually 4 lines are in general position iff no three of them meet in the same
point. This definition naturally scales to more than four points/lines.
200.3 Serre’s twisting theorem
Let X be a scheme, and L an ample invertible sheaf on X. Then for any coherent sheaf F,
any i > 0, and sufficiently large n, H i (F ⊗ Ln ) = 0; that is, the higher sheaf cohomology of
F ⊗ Ln is trivial.
200.4 ample
Let R be a commutative ring. The height of a prime ideal p is the supremum of all integers
n such that there exists a chain p0 ⊂ · · · ⊂ pn = p of distinct prime ideals.
The Krull dimension of R is the supremum of the heights of all the prime ideals of R.
The isomorphism classes of invertible sheaves on X form an abelian group under tensor
product, called the Picard group of X.
200.7 locally free
A sheaf F on a ringed space X is called locally free if for each point x ∈ X, there is an open
neighborhood U of x such that F|U is free, or equivalently, the stalk Fx of F at x is free as
an Ox -module. If Fx is of finite rank n, then F is said to be of rank n.
Theorem 18. Let X be a normal irreducible variety. The singular set S ⊂ X has codimension
2 or more.
Assume not. We may assume X is affine, since codimension is local. Now let u be the ideal
of functions vanishing on S. This is an ideal of height 1, so the local ring of S, OS = A(X)u ,
where A(X) is the affine ring of X, is a 1-dimensional local ring, and integrally closed, since
X is normal. Any integrally closed 1-dimensional local domain is a DVR, and thus regular.
But S is the singular set, so its local ring is not regular, a contradiction.
Given a ringed space X, let KX be the sheafification of the presheaf associating to each
open set U the fraction field of OX (U), where OX is the structure sheaf. This is called the
sheaf of meromorphic functions since on a complex algebraic variety, it is isomorphic to the
sheaf of functions which are meromorphic in the analytic sense.
An invertible sheaf L on a scheme X over a field k is called very ample if (1) at each point
x ∈ X, there is a global section s ∈ L(X) not vanishing at x, and (2) for each pair of points
x, y ∈ X, there is a global section s ∈ L(X) such that s vanishes at exactly one of x and y.
If k is algebraically closed, Riemann-Roch shows that on a curve X of genus g, any invertible
sheaf of degree at least 2g + 1 is very ample.
Chapter 201
201.1 divisor
201.1 divisor
A divisor D on a projective nonsingular curve over an algebraically closed field is a formal
sum of points D = Σp np p, where only finitely many of the np ∈ Z are nonzero.
The degree of a divisor D is deg(D) = Σp np .
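Since a divisor is just a finitely supported integer-valued function on points, it has a direct computational model. A minimal sketch (the dictionary representation is an illustrative choice, not from the text):

```python
# Divisors as finite formal sums: point -> multiplicity, degree as in the text.
def degree(D):
    """Degree of a divisor represented as a dict point -> n_p."""
    return sum(D.values())

D = {"P": 2, "Q": -1, "R": 3}      # the divisor D = 2P - Q + 3R
assert degree(D) == 4
```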
Chapter 202
A variety is said to be of general type if its Kodaira dimension equals its dimension.
Chapter 203
Let f : X → Y be a continuous map of topological spaces. For a sheaf F on X, the direct
image sheaf f∗ F on Y is defined by f∗ F(V ) := F(f −1 (V )) for each open V ⊆ Y . If F is a
sheaf of abelian groups (or anything else), so is f∗ F, so likewise we get direct image
functors f∗ : Ab(X) → Ab(Y ), where Ab(X) is the category of sheaves of abelian groups
on X.
Chapter 204
204.1 site
Chapter 205
Serre duality is a theorem which can be thought of as a massive generalization of Poincaré duality
to an algebraic context.
The most general version of Serre duality states that on certain schemes X of dimension n, in-
cluding all projective varieties over any algebraically closed field k, there is a natural isomorphism
Exti (F, ω) ≅ H n−i (X, F)∗ ,
where F is any coherent sheaf on X and ω is a fixed sheaf, called the dualizing sheaf. When
F is locally free, we further have
Exti (F, ω) ≅ Exti (OX , F ∗ ⊗ ω) ≅ H i (X, F ∗ ⊗ ω),
so that we obtain the somewhat more familiar looking fact that there is a perfect pairing
H i (X, F ∗ ⊗ ω) × H n−i (X, F) → k.
205.2 sheaf cohomology
Let X be a topological space, and assume that the category of sheaves of abelian groups on
X has enough injectives. Then we define the sheaf cohomology H i(X, F) of a sheaf F to be
the right derived functors of the global section functor F → Γ(X, F).
Usually we are interested in the case where X is a scheme, and F is a coherent sheaf. In this
case, it does not matter if we take the derived functors in the category of sheaves of abelian
groups or coherent sheaves.
Sheaf cohomology can be explicitly calculated using Čech cohomology. Choose an open cover
{Ui }_{i=1}^n of X. We define
C i (F) = ∏ F(Uj0 ···ji ),
where the product is over (i + 1)-element subsets {j0 < · · · < ji } of {1, . . . , n} and
Uj0 ···ji = Uj0 ∩ · · · ∩ Uji . The differential
(∂s)_{j0 ···ji+1} = Σ_{ℓ=0}^{i+1} (−1)^ℓ s_{j0 ··· ĵℓ ···ji+1} |_{Uj0 ···ji+1}
(where ĵℓ indicates that the index jℓ is omitted)
makes C ∗ (F) into a chain complex. The cohomology of this complex is denoted Ȟ i (X, F)
and called the Čech cohomology of F with respect to the cover {Ui }. There is a natural map
H i (X, F) → Ȟ i (X, F) which is an isomorphism for sufficiently fine covers.
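For a finite cover, the Čech complex is finite-dimensional linear algebra, so its cohomology can be computed numerically. As an illustrative sketch (the cover and sheaf are our choices, not from the text), take the constant sheaf Q on the circle with a two-arc cover {U, V}, where U ∩ V has two components:

```python
# Cech cohomology of the constant sheaf Q on the circle S^1.
import numpy as np

# C^0 = Q^2 (one constant per arc), C^1 = Q^2 (one per component of U ∩ V).
# The differential sends (a, b) to (b - a, b - a).
d = np.array([[-1.0, 1.0],
              [-1.0, 1.0]])

rank = np.linalg.matrix_rank(d)
h0 = 2 - rank            # dim ker d
h1 = 2 - rank            # dim coker d
assert (h0, h1) == (1, 1)   # H^0 = H^1 = Q, as expected for S^1
```

The answer matches the singular cohomology of the circle, as it must for a cover by contractible sets.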
Chapter 206
Let V be an algebraic variety defined over a field K. By V (K) we denote the set of points
on V defined over K. Let K̄ be an algebraic closure of K. For a valuation ν of K, we write
Kν for the completion of K at ν. In this case, we can also consider V defined over Kν and
talk about V (Kν ).
Definition 13.
1. We say that V is soluble over K if V (K) is non-empty, i.e. if V has at least one point
defined over K.
2. We say that V is locally soluble at ν if V (Kν ) is non-empty.
3. If V is locally soluble for all ν then we say that V satisfies the Hasse condition, or
we say that V /K is everywhere locally soluble.
The Hasse Principle is the idea (or desire) that an everywhere locally soluble variety V
must have a rational point, i.e. a point defined over K. Unfortunately this is not true: there
are examples of varieties that satisfy the Hasse condition but have no rational points.
Example: A quadric (of any dimension) satisfies the Hasse condition. This was proved by
Minkowski for quadrics over Q and by Hasse for quadrics over a number field.
REFERENCES
1. Swinnerton-Dyer, Diophantine Equations: Progress and Problems, online notes.
Chapter 207
14H37 – Automorphisms
Let K be a field of characteristic p > 0 and let q = pr . Let C be a curve defined over K
contained in PN , the projective space of dimension N. Define the homogeneous ideal of C,
denoted I(C), to be the ideal generated by all homogeneous polynomials vanishing on C. For
a polynomial f , let f (q) denote the polynomial obtained by raising each coefficient of f to the
q-th power, and let C (q) be the curve whose homogeneous ideal is generated by
{f (q) : f ∈ I(C)}. The q-th power Frobenius morphism is defined by
φ : C → C (q) , φ([x0 : . . . : xN ]) = [x0 q : . . . : xN q ].
In order to check that the Frobenius morphism is well defined we need to prove that
φ(P ) ∈ C (q) for every point P = [x0 : . . . : xN ] ∈ C.
This is equivalent to proving that for any g ∈ I(C (q) ) we have g(φ(P )) = 0. Without loss of
generality we can assume that g is a generator of I(C (q) ), i.e. g is of the form g = f (q) for
some f ∈ I(C). Then:
g(φ(P )) = f (q) (x0 q , . . . , xN q ) = (f (x0 , . . . , xN ))q = 0,
as desired.
In particular, if E is an elliptic curve defined over Fq , then the coefficients of its Weierstrass
equation are fixed by the q-th power map, so E = E (q) .
Hence the Frobenius morphism is an endomorphism (or isogeny) of the elliptic curve.
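The fact that φ carries points of E back to points of E can be verified by brute force over a small field. The following sketch builds F₄₉ as F₇(u) with u² = 3 (3 is a non-square mod 7; the curve y² = x³ + 2x + 1 is an illustrative choice, not from the text):

```python
# The Frobenius morphism phi(x, y) = (x^p, y^p) on E : y^2 = x^3 + 2x + 1 over F_49.
p = 7

def add(A, B):          # addition in F_49; elements are pairs (a, b) meaning a + b*u
    return ((A[0] + B[0]) % p, (A[1] + B[1]) % p)

def mul(A, B):          # (a + bu)(c + du) = ac + 3bd + (ad + bc)u, since u^2 = 3
    return ((A[0]*B[0] + 3*A[1]*B[1]) % p, (A[0]*B[1] + A[1]*B[0]) % p)

def power(A, n):
    R = (1, 0)
    for _ in range(n):
        R = mul(R, A)
    return R

def rhs(x):             # x^3 + 2x + 1
    return add(add(power(x, 3), mul((2, 0), x)), (1, 0))

field = [(a, b) for a in range(p) for b in range(p)]
points = [(x, y) for x in field for y in field if power(y, 2) == rhs(x)]
assert points           # the affine point set is nonempty

for (x, y) in points:   # Frobenius raises both coordinates to the 7th power
    fx, fy = power(x, p), power(y, p)
    assert power(fy, 2) == rhs(fx)   # phi(P) again satisfies the equation of E
```

Since the curve's coefficients lie in F₇ they are fixed by the 7th-power map, which is exactly why E = E⁽⁷⁾ and φ is an endomorphism.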
REFERENCES
1. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
Chapter 208
Fermat’s spiral (or parabolic spiral) is an Archimedean spiral with the equation
r 2 = a2 θ.
This curve was discovered by Fermat in 1636.
208.3 folium of Descartes
The folium of Descartes is the curve defined by the equation
x3 + y 3 = 3axy.
It has an asymptote, the line
d : x + y + a = 0.
208.4 spiral
Let c(s) be a curve, and let τ (s) and κ(s) be the torsion and the curvature of c(s). Then a
spiral is a curve for which the ratio τ (s)/κ(s) is constant for all s.
Chapter 209
Let g : I → R3 be a parameterized space curve, assumed to be regular and free of points of inflection.
Physically, we conceive of g(t) as a particle moving through space. Let T(t), N(t), B(t) de-
note the corresponding moving trihedron. The speed of this particle is given by
s(t) = ‖g′(t)‖.
In order for a moving particle to escape the osculating plane, it is necessary for the particle
to “roll” along the axis of its tangent vector, thereby lifting the normal acceleration vector
out of the osculating plane. The “rate of roll”, that is to say the rate at which the osculating
plane rotates about the tangent vector, is given by B(t) · N′ (t); it is a number that depends
on the speed of the particle. The rate of roll relative to the particle’s speed is the quantity
called the torsion of the curve, a quantity that is invariant with respect to reparameterization.
The torsion τ (t) is, therefore, a measure of an intrinsic property of the oriented space curve,
another real number that can be covariantly assigned to the point g(t).
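For a concrete curve the torsion can be computed symbolically from the standard determinant formula τ = det(g′, g″, g‴)/‖g′ × g″‖². A sketch for the circular helix (an illustrative example, not from the text), whose torsion is constant:

```python
# Torsion of a circular helix, computed symbolically with sympy.
from sympy import symbols, cos, sin, Matrix, simplify

t = symbols('t')
a, b = symbols('a b', positive=True)

g = Matrix([a*cos(t), a*sin(t), b*t])        # helix of radius a, pitch 2*pi*b
g1, g2, g3 = g.diff(t), g.diff(t, 2), g.diff(t, 3)

# tau = det(g', g'', g''') / |g' x g''|^2  (standard formula for torsion)
num = Matrix.hstack(g1, g2, g3).det()
den = g1.cross(g2).dot(g1.cross(g2))
tau = simplify(num / den)

assert simplify(tau - b/(a**2 + b**2)) == 0   # constant torsion b/(a^2 + b^2)
```

The constant value b/(a² + b²) reflects the fact that the helix "rolls" about its tangent at a uniform rate, and vanishes when b = 0, i.e. when the curve is a plane circle.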
Chapter 210
Let E be an elliptic curve over Q, and let L(E, s) be the L-series attached to E.
Conjecture 1 (Birch and Swinnerton-Dyer).
1. L(E, s) has a zero at s = 1 of order exactly R = rank(E(Q)).
2. The leading coefficient of L(E, s) at s = 1, i.e. lims→1 (s − 1)−R L(E, s), has a concrete
expression involving the following invariants of E: the real period, the Shafarevich-Tate
group, the elliptic regulator and the Néron model of E.
J. Tate said about this conjecture: “This remarkable conjecture relates the behavior
of a function L at a point where it is not at present known to be defined to the
order of a group (Sha) which is not known to be finite!”
Conjecture 2. The root number of E, denoted by w, indicates the parity of the rank of the
elliptic curve; that is, w = 1 if and only if the rank is even.
There has been a great amount of research towards the B-SD conjecture. For example, there
are some particular cases which are already known:
REFERENCES
1. Claymath Institute, Description, online.
2. J. Coates, A. Wiles, On the Conjecture of Birch and Swinnerton-Dyer, Inv. Math. 39, 223-251
(1977).
3. James Milne, Elliptic Curves, online course notes.
4. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
5. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
|ap | ≤ 2√p
y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6
Np = #{(x, y) ∈ Fp 2 : y 2 + a1 xy + a3 y − x3 − a2 x2 − a4 x − a6 ≡ 0 mod p}
Also, let ap = p + 1 − Np . We define the local part at p of the L-series to be:
Lp (T ) =
  1 − ap T + pT 2 ,  if E has good reduction at p,
  1 − T ,            if E has split multiplicative reduction at p,
  1 + T ,            if E has non-split multiplicative reduction at p,
  1,                 if E has additive reduction at p.
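The quantities Nₚ and aₚ can be computed by direct point counting. A brute-force sketch for the curve y² = x³ + x + 1 (an illustrative choice, not from the text), counting the point at infinity so that aₚ = p + 1 − Nₚ, and checking the Hasse bound |aₚ| ≤ 2√p at primes of good reduction:

```python
# Local data a_p for E : y^2 = x^3 + A x + B by brute-force point counting.
def ap(p, A=1, B=1):
    # N_p = number of affine solutions over F_p, plus the point at infinity.
    Np = 1 + sum(1 for x in range(p) for y in range(p)
                 if (y*y - (x**3 + A*x + B)) % p == 0)
    return p + 1 - Np

# y^2 = x^3 + x + 1 has discriminant -16*31, so these primes are good primes.
for p in [5, 7, 11, 13, 101]:
    assert ap(p)**2 <= 4*p     # Hasse: |a_p| <= 2*sqrt(p)
```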
Note: The product converges and gives an analytic function for all Re(s) > 3/2. This follows
from the fact that |ap | ≤ 2√p. However, far more is true:
Theorem 21 (Taylor, Wiles). The L-series L(E, s) has an analytic continuation to the
entire complex plane, and it satisfies the following functional equation. Define
Λ(E, s) = (NE/Q )s/2 (2π)−s Γ(s) L(E, s),
where NE/Q is the conductor of E. Then
Λ(E, s) = w Λ(E, 2 − s), with w = ±1.
The number w above is usually called the root number of E, and it has an important
conjectural meaning (see Birch and Swinnerton-Dyer conjecture).
This result was previously known for elliptic curves having complex multiplication (Deuring,
Weil) before the general case was finally proven.
REFERENCES
1. James Milne, Elliptic Curves, online course notes.
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.
210.4 Mazur’s theorem on torsion of elliptic curves
Theorem 22 (Mazur). Let E/Q be an elliptic curve. Then the torsion subgroup Etorsion (Q)
is exactly one of the following groups:
Z/NZ with 1 ≤ N ≤ 10 or N = 12, or
Z/2Z ⊕ Z/2NZ with 1 ≤ N ≤ 4.
Note: see Nagell-Lutz theorem for an efficient algorithm to compute the torsion subgroup of
an elliptic curve defined over Q.
REFERENCES
1. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
2. Barry Mazur, Modular curves and the Eisenstein ideal, IHES Publ. Math. 47 (1977), 33-186.
3. Barry Mazur, Rational isogenies of prime degree, Invent. Math. 44 (1978), 129-162.
A Mordell curve is an elliptic curve E/K, for some field K, which admits a model by a
Weierstrass equation of the form:
y 2 = x3 + k, k∈K
Examples:
4. Let E3 /Q : y 2 = x3 + 496837487681; then this is a Mordell curve with E3 (Q) ≅ Z8 .
5. Lists of the minimal known positive and negative values of k for Mordell curves of
given rank, and of the Mordell curves of largest known rank, are available online (see
also the Birch and Swinnerton-Dyer conjecture).
The following theorem, proved independently by E. Lutz and T. Nagell, gives a very efficient
method to compute the torsion subgroup of an elliptic curve defined over Q.
Theorem 23 (Nagell-Lutz). Let E/Q be an elliptic curve with Weierstrass equation
y 2 = x3 + Ax + B, where A, B ∈ Z, and let P ≠ O be a torsion point of E(Q). Then:
1. The coordinates of P are integers, i.e. x(P ), y(P ) ∈ Z.
2. If P is of order greater than 2, then y(P )2 divides 4A3 + 27B 2 .
3. If P is of order 2, then y(P ) = 0.
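The theorem yields a finite search: candidate torsion points have integer coordinates with y = 0 or y² dividing the discriminant, and by Mazur's theorem finite order can be confirmed with at most 12 additions. A sketch for E : y² = x³ + 1 (the curve and helper functions are illustrative, not from the text), using exact rational arithmetic for the group law:

```python
# Nagell-Lutz torsion search for E : y^2 = x^3 + 1, whose torsion is Z/6Z.
from fractions import Fraction

A, B = 0, 1                      # E : y^2 = x^3 + 1
disc = -16*(4*A**3 + 27*B**2)    # discriminant, here -432

def add(P, Q):
    """Group law on y^2 = x^3 + Ax + B; None is the point at infinity."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2: return None
    if P == Q:
        m = Fraction(3*x1*x1 + A, 2*y1)      # tangent slope
    else:
        m = Fraction(y2 - y1, x2 - x1)       # chord slope
    x3 = m*m - x1 - x2
    return (x3, m*(x1 - x3) - y1)

def is_torsion(P):
    # By Mazur's theorem a torsion point has order at most 12.
    Q = P
    for _ in range(12):
        if Q is None: return True
        Q = add(Q, P)
    return False

candidates = []
for y in range(0, 21):                       # y^2 <= |disc| forces y <= 20 here
    if y == 0 or disc % (y*y) == 0:
        for x in range(-20, 21):             # small search window for integer x
            if x**3 + A*x + B == y*y:
                candidates += [(x, y)] + ([(x, -y)] if y else [])

torsion = [P for P in candidates if is_torsion(P)]
assert len(torsion) == 5          # together with O, E(Q)_tors = Z/6Z
```

Here the candidates turn out to be (−1, 0), (0, ±1) and (2, ±3), all of finite order, so together with O the torsion subgroup has six elements.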
REFERENCES
1. E. Lutz, Sur l’equation y 2 = x3 − Ax − B dans les corps p-adic, J. Reine Angew. Math. 177
(1937), 431-466.
2. T. Nagell, Solution de quelque problemes dans la theorie arithmetique des cubiques planes du
premier genre, Wid. Akad. Skrifter Oslo I, 1935, Nr. 1.
3. James Milne, Elliptic Curves, online course notes.
4. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
210.7 Selmer group
Given an elliptic curve E we can define two very interesting and important groups, the
Selmer group and the Tate-Shafarevich group, which together provide a measure of the
failure of the Hasse principle for elliptic curves, by measuring whether the curve is everywhere
locally soluble. Here we present the construction of these groups.
Let E, E 0 be elliptic curves defined over Q and let Q̄ be an algebraic closure of Q. Let
φ : E → E′ be a non-constant isogeny (for example, we can let E = E′ and think of φ as
being the “multiplication by n” map, [n] : E → E). The following standard result asserts
that φ is surjective over Q̄:
Theorem 24. Let C1 , C2 be curves defined over an algebraically closed field K and let
ψ : C1 → C2
be a morphism of curves. Then ψ is either constant or surjective.
Since φ : E(Q̄) → E 0 (Q̄) is non-constant, it must be surjective and we obtain the following
exact sequence:
0 → E(Q̄)[φ] → E(Q̄) → E′(Q̄) → 0        (1)
where E(Q̄)[φ] = Ker φ. Let G = Gal(Q̄/Q), the absolute Galois group of Q, and consider
the i-th cohomology group H i (G, E(Q̄)) (abbreviated H i (G, E)). Using equation (1) we
obtain the following long exact sequence (see proposition 1 in group cohomology):
Note that
H 0 (G, E(Q̄)[φ]) = (E(Q̄)[φ])G = E(Q)[φ],
and similarly
H 0 (G, E) = E(Q),  H 0 (G, E′) = E′(Q).
We could repeat the same procedure but this time for E, E′ defined over Qp , for some
prime number p, and obtain a similar exact sequence but with coefficients in Qp which
relates to the original in the following commutative diagram (here Gp = Gal(Q̄p /Qp )):
The goal here is to find a finite group containing E′(Q)/φ(E(Q)). Unfortunately,
H^1(G, E(Q̄)[φ]) is not necessarily finite. With this purpose in mind, we define the φ-Selmer group:
S^φ(E/Q) = Ker( H^1(G, E(Q̄)[φ]) → ∏_p H^1(Gp , E) )

Equivalently, the φ-Selmer group is the set of elements γ of H^1(G, E(Q̄)[φ]) whose image
γp in H^1(Gp , E(Q̄p)[φ]) comes from some element in E(Qp ).
Finally, by imitation of the definition of the Selmer group, we define the Tate-Shafarevich
group:

TS(E/Q) = Ker( H^1(G, E) → ∏_p H^1(Gp , E) )
The Tate-Shafarevich group is precisely the group that measures the failure of the Hasse
principle for the elliptic curve E. It is not known whether this group is always finite.
REFERENCES
1. J.P. Serre, Galois Cohomology, Springer-Verlag, New York.
2. James Milne, Elliptic Curves, online course notes.
3. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
4. R. Hartshorne, Algebraic Geometry, Springer-Verlag, 1977.
Let E be a cubic curve over a field K with Weierstrass equation f (x, y) = 0, where:
f (x, y) = y² + a1 xy + a3 y − x³ − a2 x² − a4 x − a6
and suppose that E has a singular point P = (x0 , y0 ). This is equivalent to:
∂f /∂x(P ) = ∂f /∂y(P ) = 0
Near P one can write f in the form
f (x, y) = [(y − y0 ) − α(x − x0 )][(y − y0 ) − β(x − x0 )] − (x − x0 )³
for some α, β ∈ K̄.
Definition 16. The singular point P is a node if α ≠ β. In this case there are two different
tangent lines to E at P , namely:
y − y0 = α(x − x0 ),  y − y0 = β(x − x0 )
If α = β, the point P is called a cusp, and there is a unique tangent line at P .
Note: See the entry for elliptic curve for examples of cusps and nodes.
There is a very simple criterion to know whether a cubic curve in Weierstrass form is singular
and to differentiate nodes from cusps:
Proposition 10. Let E/K be given by a Weierstrass equation, and let ∆ be the discriminant
and c4 as in the definition of ∆. Then:
1. E is singular if and only if ∆ = 0;
2. if E is singular, the singular point is a node if c4 ≠ 0, and a cusp if c4 = 0.
Let E/Q be an elliptic curve (we could work over any number field K, but we choose Q for
simplicity in the exposition). Assume that E has a Weierstrass equation:
y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6
Definition 17.
Examples:
2. However, E1 has bad reduction at p = 5, and the reduction is additive (since modulo 5
we can write the equation as [(y − 0) − 0 · (x − 0)]² − x³ , and the slope is 0).
REFERENCES
1. James Milne, Elliptic Curves, online course notes.
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.
210.9 conductor of an elliptic curve
Let E be an elliptic curve over Q. For each prime p ∈ Z define the quantity fp as follows:

       | 0,       if E has good reduction at p,
  fp = | 1,       if E has multiplicative reduction at p,
       | 2,       if E has additive reduction at p, and p ≠ 2, 3,
       | 2 + δp , if E has additive reduction at p = 2 or 3,

where δp depends on wild ramification in the action of the inertia group at p of Gal(Q̄/Q)
on the Tate module Tp (E).
Definition 18. The conductor NE/Q of E/Q is defined to be:

NE/Q = ∏_p p^{fp}

where the product is over all primes and the exponent fp is defined as above.
Example:
REFERENCES
1. James Milne, Elliptic Curves, online course notes.
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
210.10.1 Basics
An elliptic curve over a field K is a projective nonsingular algebraic curve E over K of genus
1 together with a point O of E defined over K. The word "genus" is taken here in the
algebraic geometry sense, and has no relation with the topological notion of genus (defined
as 1 − χ/2, where χ is the Euler characteristic) except when the field of definition K is the
complex numbers C.

Figure 210.1: Graph of y² = x(x − 1)(x + 1)
Using the Riemann-Roch theorem, one can show that every elliptic curve E is the zero set
of a Weierstrass equation of the form
E : y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6 ,
for some ai ∈ K, where the polynomial on the right-hand side has no double roots. When K
has characteristic other than 2 or 3, one can further simplify this Weierstrass equation into
the form
E : y² = x³ − 27c4 x − 54c6 .
The extremely strange numbering of the coefficients is an artifact of the process by which the
above equations are derived. Also, note that these equations are for affine curves; to translate
them to projective curves, one has to homogenize the equations (replace x with X/Z, and y
with Y /Z).
210.10.2 Examples
We present here some pictures of elliptic curves over the field R of real numbers. These
pictures are in some sense not representative of most of the elliptic curves that people work
with, since many of the interesting cases tend to be of elliptic curves over algebraically closed
fields. However, curves over the complex numbers (or, even worse, over algebraically closed
fields in characteristic p) are very difficult to graph in three dimensions, let alone two.
Finally, Figures 210.3 and 210.4 are examples of algebraic curves that are not elliptic curves.
Both of these curves have singularities at the origin.
Figure 210.4: Graph of y 2 = x3 . Has a cusp at the origin.
The points on an elliptic curve have a natural group structure, which makes the elliptic curve
into an abelian variety. There are many equivalent ways to define this group structure; two
of the most common are:
• Every Weil divisor on E is linearly equivalent to a unique divisor of the form [P ] − [O]
for some P ∈ E, where O ∈ E is the base point. The divisor class group of E then
yields a group structure on the points of E, by way of this correspondence.
• Let O ∈ E denote the base point. Then one can show that every line joining two points
on E intersects a unique third point of E (after properly accounting for tangent lines
as a multiple intersection). For any two points P, Q ∈ E, define their sum as:
1. Form the line between P and Q; let R be the third point on E that intersects this
line;
2. Form the line between O and R; define P + Q to be the third point on E that
intersects this line.
This addition operation yields a group operation on the points of E having the base
point O for identity.
Over the complex numbers, the general correspondence between algebraic and analytic theory
specializes, in the case of elliptic curves, to yield some very useful insights into the structure of
elliptic curves over C. The starting point for this investigation is the Weierstrass ℘-function,
which we define here.
Definition 10. A lattice in C is a subgroup L of the additive group C which is generated
by two elements ω1 , ω2 ∈ C that are linearly independent over R.
Definition 11. For any lattice L in C, the Weierstrass ℘L -function of L is the function
℘L : C −→ C given by

℘L (z) := 1/z² + Σ_{ω ∈ L∖{0}} [ 1/(z − ω)² − 1/ω² ] .

When the lattice L is clear from context, it is customary to suppress it from the notation
and simply write ℘ for the Weierstrass ℘-function.
• ℘(z) is a meromorphic function with double poles at the points of L.
• ℘ satisfies the differential equation ℘′(z)² = 4℘(z)³ − g2 ℘(z) − g3 , where g2 and g3 are
constants determined by the lattice L.
The last property above implies that, for any z ∉ L, the point (℘(z), ℘′(z)) lies on the
elliptic curve E : y² = 4x³ − g2 x − g3 . Let φ : C/L −→ E be the map given by

φ(z) := (℘(z), ℘′(z)) if z ∉ L ,   φ(z) := ∞ if z ∈ L
(where ∞ denotes the point at infinity on E). Then φ is actually a bijection (!), and
moreover the map φ : C/L −→ E is an isomorphism of Riemann surfaces as well as a group
isomorphism (with the addition operation on C/L inherited from C, and the elliptic curve
group operation on E).
We can go even further: it turns out that every elliptic curve E over C can be obtained in
this way from some lattice L. More precisely, the following is true:
Theorem 16. 1. For every elliptic curve E : y 2 = 4x3 − bx − c over C, there is a unique
lattice L ⊂ C whose constants g2 and g3 satisfy b = g2 and c = g3 .
2. Two elliptic curves E and E′ over C are isomorphic if and only if their corresponding
lattices L and L′ satisfy the equation L′ = αL for some nonzero scalar α ∈ C.
REFERENCES
1. Dale Husemoller, Elliptic Curves. Springer–Verlag, New York, 1997.
2. James Milne, Elliptic Curves, online course notes. http://www.jmilne.org/math/CourseNotes/math679.html
3. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer–Verlag, New York, 1986.
210.11 height function
Let A be an abelian group. A height function on A is a function h : A → R with the
following properties:
1. For all Q ∈ A there exists a constant C1 , depending on A and Q, such that for all
P ∈ A:
h(P + Q) ≤ 2h(P ) + C1
2. There exists an integer m ≥ 2 and a constant C2 , depending on A, such that for all
P ∈ A:
h(mP ) ≥ m² h(P ) − C2
3. For every constant C3 , the set
{P ∈ A : h(P ) ≤ C3 }
is finite.
Examples:
1. For t = p/q ∈ Q, a fraction in lowest terms, define H(t) = max{|p|, |q|}. Even
though this is not a height function as defined above, this is the prototype of what a
height function should look like.
2. Let E be an elliptic curve over Q. The function hx : E(Q) → R on E(Q), the points
of E with coordinates in Q, defined by

hx (P ) = log H(x(P )) if P ≠ 0 ,   hx (P ) = 0 if P = 0

is a height function (H is defined as above). Notice that hx depends on the chosen
Weierstrass model of the curve.
3. The canonical height of E/Q (due to Néron and Tate) is defined by:
ĥ(P ) = (1/2) lim_{N→∞} 4^{−N} hx (2^N P )
Finally we mention the fundamental theorem of “descent”, which highlights the importance
of the height functions:
REFERENCES
1. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
210.12 j-invariant
2. The j-invariant of E is

j = c4³ / ∆
3. The invariant differential is

ω = dx / (2y + a1 x + a3 ) = dy / (3x² + 2a2 x + a4 − a1 y)
Example:
210.13 rank of an elliptic curve
Let K be a number field and let E be an elliptic curve over K. By E(K) we denote the set
of points in E with coordinates in K.
The proof of this theorem is fairly involved. The two main ingredients are the so-called
"weak Mordell-Weil theorem" (see below), the concept of height function for abelian groups,
and the "descent" theorem.
See [2], Chapter VIII, page 189.
The Mordell-Weil theorem implies that for any elliptic curve E/K the group of points has
the following structure:

E(K) ≅ Etorsion (K) ⊕ Z^R

where Etorsion (K) denotes the set of points of finite order (the torsion group), and R is a
non-negative integer which is called the rank of the elliptic curve. It is not known how big
this number R can get for elliptic curves over Q. The largest rank known for an elliptic curve
over Q is 24, for a curve found by Martin and McMillen (2000).
Note: see Mazur’s theorem for an account of the possible torsion subgroups over Q.
Examples:
2. Let E2 /Q : y² = x³ + 1. Then E2 (Q) ≅ Z/6Z. The torsion group is generated by the
point (2, 3).
3. Let E3 /Q : y² = x³ + 109858299531561. Then E3 (Q) ≅ Z/3Z ⊕ Z^5 .
REFERENCES
1. James Milne, Elliptic Curves, online course notes. http://www.jmilne.org/math/CourseNotes/math679.html
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.
210.14 supersingular
An elliptic curve E over a field of characteristic p defined by the cubic equation f (w, x, y) = 0
is called supersingular if the coefficient of (wxy)p−1 in f (w, x, y)p−1 is zero.
A supersingular elliptic curve is said to have Hasse invariant 0; an ordinary (i.e. non-
supersingular) elliptic curve is said to have Hasse invariant 1.
This is equivalent to many other conditions. E is supersingular iff the invariant differential
is exact. Also, E is supersingular iff F ∗ : H 1 (E, OE ) → H 1 (E, OE ) is nonzero where F ∗ is
induced from the Frobenius morphism F : E → E.
Let E be an elliptic curve defined over Q and let p ∈ Z be a prime. Assume E has a
Weierstrass equation of the form:
y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6
with coefficients ai ∈ Z. Let Ẽ be the reduction of E modulo p (see bad reduction), which is a
curve defined over Fp = Z/pZ. We have a map (the reduction map)

πp : E(Q) → Ẽ(Fp )

and we define:

E0 (Q) = {P ∈ E(Q) | πp (P ) = P̃ ∈ Ẽns (Fp )}
E1 (Q) = {P ∈ E(Q) | πp (P ) = P̃ = Õ} = Ker(πp )
Proposition 11. There is an exact sequence of abelian groups

0 −→ E1 (Q) −→ E0 (Q) −→ Ẽns (Fp ) −→ 0
Notation: Given a group G, we denote by G[m] the m-torsion of G, i.e. the points of order
m.
Proposition 12. Let E/Q be an elliptic curve (as above) and let m be a positive integer
such that gcd(p, m) = 1. Then:
1. E1 (Q)[m] = {O}
2. The reduction map, restricted to m-torsion, E(Q)[m] → Ẽ(Fp ), is injective.
Remark: part 2 of the proposition is quite useful when trying to compute the torsion
subgroup of E/Q. Note that it can be reinterpreted as follows: for all primes p of good
reduction which do not divide m, the map E(Q)[m] −→ Ẽ(Fp ) is injective, and therefore the
number of m-torsion points divides the number of points defined over Fp .
Example:
Let E/Q be given by
y 2 = x3 + 3
The discriminant of this curve is ∆ = −3888 = −2⁴ · 3⁵ . Recall that if p is a prime of bad
reduction, then p | ∆. Thus the only primes of bad reduction are 2 and 3, so Ẽ is non-singular
for all p ≥ 5.
Let p = 5 and consider Ẽ, the reduction of E modulo 5. Then we have

Ẽ(Z/5Z) = {Õ, (1, 2), (1, 3), (2, 1), (2, 4), (3, 0)}

where all the coordinates are to be considered modulo 5 (remember the point at infinity!).
Hence N5 = |Ẽ(Z/5Z)| = 6. Similarly, we can prove that N7 = 13.
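The counts N5 and N7 can be double-checked by brute force. The following Python sketch (illustrative only; the helper name count_points is ours) enumerates the affine points of y² = x³ + 3 over Fp and adds the point at infinity:

```python
# Count the points of y^2 = x^3 + 3 over F_p by brute force,
# including the point at infinity.
def count_points(p):
    count = 1  # the point at infinity
    for x in range(p):
        rhs = (x * x * x + 3) % p
        for y in range(p):
            if (y * y) % p == rhs:
                count += 1
    return count

print(count_points(5))  # 6
print(count_points(7))  # 13
```

The same loop verifies the claim at any small prime of good reduction.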
Now let q ≠ 5, 7 be a prime number. Then we claim that E(Q)[q] is trivial. Indeed, by the
remark above,
|E(Q)[q]| divides both N5 = 6 and N7 = 13
so |E(Q)[q]| must be 1.
For the case q = 5 we know that |E(Q)[5]| divides N7 = 13. But it is easy to see that
if E(Q)[p] is non-trivial, then p divides its order. Since 5 does not divide 13, we conclude
that E(Q)[5] must be trivial. Similarly, E(Q)[7] is trivial as well. Therefore E(Q) has trivial
torsion subgroup.
Notice that (1, 2) ∈ E(Q) is an obvious point on the curve. Since we have proved that there
is no non-trivial torsion, this point must be of infinite order! In fact
E(Q) ≅ Z
Chapter 211
14H99 – Miscellaneous
The Riemann-Roch theorem states that, for a divisor D on a smooth projective curve,
ℓ(D) − ℓ(K − D) = deg D − g + 1
where g is the genus of the curve, and K is the canonical divisor (ℓ(K) = g).
211.2 genus
In topology, if S is an orientable surface, its genus g(S) is the number of “handles” it has.
More precisely, from the classification of surfaces, we know that any orientable surface is a
sphere, or the connected sum of n tori. We say the sphere has genus 0, and that the connected
sum of n tori has genus n (alternatively, genus is additive with respect to connected sum,
and the genus of a torus is 1). Also, g(S) = 1−χ(S)/2 where χ(S) is the Euler characteristic
of S.
In algebraic geometry, the genus of a smooth projective curve X over a field k is the
dimension over k of the vector space Ω1 (X) of global regular differentials on X. Recall
that a smooth complex curve is also a Riemann surface, and hence topologically a surface.
In this case, the two definitions of genus coincide.
Version: 6 Owner: bwebste Author(s): bwebste, muqabala, nerdy2
For a divisor D, let LD be the associated line bundle. By Serre duality, H⁰(LK−D ) ≅
H¹(LD ), so ℓ(D) − ℓ(K − D) = χ(D), the Euler characteristic of LD . Now, let p be a point of
C, and consider the divisors D and D + p. There is a natural injection LD → LD+p . This
C, and consider the divisors D and D + p. There is a natural injection LD → LD + p. This
is an isomorphism anywhere away from p, so the quotient E is a skyscraper sheaf supported
at p. Since skyscraper sheaves are flasque, they have trivial higher cohomology, and so
χ(E) = 1. Since Euler characteristics add along exact sequences (because of the long exact
sequence in cohomology), χ(D + p) = χ(D) + 1. Since deg(D + p) = deg(D) + 1, we see that
if Riemann-Roch holds for D, it holds for D + p, and vice versa. Now, we need only confirm
that the theorem holds for a single line bundle. OX is a line bundle of degree 0, with ℓ(0) = 1
and ℓ(K) = g. Thus, Riemann-Roch holds here, and thus for all line bundles.
Chapter 212
An affine algebraic group over a field k is a quasi-affine variety G (a locally closed subset of
affine space) over k which is equipped with a group structure such that the multiplication
map m : G × G → G and the inverse map i : G → G are algebraic.
For example, k is an affine algebraic group over itself with the group law being addition,
as is k ∗ = k − {0} with the group law being multiplication. Other common examples of affine
algebraic groups are GLn k, the general linear group over k (identifying matrices with affine
space), and any algebraic torus over k.
Let k be a field. Then k ∗ , the multiplicative group of k is an affine algebraic group over k.
An affine algebraic group of the form (k ∗ )n is called an algebraic torus over k.
The name is connected to the fact that if k = C, then an algebraic torus is the complexification
of the standard torus (S 1 )n .
Chapter 213
213.1 normal
Let X be a variety or algebraic set. X is said to be normal at a point p ∈ X if the local ring
Op is integrally closed. X is said to be normal if it is normal at every point. If X is
non-singular at p, it is normal at p, since regular local rings are integrally closed.
Chapter 214
Let G be a semisimple Lie group, and let λ be an integral weight for that group. λ naturally
defines a one-dimensional representation Cλ of the Borel subgroup B of G, by simply pulling
back the representation on the maximal torus T = B/U, where U is the unipotent radical
of B. Since we can think of the projection map π : G → G/B as a principal B-bundle, to
each Cλ we get an associated fiber bundle Lλ on G/B, which is obviously a line bundle.
Identifying Lλ with its sheaf of holomorphic sections, we consider the sheaf cohomology
groups H i (Lλ ). Realizing g, the Lie algebra of G, as vector fields on G/B, we see that g acts
on the sections of Lλ over any open set, and so we get an action on cohomology groups. This
integrates to an action of G, which on H 0 (Lλ ) is simply the obvious action of the group.
The Borel-Bott-Weil theorem states the following: if (λ + ρ, α) = 0 for some positive root α of
g, then

H^i (Lλ ) = 0
for all i, where ρ is half the sum of all the positive roots. Otherwise, let w ∈ W , the
Weyl group of G, be the unique element such that w(λ+ρ) is dominant (i.e. (w(λ+ρ), α) > 0
for all simple roots α). Then
H^ℓ(w) (Lλ ) ≅ Vλ
where Vλ is the unique irreducible representation of highest weight λ, and H i (Lλ ) = 0 for all
other i. In particular, if λ is already dominant, then Γ(Lλ ) ≅ Vλ , and the higher cohomology
of Lλ vanishes.
When λ is dominant, there is a natural map
mλ : G/B → P (Γ(Lλ )) .
This map is an obvious one, which takes the coset of B to the highest weight vector v0 of Vλ .
This can be extended by equivariance since B fixes v0 . This provides an alternate description
of Lλ .
For example, consider G = SL2 C. Then G/B is CP¹, the Riemann sphere, and an integral weight
is specified simply by an integer n, and ρ = 1. The line bundle Ln is simply O(n), whose
sections are the homogeneous polynomials of degree n. This gives us in one stroke the
representation theory of SL2 C: Γ(O(1)) is the standard representation, and Γ(O(n)) is its
nth symmetric power. We even have a unified description of the action of the Lie algebra,
derived from its realization as vector fields on CP¹: if H, X, Y are the standard generators
of sl2 C, then
H = x d/dx − y d/dy ,   X = x d/dy ,   Y = y d/dx
Let k be a field, and let V be a vector space over k of dimension n. Choose an increasing
sequence i = (i1 , . . . , ir ), with 1 ≤ i1 < · · · < ir ≤ n. Then the (partial) flag variety Fℓ(V, i)
associated to this data is the set of all flags {0} ≤ V1 ≤ · · · ≤ Vr with dim Vj = ij . This
has a natural embedding into the product of Grassmannians G(V, i1 ) × · · · × G(V, ir ), and its
image here is closed, making Fℓ(V, i) into a projective variety over k. If k = C these are
often called flag manifolds.
The group Sl(V ) acts transitively on Fℓ(V, i), and the stabilizer of a point is a parabolic subgroup.
Thus, as a homogeneous space, Fℓ(V, i) ≅ Sl(V )/P where P is a parabolic subgroup of
Sl(V ). In particular, the complete flag variety is isomorphic to Sl(V )/B, where B is the
Borel subgroup.
Chapter 215
Let F = (f1 , . . . , fn ) : Cn → Cn be a polynomial map, i.e., each fi is a polynomial in n
variables. If F is invertible, then its Jacobi determinant det(∂fi /∂xj ), which is a polynomial
over C, vanishes nowhere and hence must be a non-zero constant.
The Jacobian conjecture asserts the converse: every polynomial map Cn → Cn whose Jacobi
determinant is a non-zero constant is invertible.
Chapter 216
A symmetric positive definite matrix can be efficiently decomposed into a product of a lower
and an upper triangular matrix. For a matrix of any type, this is achieved by the LU decomposition,
which factorizes A = LU. If A satisfies the above criteria, one can decompose it more efficiently
into A = LL^T , where L (which can be seen as the "matrix square root" of A) is a lower
triangular matrix with positive diagonal elements. L is called the Cholesky triangle.
The Cholesky decomposition is often used to solve the normal equations in linear least squares
problems: they give A^T Ax = A^T b , in which A^T A is symmetric and positive definite.
Writing out A = LL^T entrywise:

( a11 a12 · · · a1n )   ( l11 0   · · · 0 ) ( l11 l21 · · · ln1 )
( a21 a22 · · · a2n ) = ( l21 l22 · · · 0 ) ( 0   l22 · · · ln2 )
( ..  ..        ..  )   ( ..  ..       .. ) ( ..  ..        ..  )
( an1 an2 · · · ann )   ( ln1 ln2 · · · lnn ) ( 0   0  · · · lnn )

Solving for the unknowns (the nonzero lij ), for i = 1, . . . , n and j = i + 1, . . . , n, we get:
lii = √( aii − Σ_{k=1}^{i−1} lik² )

lji = ( aji − Σ_{k=1}^{i−1} ljk lik ) / lii
Because A is symmetric and positive definite, the expression under the square root is always
positive, and all lij are real.
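The two formulas above translate directly into code. A minimal Python sketch (illustrative only; no pivoting or error checking, and it assumes the input really is symmetric positive definite):

```python
import math

def cholesky(A):
    """Return the lower triangular L with A = L L^T,
    assuming A is symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # l_ii = sqrt(a_ii - sum_{k<i} l_ik^2); positive since A is SPD
        L[i][i] = math.sqrt(A[i][i] - sum(L[i][k] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            # l_ji = (a_ji - sum_{k<i} l_jk l_ik) / l_ii
            L[j][i] = (A[j][i] - sum(L[j][k] * L[i][k] for k in range(i))) / L[i][i]
    return L

L = cholesky([[4.0, 2.0], [2.0, 3.0]])  # L = [[2, 0], [1, sqrt(2)]]
```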
In other words, an n × n matrix with only +1 and −1 as its elements is Hadamard if the
inner product of two distinct rows is 0 and the inner product of a row with itself is n.
These matrices were first considered as Hadamard determinants, because the determinant
of an Hadamard matrix satisfies equality in Hadamard's determinant theorem, which states
that if X = (xij ) is a matrix of order n where |xij | ≤ 1 for all i and j, then
|det(X)| ≤ n^{n/2}
property 1:
An Hadamard matrix H of order n satisfies H H^T = nIn , where In is the n × n identity matrix.
property 2:
If the rows and columns of an Hadamard matrix are permuted, the matrix remains Hadamard.
property 3:
If any row or column of an Hadamard matrix is multiplied by −1, the matrix remains Hadamard.
Hence it is always possible to arrange to have the first row and first column of an Hadamard
matrix contain only +1 entries. An Hadamard matrix in this form is said to be normalized.
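Hadamard matrices of order 2^k can be built by Sylvester's doubling construction (not discussed in the entry above). A Python sketch that builds one and checks the defining row-orthogonality:

```python
def sylvester(k):
    """Hadamard matrix of order 2^k by Sylvester's construction:
    H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = [1]."""
    H = [[1]]
    for _ in range(k):
        H = ([row + row for row in H] +
             [row + [-x for x in row] for row in H])
    return H

H = sylvester(2)  # order 4
n = len(H)
# Distinct rows have inner product 0; each row has inner product n with itself.
for i in range(n):
    for j in range(n):
        dot = sum(H[i][t] * H[j][t] for t in range(n))
        assert dot == (n if i == j else 0)
```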
An upper Hessenberg matrix is a square matrix (aij ) with aij = 0 whenever i > j + 1:

( a11 a12 a13 · · · a1,n−1 a1n )
( a21 a22 a23 · · · a2,n−1 a2n )
( 0   a32 a33 · · · a3,n−1 a3n )
( 0   0   a43 · · · a4,n−1 a4n )
( ..  ..  ..        ..     ..  )
( 0   0   0   · · · an,n−1 ann )

and a lower Hessenberg matrix is one with aij = 0 whenever j > i + 1:

( a11    a12    0      · · · 0        0    )
( a21    a22    a23    · · · 0        0    )
( ..     ..     ..           ..       ..   )
( an−2,1 an−2,2 an−2,3 · · · an−2,n−1 0    )
( an−1,1 an−1,2 an−1,3 · · · an−1,n−1 an−1,n )
( an,1   an,2   an,3   · · · an,n−1   an,n )
216.4 If A ∈ Mn (k) and A is supertriangular then A^n = 0
Let
f = f (x) = f (x1 , . . . , xn )
be a function of the n variables x1 , . . . , xn . Ω is an integration region, and one integrates over
all x ∈ Ω, or equivalently, all u ∈ u(Ω). Here dx/du = (du/dx)^{−1} is the Jacobi matrix, and
|det(dx/du)| = |det(du/dx)|^{−1}
As an example, take n = 2 and define

ρ = √(−2 log(x1 )) ,  ϕ = 2πx2
u1 = ρ cos ϕ ,  u2 = ρ sin ϕ

Then

d²x = |det(dx/du)| d²u = |det(du/dx)|^{−1} d²u = (x1 /2π) d²u = (1/2π) exp(−(u1² + u2²)/2) d²u
This shows that if x1 and x2 are independent random variables with uniform distributions
between 0 and 1, then u1 and u2 as defined above are independent random variables with
standard normal distributions.
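This change of variables is the basis of the Box-Muller sampling method. A small Python sketch (the function name box_muller is ours):

```python
import math

def box_muller(x1, x2):
    """Map two independent Uniform(0,1) samples to two independent
    standard normal samples, via the change of variables above."""
    rho = math.sqrt(-2.0 * math.log(x1))
    phi = 2.0 * math.pi * x2
    return rho * math.cos(phi), rho * math.sin(phi)

u1, u2 = box_muller(0.5, 0.25)
# By construction u1^2 + u2^2 = rho^2 = -2 log(x1), independent of x2.
```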
216.6 Jacobi's Theorem
Theorem (Jacobi). The determinant of an antisymmetric matrix of odd order is zero.
Proof. Suppose A is an n × n antisymmetric matrix with n odd. For the determinant, we then have det A =
det AT , and det(−A) = (−1)n det A. Thus, since n is odd, and AT = −A, we have det A =
− det A, and the theorem follows. 2
Remarks
1. According to [1], this theorem was given by Carl Gustav Jacob Jacobi (1804-1851) [2]
in 1827.
2. The 2 × 2 matrix
( 0  1 )
( −1 0 )
shows that Jacobi's theorem does not hold for 2 × 2 matrices. More generally, the
determinant of the 2n × 2n block diagonal matrix with these 2 × 2 matrices on the
diagonal equals 1, which is nonzero. Thus Jacobi's theorem does not hold for matrices
of even dimension.
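Both the theorem and the remark can be checked numerically with a naive determinant routine. A Python sketch (Laplace expansion, fine for small matrices; the helper name det is ours):

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

# An antisymmetric matrix of odd order 3: determinant is 0.
A = [[0, 2, -1],
     [-2, 0, 4],
     [1, -4, 0]]
print(det(A))  # 0

# The 2x2 matrix from the remark: determinant is 1, not 0.
B = [[0, 1], [-1, 0]]
print(det(B))  # 1
```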
REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. The MacTutor History of Mathematics archive, Carl Gustav Jacob Jacobi
Definition. Let A be an n × n matrix (with entries aij ) and let B be an m × m matrix. Then
the Kronecker product of A and B is the mn × mn block matrix

        ( a11 B · · · a1n B )
A ⊗ B = (  ..          ..  )
        ( an1 B · · · ann B )
The Kronecker product is also known as the direct product or the tensor product [1].
1. The product is bilinear. If k is a scalar, and A, B and C are square matrices, such that
B and C are of the same dimension, then
A ⊗ (B + C) = A ⊗ B + A ⊗ C,
(B + C) ⊗ A = B ⊗ A + C ⊗ A,
k(A ⊗ B) = (kA) ⊗ B = A ⊗ (kB).
2. If A, B, C, D are square matrices such that the products AC and BD exist, then
(A ⊗ B)(C ⊗ D) exists and
(A ⊗ B)(C ⊗ D) = AC ⊗ BD.
In particular, if A and B are invertible, then
(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1} .
3. If A and B are square matrices, then for the transpose (AT ) we have
(A ⊗ B)T = AT ⊗ B T .
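The definition and the mixed-product property in 2 are easy to illustrate with a small pure-Python sketch (the helper names kron and matmul are ours):

```python
def kron(A, B):
    """Kronecker product of matrices given as nested lists."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(n * q)] for i in range(m * p)]

def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]
# Mixed-product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
assert matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D))
```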
REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. T. Kailath, A.H. Sayed, B. Hassibi, Linear estimation, Prentice Hall, 2000
216.8 LU decomposition
Any non-singular matrix A can be expressed as a product A = LU: there exist exactly one
lower triangular matrix L and exactly one upper triangular matrix U of the form:

( a11 a12 · · · a1n )   ( 1   0   · · · 0 ) ( u11 u12 · · · u1n )
( a21 a22 · · · a2n ) = ( l21 1   · · · 0 ) ( 0   u22 · · · u2n )
( ..  ..        ..  )   ( ..  ..        .. ) ( ..  ..        ..  )
( an1 an2 · · · ann )   ( ln1 ln2 · · · 1 ) ( 0   0   · · · unn )
if row exchanges (partial pivoting) are not necessary. With pivoting, we have to introduce a
permutation matrix P . Instead of A one then decomposes P A:
P A = LU
The LU decomposition is useful, e.g., for the solution of a square system of linear equations
Ax = b when there is more than one right-hand side b. With A = LU the system becomes
LUx = b
or
Lc = b and Ux = c
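The two triangular solves above can be sketched in Python (Doolittle scheme, no pivoting; illustrative only, and the helper names lu and solve are ours):

```python
def lu(A):
    """Doolittle LU decomposition without pivoting:
    A = L U with L unit lower triangular, U upper triangular."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def solve(A, b):
    """Solve A x = b via L c = b (forward) then U x = c (backward)."""
    L, U = lu(A)
    n = len(b)
    c = [0.0] * n
    for i in range(n):
        c[i] = b[i] - sum(L[i][k] * c[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

x = solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])  # x ≈ [0.8, 1.4]
```

Once L and U are computed, each additional right-hand side b costs only the two cheap triangular solves.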
Theorem [Peetre's inequality] [1, 2] If t is a real number and x, y are vectors in Rn , then

( (1 + |x|²) / (1 + |y|²) )^t ≤ 2^{|t|} (1 + |x − y|²)^{|t|} .
Proof. (Following [1].) Suppose b and c are vectors in Rn . Then, from (|b| − |c|)2 ≥ 0, we
obtain
2|b| · |c| ≤ |b|2 + |c|2 .
Using this inequality and the Cauchy-Schwarz inequality, we obtain

(1 + |a|²) / (1 + |b|²) ≤ 2 (1 + |a − b|²) .   (216.9.1)
Let us now return to the given inequality. If t = 0, the claim is trivially true for all x, y in
Rn . If t > 0, then raising both sides of inequality 216.9.1 to the power of t, using t = |t|, and
setting a = x, b = y yields the result. On the other hand, if t < 0, then raising both sides
of inequality 216.9.1 to the power of −t, using −t = |t|, and setting a = y, b = x yields the
result. 2
REFERENCES
1. J. Barros-Neto, An Introduction to the Theory of Distributions, Marcel Dekker, Inc.,
1973.
2. F. Treves, Introduction To Pseudodifferential and Fourier Integral Operators, Vol. I,
Plenum Press, 1980.
If A is a complex square matrix of dimension n (i.e. A ∈ Matn (C)), then there exists a
unitary matrix Q ∈ Matn (C) such that
Q^H AQ = T = D + N
where H denotes the conjugate transpose, D = diag(λ1 , . . . , λn ) (the λi are the eigenvalues of A),
and N ∈ Matn (C) is a strictly upper triangular matrix. Furthermore, Q can be chosen such that
the eigenvalues λi appear in any order along the diagonal. [GVL]
REFERENCES
[GVL] Golub, H. Gene, Van Loan F. Charles: Matrix Computations (Third Edition). The Johns
Hopkins University Press, London, 1996.
216.11 antipodal
Definition Suppose x and y are points on the n-sphere S n . If x = −y then x and y are called
antipodal points. The antipodal map is the map A : S n → S n defined as A(x) = −x.
Properties
REFERENCES
1. V. Guillemin, A. Pollack, Differential topology, Prentice-Hall Inc., 1974.
Definition. Suppose A is a complex matrix. The conjugate transpose A∗ of A is the matrix
obtained by transposing A and taking the complex conjugate of each entry.
It is clear that for real matrices, the conjugate transpose coincides with the transpose.
Properties
1. If A and B are complex matrices of the same dimension, and α, β are complex constants,
then
(αA + βB)∗ = ᾱ A∗ + β̄ B ∗ .
2. If A and B are complex matrices such that AB is defined, then
(AB)∗ = B ∗ A∗ .
3. If A is a complex square matrix, then det(A∗ ) is the complex conjugate of det A,
trace(A∗ ) is the complex conjugate of trace A, and
(A∗ )^{−1} = (A^{−1} )∗ ,
where trace and det are the trace and the determinant operators, and −1 is the inverse
operator.
4. Suppose ⟨·, ·⟩ is the standard inner product on Cn . Then for an arbitrary complex
n × n matrix A, and vectors x, y ∈ Cn , we have
⟨Ax, y⟩ = ⟨x, A∗ y⟩.
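Property 4, ⟨Ax, y⟩ = ⟨x, A∗y⟩, can be checked directly with Python's built-in complex numbers. A small sketch (the helper names conj_transpose and inner are ours):

```python
def conj_transpose(A):
    """A*: transpose of the entrywise complex conjugate."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def inner(x, y):
    """Standard inner product on C^n: <x, y> = sum_i x_i conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

A = [[1 + 2j, 3], [0, 4 - 1j]]
x = [1j, 2]
y = [1, 1 - 1j]
Ax = [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]
Asy = [sum(conj_transpose(A)[i][k] * y[k] for k in range(2)) for i in range(2)]
assert inner(Ax, y) == inner(x, Asy)  # <Ax, y> = <x, A*y>
```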
Notes
The conjugate transpose of A is also called the adjoint matrix of A, or the Hermitian
conjugate of A (whence one usually writes A∗ = A^H ). The notation A† is also used for the
conjugate transpose [2]. In [1], A∗ is also called the tranjugate of A.
REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965.
See also
Theorem: A ∈ Cn×n is a normal matrix if and only if there exists a unitary matrix Q ∈ Cn×n
such that Q^H AQ = diag(λ1 , . . . , λn ) (a diagonal matrix), where H denotes the conjugate
transpose. [GVL]
Proof: Firstly we show that if there exists a unitary matrix Q ∈ Cn×n such that Q^H AQ =
diag(λ1 , . . . , λn ), then A is a normal matrix. Let D = diag(λ1 , . . . , λn ); then A may
be written as A = QDQ^H . Verifying that A is normal follows from the observation
AA^H = QDQ^H QD^H Q^H = QDD^H Q^H and A^H A = QD^H Q^H QDQ^H = QD^H DQ^H . There-
fore A is a normal matrix, because DD^H = diag(λ1 λ̄1 , . . . , λn λ̄n ) = D^H D.
Secondly we show that if A ∈ Cn×n is a normal matrix, then there exists a unitary ma-
trix Q ∈ Cn×n such that Q^H AQ = diag(λ1 , . . . , λn ). By the Schur decomposition we know
that there exists a unitary Q ∈ Cn×n such that Q^H AQ = T (T an upper triangular matrix).
Since A is a normal matrix, T is also a normal matrix. The result that T is a di-
agonal matrix comes from showing that a normal upper triangular matrix is diagonal (see
theorem for normal triangular matrices).
QED
REFERENCES
[GVL] Golub, H. Gene, Van Loan F. Charles: Matrix Computations (Third Edition). The Johns
Hopkins University Press, London, 1996.
216.14 covector
If V is a vector space over a field k, then a covector is a linear map α : V → k, that is, an
element of the dual space to V . Thus, for example, a covector field on a differentiable manifold
is a synonym for a 1-form.
Definition [1, 2] Let A be a square matrix (with entries in any field). If all off-diagonal entries
of A are zero, then A is a diagonal matrix.
From the definition, we see that an n × n diagonal matrix is completely determined by the
n entries on the diagonal; all other entries are zero. If the diagonal entries are a1 , a2 , . . . , an ,
then we denote the corresponding diagonal matrix by

                        ( a1 0  0  · · · 0 )
                        ( 0  a2 0  · · · 0 )
diag(a1 , . . . , an ) = ( 0  0  a3 · · · 0 )
                        ( ..  ..  ..      .. )
                        ( 0  0  0  · · · an )
Examples
1. The identity matrix and zero matrix are diagonal matrices. Also, any 1 × 1 matrix is
a diagonal matrix.
2. A matrix A is a diagonal matrix if and only if A is both an upper and lower triangular matrix.
Properties
1. If A and B are diagonal matrices of the same order, then A + B and AB are again
diagonal matrices. Further, diagonal matrices commute, i.e., AB = BA. It follows that
real (and complex) diagonal matrices are normal matrices.
2. A square matrix is diagonal if and only if it is triangular and normal (see this page).
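Property 1 is easy to check computationally. A minimal Python sketch using nested-list matrices (the helper names diag and matmul are ours):

```python
def diag(entries):
    """Build diag(a_1, ..., a_n) as a nested list."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Ordinary matrix product of square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = diag([1, 2, 3])
B = diag([4, 5, 6])
# Diagonal matrices commute, and the product is again diagonal.
assert matmul(A, B) == matmul(B, A) == diag([4, 10, 18])
```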
Remarks
REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. Wikipedia, diagonal matrix.
216.16 diagonalization
Next, we give necessary and sufficient conditions for T to be diagonalizable. For λ ∈ K set
Eλ = {u ∈ V : T u = λu}.
The set Eλ is a subspace of V called the eigenspace associated to λ. This subspace is
non-trivial if and only if λ is an eigenvalue of T .
Proposition 6. A transformation T is diagonalizable if and only if

dim V = Σ_λ dim Eλ ,

where the sum is taken over all eigenvalues λ of T .
There are two fundamental reasons why a transformation T can fail to be diagonalizable.
1. The characteristic polynomial of T does not factor into linear factors over K.
2. There exists an eigenvalue λ, such that the kernel of (T − λI)2 is strictly greater than
the kernel of (T − λI). Equivalently, there exists an invariant subspace where T acts
as a nilpotent transformation plus some multiple of the identity.
Let A be a square matrix (possibly complex) of dimension n with entries aij . Then A is said
to be diagonally dominant if
$$|a_{ii}| \ge \sum_{j=1,\; j \neq i}^{n} |a_{ij}|$$
for i from 1 to n.
In addition A is said to be strictly diagonally dominant if
$$|a_{ii}| > \sum_{j=1,\; j \neq i}^{n} |a_{ij}|$$
for i from 1 to n.
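The definition translates directly into a row-by-row check; the following sketch (the function name is my own) tests both variants:

```python
import numpy as np

def diagonally_dominant(A, strict=False):
    # |a_ii| >= (or >, if strict) the sum of |a_ij| over j != i, per row.
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - diag
    return bool(np.all(diag > off)) if strict else bool(np.all(diag >= off))
```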
This definition raises several natural questions, among them: Does every matrix of complex
numbers have eigenvalues? How many different eigenvalues can a matrix have? Given a
matrix, how does one compute its eigenvalues?
The answers to these questions are usually studied in introductory linear algebra courses,
in the following sequence:
• One first shows that λ is an eigenvalue of A if and only if det(λI − A) = 0, where I
denotes the n × n identity matrix and det is the determinant function.
• Basic facts about the determinant imply that det(λI −A) is a polynomial in λ of degree
n. This is often referred to as the characteristic polynomial of A. (Note: some define
the characteristic polynomial to be det(A − λI); for the purposes of finding eigenvalues
of A it makes no difference.)
• From the fundamental theorem of algebra we know that any polynomial with complex
coefficients has at least one complex root, and at most n complex roots.
• It follows that any matrix of complex numbers A has at least one eigenvalue, and at
most n eigenvalues.
If one is given an n × n matrix A of real numbers, the above argument implies that A has
at least one complex eigenvalue; the question of whether or not A has real eigenvalues is
more subtle since there is no real-numbers analogue of the fundamental theorem of algebra.
It should not be a surprise then that some real matrices do not have real eigenvalues. For
example, let
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
In this case det(λI − A) = λ2 + 1; clearly no real number λ satisfies λ2 + 1 = 0; hence A has
no real eigenvalues (although A has complex eigenvalues i and −i).
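This can be confirmed numerically: an eigenvalue routine applied to this A returns the complex pair ±i.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
# Eigenvalues of the 2D rotation-by-90-degrees matrix: purely imaginary.
eigenvalues = np.sort_complex(np.linalg.eigvals(A))
```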
If one converts the above theory into an algorithm for calculating the eigenvalues of a matrix
A, one is led to a two-step procedure:
• Solve det(λI − A) = 0 to find the eigenvalues λ.
• For each eigenvalue λ, solve the linear system (A − λI)x = 0 to find the corresponding
eigenvectors x.
Remark: The definition of an eigenvalue for an endomorphism can be found here. The present
entry is an extract of version 6 of that entry.
The eigenvalue problem appears as part of the solution in many scientific or engineering
applications. An example of where it arises is the determination of the main axes of a
second order surface Q = xT Ax = 1 (with a symmetric matrix A). The task is to find the
places where the normal
$$\nabla Q = \left( \frac{\partial Q}{\partial x_1}, \cdots, \frac{\partial Q}{\partial x_n} \right) = 2Ax$$
is parallel to x itself. This leads to the eigenvalue equation
[picture to go here]
Ax = λx, or (A − λI)x = 0
with I the identity matrix, with an arbitrary square matrix A, an unknown scalar λ, and the
unknown vector x. A non-trivial solution to this system of n linear homogeneous equations
exists if and only if the determinant
$$\det(A - \lambda I) = \begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix} = 0$$
This nth degree polynomial in λ is called the characteristic equation. Its roots λ are called
the eigenvalues and the corresponding vectors x eigenvectors. In the example, x is a right
eigenvector for λ; a left eigenvector y is defined by y T A = µy T .
Solving this polynomial for λ is not a practical method to solve the eigenvalue problem; a
QR-based method is a much more adequate tool ([Golub89]); it works as follows:
$$\begin{pmatrix}
\lambda_1 & \times & \times & \cdots & \times & \times \\
0 & \lambda_2 & \times & \cdots & \times & \times \\
0 & 0 & \lambda_3 & \cdots & \times & \times \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_{n-1} & \times \\
0 & 0 & 0 & \cdots & 0 & \lambda_n
\end{pmatrix}$$
$$\begin{pmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 & 0 \\
0 & 0 & \lambda_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_{n-1} & 0 \\
0 & 0 & 0 & \cdots & 0 & \lambda_n
\end{pmatrix}$$
References
Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The Johns
Hopkins University Press, 1989.
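A bare-bones illustration of the QR approach is sketched below. This is the unshifted iteration with no deflation; production methods such as those in [Golub89] add Hessenberg reduction and shifts, and the simple version is only guaranteed to behave well on matrices like the symmetric test case used here.

```python
import numpy as np

def qr_iteration(A, iters=200):
    # Repeatedly factor T = QR and form T := RQ; T converges toward a
    # (block) triangular matrix with the eigenvalues on its diagonal.
    T = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(T)
        T = R @ Q
    return np.sort(np.diag(T))
```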
4. All eigenvalues have unit modulus. In other words, if λ is an eigenvalue, then |λ| = 1.
Here, |λ| is the complex modulus of λ.
For property (3), suppose n is odd. Then A has an odd number of eigenvalues (including
multiplicities) that all satisfy property (2). Therefore, there must exist at least one eigenvalue
λ such that λ = 1/λ̄. Then λ = 1/λ̄ = λ/|λ|². Taking the modulus of both sides, and
multiplying by |λ| ≠ 0, gives |λ| = 1. For part (4), let λ be an eigenvalue corresponding to
a (column) eigenvector x, i.e., Ax = λx. Taking the conjugate transpose gives x∗ A∗ = λ̄x∗ .
Here x∗ is the row vector corresponding to x, where each entry is complex conjugated. Also,
A∗ = Āᵀ, where Aᵀ is the transpose of A and Ā is the complex conjugate of A. Since A is
real, we then have x∗ Aᵀ = λ̄x∗ . Thus
$$x^* x = x^* A^T A x = \bar{\lambda} \lambda \, x^* x,$$
and since x∗ x ≠ 0, it follows that |λ|² = λ̄λ = 1.
These results can be found in [1] (page 268). In the same reference, it is mentioned that
properties (1) and (3) can essentially be found in a paper published in 1854 by Francesco
Brioschi (1824-1897). Later in the same year, an improved proof was given by F. Faà di
Bruno (1825-1888) [1]. Biographies of Brioschi and Faà di Bruno can be found at the
MacTutor History of Mathematics archive [2, 3].
REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. The MacTutor History of Mathematics archive, entry on Francesco Brioschi
3. The MacTutor History of Mathematics archive, entry on Francesco Faà di Bruno.
216.21 eigenvector
Ax = λx
One can find eigenvectors by first finding eigenvalues, then for each eigenvalue λi , solving
the system
(A − λi I)xi = 0
to find a form which characterizes the eigenvector xi (any multiple of xi is also an eigenvec-
tor). Of course this is not the smart way to do it; for this, see singular value decomposition.
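The eigenvalue-then-null-space procedure can be sketched as follows; here the null space of (A − λI) is read off from an SVD rather than by hand elimination (the matrix and eigenvalue are illustrative choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
lam = 3.0                                  # an eigenvalue of A

# Solve (A - lam*I) x = 0: the null space is spanned by the right
# singular vector belonging to the zero singular value.
_, s, Vt = np.linalg.svd(A - lam * np.eye(2))
x = Vt[-1]                                 # eigenvector for lam
```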
In this entry we construct the free vector space over a set, or the vector space gen-
erated by a set [1]. For a set X, we shall denote this vector space by C(X). One appli-
cation of this construction is given in [2], where the free vector space is used to define the
tensor product for modules.
To define the vector space C(X), let us first define C(X) as a set. For a set X and a field
K, we define
$$C(X) = \{\, f : X \to K \mid f(x) \neq 0 \text{ for only finitely many } x \in X \,\}.$$
In other words, C(X) consists of functions f : X → K that are non-zero only at finitely
many points in X. Here, we denote the identity element in K by 1, and the zero element
by 0. The vector space structure for C(X) is defined as follows. If f and g are functions
in C(X), then f + g is the mapping x ↦ f (x) + g(x). Similarly, if f ∈ C(X) and α ∈ K,
then αf is the mapping x ↦ αf (x). It is not difficult to see that these operations are well
defined, i.e., both f + g and αf are again functions in C(X).
Here, the space span{∆a }a∈X consists of all finite linear combinations of elements in {∆a }a∈X .
It is clear that any element in span{∆a }a∈X is a member in C(X). Let us check the other
direction. Suppose f is a member in C(X). Then, let ξ1 , . . . , ξN be the distinct points in X
where f is non-zero. We then have
N
X
f = f (ξi)∆ξi ,
i=1
To see that the set {∆a }a∈X is linearly independent, we need to show that every finite
subset of it is linearly independent. Let {∆ξ1 , . . . , ∆ξN } be such a finite subset, and suppose
$\sum_{i=1}^{N} \alpha_i \Delta_{\xi_i} = 0$ for some αi ∈ K. Since the points ξi are pairwise distinct, evaluating at each
ξj gives αj = 0 for all j. This shows that the set {∆a }a∈X is linearly independent.
Let us define the mapping ι : X → C(X), x ↦ ∆x . This mapping gives a bijection between
X and the basis vectors {∆a }a∈X . We can thus identify X with this set of basis vectors, and
X becomes a basis for C(X).
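A concrete finite-support model of this construction (the function names are my own): represent f : X → K as a dict holding only the points where f is non-zero, so that Δ_a is a one-entry dict.

```python
def delta(a):
    # The basis function Delta_a: value 1 at a, 0 elsewhere.
    return {a: 1}

def add(f, g):
    # Pointwise sum, dropping points where the result is zero.
    h = dict(f)
    for x, v in g.items():
        h[x] = h.get(x, 0) + v
    return {x: v for x, v in h.items() if v != 0}

def scale(alpha, f):
    # Pointwise scalar multiple.
    return {x: alpha * v for x, v in f.items() if alpha * v != 0}
```

For example, `add(delta('a'), scale(2, delta('b')))` is the function with value 1 at 'a' and 2 at 'b', and adding a function to its negative yields the zero function (the empty dict).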
REFERENCES
1. W. Greub, Linear Algebra, Springer-Verlag, Fourth edition, 1975.
2. I. Madsen, J. Tornehave, From Calculus to Cohomology, Cambridge University press, 1997.
216.24 in a vector space, λv = 0 if and only if λ = 0 or
v is the zero vector
Theorem Let V be a vector space over the field F . Further, let λ ∈ F and v ∈ V . Then
λv = 0 if and only if λ is zero or v is the zero vector.
Proof. Let us denote by 0F and 1F the zero and unit elements of F , respectively. Similarly,
we denote by 0V the zero vector in V . Suppose λ = 0F . Then, by axiom 8, we have that
1F v + 0F v = 1F v,
for all v ∈ V . By axiom 6, there is an element in V that cancels 1F v. Adding this element
to both sides yields 0F v = 0V . Next, suppose that v = 0V . We claim that λ0V = 0V for all
λ ∈ F . This follows from the previous claim if λ = 0F , so let us assume that λ ≠ 0F . Then
λ−1 exists, and axiom 7 implies that
v + λ0V = v
for all v ∈ V . Thus λ0V satisfies the axiom for the zero vector, and λ0V = 0V for all λ ∈ F .
For the other direction, suppose λv = 0V and λ ≠ 0F . Then, using axiom 3, we have that
v = 1F v = λ−1 λv = λ−1 0V = 0V .
On the other hand, suppose λv = 0V and v ≠ 0V . If λ ≠ 0F , then the above calculation for v
is again valid, whence
0V ≠ v = 0V ,
which is a contradiction, so λ = 0F . □
REFERENCES
1. W. Greub, Linear Algebra, Springer-Verlag, Fourth edition, 1975.
216.25 invariant subspace
The general problem to be solved by the least squares method is this: given some direct
measurements y of random variables, and knowing a set of equations f which have to be
satisfied by these measurements, possibly involving unknown parameters x, find the set of x
which comes closest to satisfying
f (x, y) = 0
$$\Delta y^2 = \Delta y^T \Delta y = \|\Delta y\|^2 = \sum_i \Delta y_i^2$$
The assumption has been made here that the elements of y are statistically uncorrelated and
have equal variance. For this case, the above solution results in the most efficient estimators
for x, ∆y. If the y are correlated, correlations and variances are defined by a covariance
matrix C, and the above minimum condition becomes
∆y T C −1 ∆y is minimized
Least squares solutions can be more or less simple, depending on the constraint equations
f . If there is exactly one equation for each measurement, and the functions f are linear in
the elements of y and x, the solution is discussed under linear regression. For other linear
models, see linear least squares. Least squares methods applied to few parameters can lend
themselves to very efficient algorithms (e.g. in real-time image processing), as they reduce
to simple matrix operations.
If the constraint equations are non-linear, one typically solves by linearization and iteration,
using approximate values of x, ∆y in every step, and linearizing by forming the matrix of
derivatives df /dx (the Jacobian matrix) and possibly also df /dy at the last point of
approximation.
Note that as the iterative improvements δx, δy tend towards zero (if the process converges),
∆y converges towards a final value which enters the minimum equation above.
Algorithms avoiding the explicit calculation of df /dx and df /dy have also been investigated,
e.g. [1]; for a discussion, see [2]. Where convergence (or control over convergence) is prob-
lematic, use of a general package for minimization may be indicated.
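The linearize-and-iterate loop described above is essentially the Gauss–Newton method; a compact sketch, under the assumption that the caller supplies the residual function f and its Jacobian df/dx:

```python
import numpy as np

def gauss_newton(f, jac, x0, iters=20):
    # Minimize ||f(x)||^2: at each step solve the linearized least
    # squares problem jac(x) * dx = -f(x), then update x.
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        dx, *_ = np.linalg.lstsq(jac(x), -f(x), rcond=None)
        x = x + dx
    return x
```

For a residual like f(x) = (x₀² − 4, x₁ − 1) with Jacobian ((2x₀, 0), (0, 1)), the iteration started at (1, 1) converges to the root (2, 1).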
REFERENCES
1. M.L. Ralston and R.I. Jennrich, Dud, a Derivative-free Algorithm for Non-linear Least Squares,
Technometrics 20-1 (1978) 7.
2. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C,
Second edition, Cambridge University Press, 1995.
Note: This entry is based on content from The Data Analysis Briefbook
Linear algebra is the branch of mathematics devoted to the theory of linear structure. The
axiomatic treatment of linear structure is based on the notions of a linear space (more com-
monly known as a vector space), and a linear mapping. Broadly speaking, there are two
fundamental questions considered by linear algebra.
From the geometric point of view, “linear” is synonymous with “straight”, and consequently
linear algebra can be regarded as the branch of mathematics dealing with lines and planes,
as well as with transformations of space that preserve “straightness”, e.g. rotations and
reflections. The two fundamental questions, in geometric terms, deal with
• the intersection of hyperplanes, and
• the principal axes of an ellipsoid.
Linearity is a very basic notion, and consequently linear algebra has applications in numerous
areas of mathematics, science, and engineering. Diverse disciplines, such as differential equations,
differential geometry, the theory of relativity, quantum mechanics, electrical circuits, com-
puter graphics, and information theory benefit from the notions and techniques of linear
algebra.
Euclidean geometry is related to a specialized branch of linear algebra that deals with linear
measurement. Here the relevant notions are length and angle. A typical question is the de-
termination of lines perpendicular to a given plane. A somewhat less specialized branch deals
with affine structure, where the key notion is that of area and volume. Here determinants
play an essential role.
Yet another branch of linear algebra is concerned with computation, algorithms, and numer-
ical approximation. Important examples of such techniques include: Gaussian elimination,
the method of least squares, LU factorization, QR decomposition, Gram-Schmidt orthogonalization,
singular value decomposition, and a number of iterative algorithms for the calculation of
eigenvalues and eigenvectors.
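As one small illustration of this computational branch, here is a sketch of classical Gram-Schmidt orthogonalization (numerically, the modified variant or Householder QR is preferred; this version is for exposition only):

```python
import numpy as np

def gram_schmidt(A):
    # Orthonormalize the columns of A; returns Q with Q^T Q = I
    # and the same column span as A.
    A = np.array(A, dtype=float)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        # Subtract the projections onto the previously built columns.
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q
```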
Syllabus.
The following subject outline is meant to serve as a survey of some key topics in linear algebra
(Warning: the choice of topics is far from comprehensive, and no doubt reflects the biases
of the present author’s background). As such, it may (or may not) be of use to motivated
auto-didacts interested in deepening their understanding of the subject.
1. Linear structure.
(a) Introduction: systems of linear equations, Gaussian elimination, matrices, matrix operations.
(b) Foundations: fields and vector spaces, subspace, linear independence, basis,
dimension, direct sum decomposition.
(c) Linear mappings: linearity axioms, kernels and images, injectivity, surjectiv-
ity, bijections, compositions, inverses, matrix representations, change of basis,
conjugation, similarity.
2. Affine structure.
(a) Determinants: characterizing properties, cofactor expansion, permutations, Cramer’s rule,
classical adjoint.
(b) Geometric aspects: Euclidean volume, orientation, equiaffine transformations,
determinants as geometric invariants of linear transformations.
3. Diagonalization.
(a) Basic notions: eigenvector, eigenvalue, eigenspace, characteristic polynomial.
(b) Obstructions: imaginary eigenvalues, nilpotent transformations, classification
of 2-dimensional real transformations.
(c) Structure theory: invariant subspaces, Cayley-Hamilton theorem, Jordan canonical form,
rational canonical form.
4. Multi-linearity.
(a) Foundations: inner product axioms, the adjoint operation, symmetric transfor-
mations, skew-symmetric transformations, self-adjoint transformations, normal
transformations.
(b) Spectral theorem: diagonalization of self-adjoint transformations, diagonaliza-
tion of quadratic forms.
Ax ≈ b
where ≈ stands for the best approximate solution in the least squares sense, i.e. we want to
minimize the Euclidean norm of the residual r = Ax − b
$$\|Ax - b\|_2 = \|r\|_2 = \left[ \sum_{i=1}^{m} r_i^2 \right]^{1/2}$$
Among the different methods to solve this problem, we mention normal equations, sometimes
ill-conditioned, QR decomposition, and, most generally, singular value decomposition. For
further reading, [Golub89], [Branham90], [Wong92], [Press95].
Example: Let us consider the problem of finding the closest point (vertex) to measurements
on straight lines (e.g. trajectories emanating from a particle collision). This problem can
be described by Ax = b with
$$A = \begin{pmatrix} a_{11} & a_{12} \\ \vdots & \vdots \\ a_{m1} & a_{m2} \end{pmatrix}; \qquad
x = \begin{pmatrix} u \\ v \end{pmatrix}; \qquad
b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}$$
This is clearly an inconsistent system of linear equations, with more equations than un-
knowns, a frequently occurring problem in experimental data analysis. The system is, how-
ever, not very inconsistent and there is a point that lies “nearly” on all straight lines. The
solution can be found with the linear least squares method, e.g. by QR decomposition for
solving Ax = b:
$$QRx = b \quad\Rightarrow\quad x = R^{-1} Q^T b$$
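A numerical version of this vertex example (the line coefficients and measurements are invented for illustration): three lines in the (u, v) plane that almost, but not quite, pass through one point.

```python
import numpy as np

# Rows encode lines a1*u + a2*v = b, namely: u = 1, v = 2, u + v = 3.1.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.1])

Q, R = np.linalg.qr(A)            # A = QR, R upper triangular
x = np.linalg.solve(R, Q.T @ b)   # least squares solution of Ax ~ b
```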
References
Wong92 S.S.M. Wong, Computational Methods in Physics and Engineering, Prentice Hall, 1992.
Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The Johns
Hopkins University Press, 1989.
Branham90 R.L. Branham, Scientific Data Analysis, An Introduction to Overdetermined Systems,
Springer, Berlin, Heidelberg, 1990.
Press95 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes
in C, Second edition, Cambridge University Press, 1995. (The same book exists for
the Fortran language). There is also an Internet version which you can work from.
216.29 linear manifold
Definition [1] Suppose V is a vector space and suppose that L is a non-empty subset of V .
If there exists a v ∈ V such that L + v = {v + l | l ∈ L} is a vector subspace of V , then L
is a linear manifold of V . Then we say that the dimension of L is the dimension of L + v
and write dim L = dim(L + v). If dim L = dim V − 1, then L is called a hyperplane.
A linear manifold is, in other words, a linear subspace that has possibly been shifted away
from the origin. For instance, in R2 examples of linear manifolds are points, lines (which are
hyperplanes), and R2 itself.
REFERENCES
1. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977.
Example 2. If A is diagonalizable, i.e., of the form A = LDL−1 , where D is a diagonal matrix,
then
$$e^A = \sum_{k=0}^{\infty} \frac{1}{k!} (LDL^{-1})^k
     = \sum_{k=0}^{\infty} L \frac{1}{k!} D^k L^{-1}
     = L e^D L^{-1}.$$
For a diagonalizable matrix A, it follows that det eA = etrace A . However, this formula is, in
fact, valid for all A.
Properties
Let A be a square n × n real valued matrix. Then the matrix exponential satisfies the
following properties
4. The trace of A and the determinant of eA are related by the by the formula
det eA = etrace A .
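Example 2 and property (4) are easy to check together in code; the sketch below computes e^A through the eigendecomposition (so it assumes A is diagonalizable) and verifies det e^A = e^{trace A}:

```python
import numpy as np

def expm_diagonalizable(A):
    # e^A = L e^D L^{-1} for diagonalizable A = L D L^{-1};
    # L * np.exp(w) scales the columns of L, i.e. forms L e^D.
    w, L = np.linalg.eig(A)
    return (L * np.exp(w)) @ np.linalg.inv(L)
```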
• The 2 × 3 matrix
$$A = \begin{pmatrix} 1 & -2 & 2 \\ 1 & 0 & 2 \end{pmatrix}$$
• The 3 × 3 matrix
$$B = \begin{pmatrix} 1 & 0 & 1 \\ 3 & -1 & 5 \\ 4 & 2 & 0 \end{pmatrix}$$
• The 3 × 1 matrix
$$C = \begin{pmatrix} 7 \\ 7 \\ 2 \end{pmatrix}$$
• The 1 × 2 matrix
$$D = \begin{pmatrix} \tfrac{1}{2} & 2 \end{pmatrix}$$
All of our example matrices (except the last one) have entries which are integers. In general,
matrices are allowed to have their entries taken from any ring R. The set of all m × n
matrices with entries in a ring R is denoted Mm×n (R). If a matrix has exactly as many rows
as it has columns, we say it is a square matrix.
Addition of two matrices is allowed provided that both matrices have the same number of
rows and the same number of columns. The sum of two such matrices is obtained by adding
their respective entries. For example,
$$\begin{pmatrix} 7 \\ 7 \\ 2 \end{pmatrix} + \begin{pmatrix} -1 \\ 4.5 \\ 0 \end{pmatrix}
= \begin{pmatrix} 7 + (-1) \\ 7 + 4.5 \\ 2 + 0 \end{pmatrix}
= \begin{pmatrix} 6 \\ 11.5 \\ 2 \end{pmatrix}$$
Multiplication of two matrices is allowed provided that the number of columns of the first
matrix equals the number of rows of the second matrix. (For example, multiplication of a
2 × 3 matrix with a 3 × 3 is allowed, but multiplication of a 3 × 3 matrix with a 2 × 3 matrix
is not allowed, since the first matrix has 3 columns, and the second matrix has 2 rows, and
3 doesn’t equal 2.) In this case the matrix multiplication is defined by
$$(AB)_{ij} = \sum_k (A)_{ik} (B)_{kj}.$$
Let A and B be the two matrices that we used above as our very first two examples of matrices. Since A
is a 2 × 3 matrix, and B is a 3 × 3 matrix, it is legal to multiply A and B, but it is not legal
to multiply B and A. The method for computing the product AB is to place A below and
to the right of B, as follows:
$$\begin{array}{rrr|rrr}
 & & & 1 & 0 & 1 \\
 & & & 3 & -1 & 5 \\
 & & & 4 & 2 & 0 \\ \hline
1 & -2 & 2 & X & Y & \; \\
1 & 0 & 2 & & &
\end{array}$$
A is always in the bottom left corner, B is in the top right corner, and the product, AB, is
always in the bottom right corner. We see from the picture that AB will be a 2 × 3 matrix.
(In general, AB has as many rows as A, and as many columns as B.)
Let us compute the top left entry of AB, denoted by X in the above picture. The way to
calculate this entry of AB (or any other entry) is to take the dot product of the stuff above
it [which is (1, 3, 4)] and the stuff to the left of it [which is (1, −2, 2)]. In this case, we have
X = 1 · 1 + 3 · (−2) + 4 · 2 = 3.
Similarly, the top middle entry of AB (where the Y is in the above picture) is gotten by
taking the dot product of the stuff above it: (0, −1, 2), and the stuff to the left of it: (1, −2, 2),
which gives
Y = 0 · 1 + (−1) · (−2) + 2 · 2 = 6
Continuing in this way, we can compute every entry of AB one by one to get
$$\begin{array}{rrr|rrr}
 & & & 1 & 0 & 1 \\
 & & & 3 & -1 & 5 \\
 & & & 4 & 2 & 0 \\ \hline
1 & -2 & 2 & 3 & 6 & -9 \\
1 & 0 & 2 & 9 & 4 & 1
\end{array}$$
and so
$$AB = \begin{pmatrix} 3 & 6 & -9 \\ 9 & 4 & 1 \end{pmatrix}.$$
If one tries to compute the illegal product BA using this procedure, one winds up with
$$\begin{array}{rrr|rrr}
 & & & 1 & -2 & 2 \\
 & & & 1 & 0 & 2 \\ \hline
1 & 0 & 1 & ? & & \\
3 & -1 & 5 & & & \\
4 & 2 & 0 & & &
\end{array}$$
The top left entry of this illegal product (marked with a ? above) would have to be the dot
product of the stuff above it: (1, 1), and the stuff to the left of it: (1, 0, 1), but these vectors
do not have the same length, so it is impossible to take their dot product, and consequently
it is impossible to take the product of the matrices BA.
Under the correspondence of matrices and linear transformations, one can show that matrix
multiplication is equivalent to composition of linear transformations, which explains why
matrix multiplication is defined in a manner which is so odd at first sight, and why this
strange manner of multiplication is so useful in mathematics.
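The worked product above is easy to confirm in code:

```python
import numpy as np

# The same A and B as in the worked example.
A = np.array([[1, -2, 2],
              [1,  0, 2]])
B = np.array([[1,  0, 1],
              [3, -1, 5],
              [4,  2, 0]])
AB = A @ B   # matrix multiplication: (AB)_ij = sum_k A_ik B_kj
```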
Conversely, suppose that all eigenvalues of A are zero. Then the characteristic polynomial
of A is det(λI − A) = λn . It now follows from the Cayley-Hamilton theorem that An = 0.
Since the determinant is the product of the eigenvalues it follows that a nilpotent matrix has
determinant 0. Similarly, since the trace of a square matrix is the sum of the eigenvalues, it
follows that it has trace 0.
One class of nilpotent matrices are the strictly triangular matrices (lower or upper), this
follows from the fact that the eigenvalues of a triangular matrix are the diagonal elements,
and thus are all zero in the case of strictly triangular matrices.
Also it’s worth noticing that any matrix that is similar to a nilpotent matrix is nilpotent.
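A strictly upper triangular example confirms the nilpotency and the zero trace and determinant:

```python
import numpy as np

# Strictly upper triangular, hence nilpotent (entries are arbitrary).
N = np.array([[0, 1, 2],
              [0, 0, 3],
              [0, 0, 0]])
N3 = np.linalg.matrix_power(N, 3)   # N^3 = 0 for this 3x3 matrix
```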
and a signature
$$n_i = n_{i-1} + d_i + d_{i+1} + \cdots + d_k , \qquad i = 1, \dots, k$$
A non-zero vector in a vector space V is a vector that is not equal to the zero vector in
V.
216.35 off-diagonal entry
Let A = (aij ) be a square matrix. An element aij is an off-diagonal entry if aij is not on
the diagonal, i.e., if i ≠ j.
Orthogonal matrices play a very important role in linear algebra. Inner products are pre-
served under an orthogonal transform: (Qx)T Qy = xT QT Qy = xT y, and also the Euclidean norm
||Qx||2 = ||x||2 . An example of where this is useful is solving the least squares problem
Ax ≈ b by solving the equivalent problem QT Ax ≈ QT b.
Orthogonal matrices can be thought of as the real case of unitary matrices. A unitary
matrix U ∈ Cn×n has the property U ∗ U = I, where U ∗ = Ū ᵀ (the conjugate transpose).
Since Ū = U for real U , orthogonal matrices are unitary.
Important orthogonal matrices are Givens rotations and Householder transformations. They
help us maintain numerical stability because they do not amplify rounding errors.
$$\begin{pmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{pmatrix}
\qquad\text{or}\qquad
\begin{pmatrix} \cos(\alpha) & \sin(\alpha) \\ \sin(\alpha) & -\cos(\alpha) \end{pmatrix}$$
respectively.
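A quick numerical check that such a rotation matrix has orthonormal columns and preserves the Euclidean norm:

```python
import numpy as np

def rotation(alpha):
    # 2D rotation matrix; Q^T Q = I and ||Qx|| = ||x||.
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, s], [-s, c]])

Q = rotation(0.7)          # angle chosen arbitrarily
x = np.array([3.0, 4.0])   # a vector of norm 5
```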
REFERENCES
1. Friedberg, Insell, Spence. Linear Algebra. Prentice-Hall Inc., 1997.
This entry is based on content from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
216.37 orthogonal vectors
Two vectors, v1 and v2 , are orthogonal if and only if their inner product ⟨v1 , v2 ⟩ is 0. In two
dimensions, orthogonal vectors are perpendicular (and in n dimensions they are perpendicular
within the plane defined by the two vectors).
A set of vectors is orthogonal when, taken pairwise, any two vectors in the set are orthogonal.
216.38 overdetermined
A partitioned matrix, or a block matrix, is a matrix M that has been constructed from
other smaller matrices. These smaller matrices are called blocks or sub-matrices of M.
If A1 , . . . , An are square matrices (of possibly different dimensions), then we define the
direct sum of the matrices A1 , . . . , An as the partitioned matrix
$$\operatorname{diag}(A_1 , \dots, A_n ) = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_n \end{pmatrix},$$
where the off-diagonal blocks are zero.
We begin by showing that the theorem is true if the characteristic polynomial does not have
repeated roots, and then prove the general case.
Suppose then that the discriminant of the characteristic polynomial is non-zero, and hence
that T : V → V has n = dim V distinct eigenvalues once we extend to the algebraic closure
of the ground field. (Technically, this means that we must work with the vector space
V̄ = V ⊗ k̄, where k̄ is the algebraic closure of the original field of scalars, and with
T̄ : V̄ → V̄ the extended automorphism with action T̄ (v ⊗ a) = T (v) ⊗ a, for v ∈ V , a ∈ k̄.)
We can therefore choose a basis of eigenvectors, call them v1 , . . . , vn , with
λ1 , . . . , λn the corresponding eigenvalues. From the definition of characteristic polynomial
we have that
$$c_T (x) = \prod_{i=1}^{n} (x - \lambda_i ).$$
Since (T − λi I)vi = 0 and the factors of cT (T ) commute, it follows that cT (T )vi = 0 for
each i, and hence cT (T ) = 0 on all of V .
To prove the general case, let δ(p) denote the discriminant of a polynomial p, and let us
remark that the discriminant mapping
$$T \mapsto \delta(c_T ), \qquad T \in \operatorname{End}(V )$$
is a polynomial function on End(V ). Hence the set of T with distinct eigenvalues is a dense open subset
of End(V ) relative to the Zariski topology. Now the characteristic polynomial map
$$T \mapsto c_T (T ), \qquad T \in \operatorname{End}(V )$$
is a polynomial map on the vector space End(V ). Since it vanishes on a dense open subset,
it must vanish identically. Q.E.D.
The columns of the unitary matrix Q in Schur’s decomposition theorem form an orthonormal basis
of Cn . The matrix A takes the upper-triangular form D + N on this basis. Conversely, if
v1 , . . . , vn is an orthonormal basis for which A is of this form then the matrix Q with vi as
its i-th column satisfies the theorem.
By induction there is an orthonormal basis v2 , . . . , vn of V for which (1 − π)A takes the desired
form on V . Now A = πA + (1 − π)A, so Avi ≡ (1 − π)Avi (mod Cv) for i ∈ {2, . . . , n}.
Then v, v2 , . . . , vn can be used as a basis for the Schur decomposition on Cn .
216.43 singular value decomposition
A = USV T
σ1 ≥ σ2 ≥ · · · ≥ σmin(m,n) ≥ 0
The σi are the singular values of A and the first min(m, n) columns of U and V are the
left and right (respectively) singular vectors of A. S has the form
$$S = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} \text{ if } m \ge n,
\qquad\text{and}\qquad
S = \begin{pmatrix} \Sigma & 0 \end{pmatrix} \text{ if } m < n,$$
σ1 ≥ σ2 ≥ · · · ≥ σr > σr+1 = · · · = σn = 0.
The SVD provides a numerically robust solution to the least-squares problem. The matrix-
algebraic phrasing of the least-squares solution x is
x = (AT A)−1 AT b
which in terms of the SVD (for m ≥ n and full rank) becomes
$$x = V \begin{pmatrix} \Sigma^{-1} & 0 \end{pmatrix} U^T b.$$
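An SVD-based least squares solve, compared against a library solver (the data values are invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Reduced SVD: U is m x n, s holds the singular values, Vt = V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x = Vt.T @ ((U.T @ b) / s)   # x = V Sigma^{-1} U^T b
```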
216.44 skew-symmetric matrix
Definition:
Let A be a square matrix of dimension n × n with real entries (aij ). The matrix A is
skew-symmetric if aij = −aji for all 1 ≤ i ≤ n, 1 ≤ j ≤ n.
$$A = \begin{pmatrix} a_{11} = 0 & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} = 0 \end{pmatrix}$$
The main diagonal entries are zero because ai,i = −ai,i implies ai,i = 0.
One can see skew-symmetric matrices as a special case of complex skew-Hermitian matrices.
Thus, all properties of skew-Hermitian matrices also hold for skew-symmetric matrices.
Properties:
Examples:
• $$\begin{pmatrix} 0 & b \\ -b & 0 \end{pmatrix}$$
• $$\begin{pmatrix} 0 & b & c \\ -b & 0 & e \\ -c & -e & 0 \end{pmatrix}$$
216.45 square matrix
Examples include the 4 × 4 Hilbert matrix
$$\begin{pmatrix}
1.00000 & 0.50000 & 0.33333 & 0.25000 \\
0.50000 & 0.33333 & 0.25000 & 0.20000 \\
0.33333 & 0.25000 & 0.20000 & 0.16667 \\
0.25000 & 0.20000 & 0.16667 & 0.14286
\end{pmatrix}$$
The notation Matn (K) is often used to signify the standard class of square matrices which
are of dimension n × n with elements drawn from a field K. Thus, one would use a ∈ Mat3 (C)
to declare that a is a three-by-three matrix with elements that are complex numbers.
Property: Suppose A and B are matrices such that AB is a square matrix. Then the
product BA is also defined and is also a square matrix.
A strictly upper triangular matrix is an upper triangular matrix which has 0 on the
main diagonal. Similarly, a strictly lower triangular matrix is a lower triangular
matrix which has 0 on the main diagonal. I.e.,
$$\begin{pmatrix}
0 & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & 0 & a_{23} & \cdots & a_{2n} \\
0 & 0 & 0 & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix}$$
and
$$\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
a_{21} & 0 & 0 & \cdots & 0 \\
a_{31} & a_{32} & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 0
\end{pmatrix}$$
Definition:
Let A be a square matrix of dimension n. The matrix A is symmetric if aij = aji for all
1 ≤ i ≤ n, 1 ≤ j ≤ n:
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}$$
Properties:
Examples:
• $$\begin{pmatrix} a & b \\ b & c \end{pmatrix}$$
• $$\begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}$$
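The symmetry condition above, and the skew-symmetry condition from the previous entry, are one-line checks in code (function names are my own):

```python
import numpy as np

def is_symmetric(A):
    # a_ij = a_ji for all i, j; only square matrices qualify.
    A = np.asarray(A)
    return A.shape[0] == A.shape[1] and np.array_equal(A, A.T)

def is_skew_symmetric(A):
    # a_ij = -a_ji for all i, j; forces a zero main diagonal.
    A = np.asarray(A)
    return A.shape[0] == A.shape[1] and np.array_equal(A, -A.T)
```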
Theorem ([1], pp. 82) Let A be a complex square matrix. Then A is diagonal if and only
if A is normal and triangular.
Proof. If A is a diagonal matrix, then the conjugate transpose A∗ is also a diagonal matrix.
Since arbitrary diagonal matrices commute, it follows that A∗ A = AA∗ . Thus any diagonal
matrix is a normal triangular matrix.
Next, suppose A = (aij ) is a normal upper triangular matrix. Thus aij = 0 for i > j, so for
the diagonal elements in A∗ A and AA∗ , we obtain
$$(A^* A)_{ii} = \sum_{k=1}^{i} |a_{ki}|^2 , \qquad (AA^* )_{ii} = \sum_{k=i}^{n} |a_{ik}|^2 .$$
For i = 1, we have
|a11 |2 = |a11 |2 + |a12 |2 + · · · + |a1n |2 .
It follows that the only non-zero entry on the first row of A is a11 . Similarly, for i = 2, we
obtain
|a12 |2 + |a22 |2 = |a22 |2 + · · · + |a2n |2 .
Since a12 = 0, it follows that the only non-zero element on the second row is a22 . Repeating
this argument for all rows, we see that A is a diagonal matrix. Thus any normal upper
triangular matrix is a diagonal matrix.
Suppose then that A is a normal lower triangular matrix. Then it is not difficult to see that
A∗ is a normal upper triangular matrix. Thus, by the above, A∗ is a diagonal matrix, whence
also A is a diagonal matrix. □
REFERENCES
1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical So-
ciety, 1994.
$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{pmatrix}$$
$$\begin{pmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
a_{21} & a_{22} & 0 & \cdots & 0 \\
a_{31} & a_{32} & a_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix}$$
Triangular matrices allow numerous algorithmic shortcuts in many situations. For example,
Ax = b can be solved in n2 operations if A is an n × n triangular matrix.
Triangular matrices have the following properties (prefix ”triangular” with either ”upper”
or ”lower” uniformly):
The last two properties follow easily from the cofactor expansion of the triangular matrix.
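The n² operation count comes from back substitution; a sketch for the upper triangular case:

```python
import numpy as np

def back_substitute(U, b):
    # Solve U x = b for upper triangular U in O(n^2) operations,
    # working from the last row upward.
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x
```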
$$\begin{pmatrix}
d_1 & u_1 & 0 & 0 & \cdots & 0 \\
l_1 & d_2 & u_2 & 0 & \cdots & 0 \\
0 & l_2 & d_3 & u_3 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & l_{n-2} & d_{n-1} & u_{n-1} \\
0 & 0 & \cdots & 0 & l_{n-1} & d_n
\end{pmatrix}$$
An underdetermined system of linear equations has more unknowns than equations. It
can be consistent with infinitely many solutions, or have no solution.
$$\begin{pmatrix}
1 & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & 1 & a_{23} & \cdots & a_{2n} \\
0 & 0 & 1 & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}$$
$$\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_{21} & 1 & 0 & \cdots & 0 \\
a_{31} & a_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 1
\end{pmatrix}$$
216.53 unitary
Definitions. A unitary space V is a complex vector space with a distinguished positive definite
Hermitian form,
h−, −i : V × V → C,
which serves as the inner product on V .
Remarks.
2. Unitary transformations and unitary matrices are closely related. On the one hand,
a unitary matrix defines a unitary transformation of Cn relative to the inner product
(216.53.2). On the other hand, the representing matrix of a unitary transformation
relative to an orthonormal basis is, in fact, a unitary matrix.
3. A unitary transformation is an automorphism. This follows from the fact that a unitary
transformation T preserves the inner-product norm:
$$\|T u\| = \|u\|, \qquad u \in V. \tag{216.53.3}$$
Hence, if T u = 0, then by the definition (216.53.1) it follows that ‖u‖ = 0, and hence u = 0.
4. Indeed, relation (216.53.3) can be taken as the definition of a unitary transformation.
This follows from the polarization identity for sesquilinear forms, namely
$$\langle u, v \rangle = \frac{1}{4} \sum_{k=0}^{3} i^k \, \|u + i^k v\|^2 .$$
Thanks to the polarization identity, it is possible to show that if T preserves the norm,
then (216.53.1) must hold as well.
5. A simple example of a unitary matrix is the change of coordinates matrix between two
orthonormal bases. Indeed, let u1 , . . . , un and v1 , . . . , vn be two orthonormal bases, and
let A = (Aij ) be the corresponding change of basis matrix defined by
$$v_j = \sum_i A_{ij} u_i , \qquad j = 1, \dots, n.$$
i
Substituting the above relation into the defining relations for an orthonormal basis,
$$\langle u_i , u_j \rangle = \delta_{ij} , \qquad \langle v_k , v_l \rangle = \delta_{kl} ,$$
we obtain
$$\sum_{i,j} \delta_{ij} \, \overline{A_{ik}} A_{jl} = \sum_i \overline{A_{ik}} A_{il} = \delta_{kl} .$$
In matrix form this reads
$$\bar{A}^T A = I,$$
as desired.
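Remark 5 in code: the change of basis matrix between two orthonormal bases of C² is unitary (the two bases below are my own illustrative choice):

```python
import numpy as np

# Columns are the basis vectors of two orthonormal bases of C^2:
# the standard basis, and a complex rotation of it.
u = np.eye(2)
v = np.array([[1.0,  1.0],
              [1.0j, -1.0j]]) / np.sqrt(2)

# Change of basis matrix A with v_j = sum_i A_ij u_i.
A = np.linalg.solve(u, v)
```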
Let F be a field. A vector space V over F is a set with two binary operations, + : V ×V −→ V
and · : F × V −→ V , such that
1. (a + b) + c = a + (b + c) for all a, b, c ∈ V
2. a + b = b + a for all a, b ∈ V
6. 1 · v = v for all v ∈ V
The elements of V are called vectors, and the element 0 ∈ V is called the zero vector of V .
Definition Let V be a vector space over a field F , and let W be a nonempty subset of V .
If W is itself a vector space under the operations inherited from V , then W is a vector
subspace of V .
Examples
1. Every vector space contains two trivial vector subspaces: the entire vector space, and
the zero vector space.
2. If S and T are vector subspaces of a vector space V , then the vector sum

S + T = {s + t ∈ V | s ∈ S, t ∈ T }

is also a vector subspace of V .
Results for vector subspaces
Theorem 2 [2] (Dimension theorem for subspaces) Let V be a finite dimensional vector
space with subspaces S and T . Then
dim(S + T ) + dim(S ∩ T ) = dim S + dim T.
Theorem 3 ([1], page 42) (Dimension theorem for a composite mapping) Suppose U, V, W
are finite dimensional vector spaces, and L : U → V and M : V → W are linear mappings.
Then
dim(Img L ∩ Ker M ) = dim Img L − dim Img M L
                    = dim Ker M L − dim Ker L.
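The dimension theorem for subspaces can be checked numerically. In the sketch below (numpy; the example subspaces are our own), dim(S + T) is the rank of the combined spanning set, and dim(S ∩ T) is computed independently from the null space of [S, −T]:

```python
import numpy as np

def dim_span(M):
    """Dimension of the subspace spanned by the columns of M."""
    return int(np.linalg.matrix_rank(M))

# Subspaces of R^5 given by linearly independent spanning columns:
# S = span{e1, e2}, T = span{e2, e3, e4}.
S = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.], [0., 0.]])
T = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])

dim_sum = dim_span(np.hstack([S, T]))          # dim(S + T)
# A vector lies in S ∩ T iff S a = T b for some a, b; the pairs (a, b)
# form the null space of [S, -T], and (a, b) -> S a is injective on it
# because the columns of S and of T are independent.
dim_cap = S.shape[1] + T.shape[1] - dim_span(np.hstack([S, -T]))

print(dim_sum, dim_cap)  # 4 1
assert dim_sum + dim_cap == dim_span(S) + dim_span(T)
```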
REFERENCES
1. S. Lang, Linear Algebra, Addison-Wesley, 1966.
2. W.E. Deskins, Abstract Algebra, Dover Publications, 1995.
3. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.
Definition Suppose X is a set, and Y is a vector space with zero vector 0. If T is a map
T : X → Y , such that T (x) = 0 for all x in X, then T is a zero map.
Examples
2. If X is the zero vector space, any linear map T : X → Y is a zero map. In fact,
T (0) = T (0 · 0) = 0T (0) = 0.
216.57 zero vector in a vector space is unique
Proof. Suppose 0 and 0̃ are zero vectors in a vector space V . Then both 0 and 0̃ must
satisfy axiom 3, i.e., for all v ∈ V ,
v + 0 = v,
v + 0̃ = v.
Hence, using commutativity,

0 = 0̃ + 0 = 0 + 0̃ = 0̃,

so 0 = 0̃. □
Definition A zero vector space is a vector space that contains only one element, a
zero vector.
Properties
2. A vector space X is a zero vector space if and only if the dimension of X is zero.
3. Any linear map defined on a zero vector space is the zero map. If T is linear on {0},
then T (0) = T (0 · 0) = 0T (0) = 0.
Chapter 217
Because the Jordan decomposition of a circulant matrix is rather simple, circulant matrices
have some interest in connection with the approximation of eigenvalues of more general
matrices. In particular, they have become part of the standard apparatus in the computerized
analysis of signals and images.
217.2 matrix
A matrix is simply a mapping M : A × B → C of the product of two sets into some third
set. As a rule, though, the word matrix and the notation associated with it are used only in
connection with linear mappings. In such cases C is the ring or field of scalars.
Definition: Let V and W be finite-dimensional vector spaces over the same field k, with
bases A and B respectively, and let f : V → W be a linear mapping. For each a ∈ A let
(Mab )b∈B be the unique family of scalars (elements of k) such that

f (a) = Σb∈B Mab b .
Then the family (Mab ) (or equivalently the mapping (a, b) ↦ Mab of A × B → k) is called
the matrix of f with respect to the given bases A and B. The scalars Mab are called the
components of the matrix.
Writing an arbitrary element x = Σa∈A xa a of V in terms of the basis, we have

f (x) = Σa∈A Σb∈B xa Mab b ,

as is readily verified.
Any two linear mappings V → W have a sum, defined pointwise; it is easy to verify that the
matrix of the sum is the sum, componentwise, of the two given matrices.
The formalism of matrices extends somewhat to linear mappings between modules, i.e. to
the case where k is a ring, not necessarily commutative, rather than a field.
Suppose we are given three modules V, W, X, with bases A, B, C respectively, and two linear
mappings f : V → W and g : W → X. f and g have some matrices (Mab ) and (Nbc ) with
respect to those bases. The product matrix NM is defined as the matrix (Pac ) of the
composite function

g ◦ f : V → X,   x ↦ g(f (x)),

with respect to the bases A and C. Straight from the definitions of a linear mapping and a
basis, one verifies that

Pac = Σb∈B Mab Nbc   (217.2.1)
for all a ∈ A and c ∈ C.
To illustrate the notation of matrices in terms of rows and columns, suppose the spaces
V, W, X have dimensions 2, 3, and 2 respectively, and bases
A = {a1 , a2 } B = {b1 , b2 , b3 } C = {c1 , c2 } .
We write

( M11 M12 M13 ) ( N11 N12 )     ( P11 P12 )
( M21 M22 M23 ) ( N21 N22 )  =  ( P21 P22 ) .
                ( N31 N32 )
(Notice that we have taken a liberty with the notation, by writing e.g. M12 instead of
Ma1 b2 .) The equation (217.2.1) shows that the multiplication of two matrices proceeds
“rows by columns”. Also, in an expression such as N23 , the first index refers to the row, and
the second to the column, in which that component appears.
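The "rows by columns" rule (217.2.1) can be spelled out directly. The following sketch (numpy; the sample entries are arbitrary, our own choice) computes P from the sum over b and compares it with the built-in product:

```python
import numpy as np

# Matrices of f : V -> W and g : W -> X for dim V = 2, dim W = 3,
# dim X = 2 (arbitrary sample entries).
M = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # 2 x 3
N = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])            # 3 x 2

# P_ac = sum over b of M_ab N_bc, equation (217.2.1).
P = np.empty((2, 2))
for a in range(2):
    for c in range(2):
        P[a, c] = sum(M[a, b] * N[b, c] for b in range(3))

print(np.array_equal(P, M @ N))  # True
```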
The word matrix has come into use in some areas where linear mappings are not at issue.
An example would be a combinatorial statement, such as Hall’s marriage theorem, phrased
in terms of “0-1 matrices” instead of subsets of A × B.
Remark
Matrices are heavily used in the physical sciences, engineering, statistics, and computer pro-
gramming. But for purely mathematical purposes, they are less important than one might ex-
pect, and indeed are frequently irrelevant in linear algebra. Linear mappings, determinants,
traces, transposes, and a number of other simple notions can and should be defined without
matrices, simply because they have a meaning independent of any basis or bases. Many little
theorems in linear algebra can be proved in a simpler and more enlightening way without
matrices than with them. One more illustration: The derivative (at a point) of a mapping
from one surface to another is a linear mapping; it is not a matrix of partial derivatives,
because the matrix depends on a choice of basis but the derivative does not.
Chapter 218
Let f1 , f2 , f3 , ..., fn be real valued functions defined on some I ⊂ R. Then f1 , f2 , f3 , ..., fn are
said to be linearly dependent if for some a1 , a2 , a3 , ..., an ∈ R not all zero, we have that:
a1 f1 (x) + a2 f2 (x) + · · · + an fn (x) = 0,   ∀x ∈ I.
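A numerical illustration (numpy; the sample functions are our own choice): the functions sin², cos², 1 satisfy sin²x + cos²x − 1 = 0, and sampling them at finitely many points exposes the dependence as a rank deficiency. (Rank deficiency at sample points only suggests, and cannot by itself prove, dependence on all of I.)

```python
import numpy as np

# Sample f1 = sin^2, f2 = cos^2, f3 = 1 at 50 points of I = [0, 1].
x = np.linspace(0.0, 1.0, 50)
F = np.column_stack([np.sin(x)**2, np.cos(x)**2, np.ones_like(x)])

# 1*f1 + 1*f2 + (-1)*f3 = 0, so the columns are linearly dependent
# and the rank falls short of the number of functions.
print(np.linalg.matrix_rank(F))  # 2
```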
Chapter 219
The rank and signature of a quadratic form are invariant under change of basis.
In matrix terms, if A is a real symmetric matrix, there is an invertible matrix S such that

        ( Is     0     0 )
SAS T = ( 0   −Ir−s    0 )
        ( 0      0     0 )

where r, the rank of A, and s, the signature of A, characterise the congruence class.
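For a concrete reading of r and s, one may diagonalize a symmetric matrix and count eigenvalues. A sketch (numpy; the matrix is our own choice, and s is taken as the number of positive eigenvalues, matching the normal form with blocks Is and −Ir−s):

```python
import numpy as np

# A real symmetric matrix congruent to diag(I_s, -I_{r-s}, 0):
# here r = 3 nonzero eigenvalues, of which s = 2 are positive.
A = np.diag([2., 5., -3., 0.])
w = np.linalg.eigvalsh(A)
r = int(np.sum(np.abs(w) > 1e-10))   # rank
s = int(np.sum(w > 1e-10))           # signature (count of positives)
print(r, s)  # 3 2
```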
219.2 basis
It can be proved that any two bases of the same vector space must have the same cardinality.
This introduces the notion of dimension of a vector space, which is precisely the cardinality
of the basis, and is denoted by dim(V ), where V is the vector space.
The fact that every vector space has a Hamel basis is an important consequence of the
axiom of choice (in fact, that proposition is equivalent to the axiom of choice).
Examples.
• β = {ei }, 1 ≤ i ≤ n, is a basis for Rn (the n-dimensional vector space over the reals).
For n = 4,

β = { (1, 0, 0, 0)T , (0, 1, 0, 0)T , (0, 0, 1, 0)T , (0, 0, 0, 1)T }.
• The set

β = { ( 1 0 ; 0 0 ), ( 0 1 ; 0 0 ), ( 0 0 ; 0 1 ), ( 0 0 ; 1 0 ) }

is a basis of the vector space of 2 × 2 matrices (each matrix written row by row).
• The empty set is a basis for the trivial vector space which consists of the unique element
0.
Let U be a vector space, and let V and W be subspaces of U. If every u ∈ U can be written
uniquely as a sum

u = v + w,   v ∈ V, w ∈ W,

then we say that V and W form a direct sum decomposition of U and write

U = V ⊕ W.
In such circumstances, we also say that V and W are complementary subspaces.
Let us remark that if v1 , . . . , vm is a basis of V and w1 , . . . , wn is a basis of W , then the
combined list

v1 , . . . , vm , w1 , . . . , wn

is a basis of U.
Let us also remark that direct sum decompositions of a vector space U are in one-to-one
correspondence with projections on U.
219.4 dimension
Let V be a vector space over a field K. We say that V is finite-dimensional if there exists a
finite basis of V . Otherwise we call V infinite-dimensional.
It can be shown that every basis of V has the same cardinality. We call this cardinality the
dimension of V . In particular, if V is finite-dimensional, then every basis of V will consist
of a finite set v1 , . . . , vn . We then call the natural number n the dimension of V .
Next, let U ⊂ V be a subspace. The dimension of the quotient vector space V /U is called the
codimension of U relative to V .
Note: in circumstances where the choice of field is ambiguous, the dimension of a vector
space depends on the choice of field. For example, every complex vector space is also a real
vector space, and therefore has a real dimension, double its complex dimension.
Every vector space has a basis. This result, trivial in the finite case, is in fact rather
surprising when one thinks of infinite dimension vector spaces, and the definition of a basis.
The theorem is equivalent to the axiom of choice. Here we
will only prove that Zorn’s lemma implies that every vector space has a basis.
Let X be any vector space, and let A be the set of linearly independent subsets of X. For
x, y ∈ A, we define x ≤ y iff x ⊆ y. It is easy to see that this (the canonical order relation on
subsets) defines a partial order on A. For each chain C ⊆ A, define C′ = ∪C. Now any
finite collection of vectors from C′ must lie in a single set c ∈ C, and as such they are
linearly independent. This shows that C′ ∈ A and thus C′ is an upper bound for C.
According to Zorn’s lemma A now has a maximal element, ξ, which by definition is linearly
independent. But ξ is also a spanning set, for if this were not true let z ∈ X be any vector
not in the span of ξ. Then ξ ∪ {z} would again be a linearly independent*) set larger than
ξ, contradicting that ξ is a maximal element. Thus ξ is a basis for X.
*) If not, let (xi ) be a finite collection of vectors from ξ so that a1 x1 + a2 x2 + · · · + an xn + az z =
0, and not all ai are 0. We must then have az ≠ 0, because if not we would have a non-trivial
linear combination of vectors from ξ equalling the zero vector, contrary to ξ being a linearly
independent set. But then

z = −(a1 /az ) x1 − (a2 /az ) x2 − · · · − (an /az ) xn ,

contrary to z not being in the span of ξ.
219.6 flag
Let V be a vector space. A nested sequence of subspaces

V1 ⊂ V2 ⊂ · · · ⊂ Vn = V
is called a flag in V . We speak of a complete flag when
dim Vi = i
for each i = 1, . . . , n.
Next, putting
dk = dim Vk ,   k = 1, . . . , n,
we say that a list of vectors (u1 , . . . , udn ) is an adapted basis relative to the flag, if the first
d1 vectors give a basis of V1 , the first d2 vectors give a basis of V2 , etc. Thus, an alternate
characterization of a complete flag is that the first k elements of an adapted basis are a
basis of Vk .
219.7 frame
Introduction Frames and coframes are notions closely related to the notions of basis and
dual basis. As such, frames and coframes are needed to describe the connection between
list vectors and the more general abstract vectors.
Frames and bases. Let U be a finite-dimensional vector space over a field K, and let I be
a finite, totally ordered set of indices¹, e.g. (1, 2, . . . , n). We will call a mapping F : I → U
a reference frame, or simply a frame. To put it plainly, F is just a list of elements of U with
indices belong to I. We will adopt a notation to reflect this and write Fi instead of F(i).
Subscripts are used when writing the frame elements because it is best to regard a frame as
a row-vector² whose entries happen to be elements of U, and write
F = (F1 , . . . , Fn ).
This is appropriate because every reference frame F naturally corresponds to a linear mapping
F̂ : KI → U defined by

a ↦ Σi∈I ai Fi ,   a ∈ KI .
¹ It is advantageous to allow general indexing sets, because one can indicate the use of multiple frames
of reference by employing multiple, disjoint sets of indices.
² It is customary to use superscripts for the components of a column vector, and subscripts for the
components of a row vector. This is fully described in the vector entry.
In other words, F̂ is a linear form on KI that takes values in U instead of K. We use row
vectors to represent linear forms, and that’s why we write the frame as a row vector.
In full duality to the custom of writing frames as row-vectors, we write the coframe as a
column vector whose components are the coordinate functions:
( x1 )
( x2 )
(  ⋮ )
( xn )
³ Strictly speaking, we should denote the coframe by xF and the coordinate functions by xiF so as
to reflect their dependence on the choice of reference frame. Historically, writers have been loath to do this,
preferring a couple of different notational tricks to avoid ambiguity. The cleanest approach is to use different
symbols, e.g. xi versus yj , to distinguish coordinates coming from different frames. Another approach is to
use distinct indexing sets; in this way the indices themselves will indicate the choice of frame. Say we have
two frames F : I → U and G : J → U with I and J distinct finite sets. We stipulate that the symbol i
refers to elements of I and that j refers to elements of J, and write xi for coordinates relative to F and xj
for coordinates relative to G. That’s the way it was done in all the old-time geometry and physics papers,
and is still largely the way physicists go about writing coordinates. Be that as it may, the notation has its
problems and is the subject of long-standing controversy, named by mathematicians the débauche d’indices.
The problem is that the notation employs the same symbol, namely x, to refer to two different objects,
namely a map with domain I and another map with domain J. In practice, ambiguity is avoided because
the old-time notation never refers to the coframe (or indeed any tensor) without also writing the indices.
This is the classical way of the dummy variable, a cousin to the f (x) notation. It creates some confusion for
beginners, but with a little practice it’s a perfectly serviceable and useful way to communicate.
We identify F̂−1 and x with the above column-vector. This is quite natural because all of
these objects are in natural correspondence with a K-valued function of two arguments,
U × I → K,
that maps an abstract vector u ∈ U and an index i ∈ I to a scalar xi (u), called the ith
component of u relative to the reference frame F.
Transition matrices. Suppose F : I → U and G : J → U are two frames, with coordinate
functions xi and y j respectively. The transition matrix between them is the mapping

M:I ×J →K

with entries

Mji = y j (Fi ),   i ∈ I, j ∈ J.

An equivalent description of the transition matrix is given by

y j = Σi∈I Mji xi ,   for all j ∈ J.
It is also the custom to regard the elements of I as indexing the columns of the matrix, while
the elements of J label the rows. Thus, for I = (1, 2, . . . , n) and J = (1̄, 2̄, . . . , n̄), we can
write
( M1̄1 · · · M1̄n )   ( y1̄ )
(  ⋮   ⋱    ⋮  ) = (  ⋮ ) ( F1 · · · Fn ) .
( Mn̄1 · · · Mn̄n )   ( yn̄ )
In this way we can describe the relation between coordinates relative to the two frames in
terms of ordinary matrix multiplication. To wit, we can write
( y1̄ )   ( M1̄1 · · · M1̄n ) ( x1 )
(  ⋮ ) = (  ⋮   ⋱    ⋮  ) (  ⋮ ) .
( yn̄ )   ( Mn̄1 · · · Mn̄n ) ( xn )
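This coordinate change is easy to exercise numerically. A sketch (numpy; the two frames are our own examples, stored as matrices whose columns are the frame vectors):

```python
import numpy as np

# Two frames of R^2, stored as matrices whose columns are the frame
# vectors F_1, F_2 and G_1, G_2 (hypothetical example frames).
F = np.array([[1., 1.],
              [0., 1.]])
G = np.array([[2., 0.],
              [0., 3.]])

u = np.array([4., 5.])         # a fixed vector
x = np.linalg.solve(F, u)      # coordinates x^i of u relative to F
y = np.linalg.solve(G, u)      # coordinates y^j of u relative to G

# Transition matrix: column i holds the G-coordinates of F_i,
# i.e. M[j, i] = y^j(F_i).
M = np.linalg.solve(G, F)
print(np.allclose(y, M @ x))   # True: y^j = sum_i M^j_i x^i
```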
Notes. The term frame is often used to refer to objects that should properly be called a
moving frame. The latter can be thought of as a field of frames, or functions taking values
in the space of all frames, and are fully described elsewhere. The confusion in terminology
is unfortunate but quite common, and is related to the questionable practice of using the
word scalar when referring to a scalar field (a.k.a. scalar-valued functions) and using the
word vector when referring to a vector field.
We also mention that in the world of theoretical physics, the preferred terminology seems to
be polyad and related specializations, rather than frame. Most commonly used are dyad, for
a frame of two elements, and tetrad for a frame of four elements.
If ~v1 , . . . , ~vn is a collection of vectors in a vector space V , then a linear combination of the
~vi ∈ V is a vector ~u of the form
~u = a1~v1 + a2~v2 + · · · + an~vn

for any ai which are elements of a field F . Thus if w~ = a1~v1 + a2~v2 , where a1 , a2 ∈ R, then
w~ is a linear combination of ~v1 and ~v2 .
Let V be a vector space over a field F . Then for scalars λ1 , λ2 , . . . , λn ∈ F the vectors
~v1 , ~v2 , . . . , ~vn ∈ V are linearly independent if the following condition holds:
λ1~v1 + λ2~v2 + · · · + λn~vn = 0 implies λ1 = λ2 = . . . = λn = 0
Otherwise, if this condition fails, the vectors are said to be linearly dependent. Furthermore,
an infinite set of vectors is linearly independent if all finite subsets are linearly independent.
In the case of two vectors, linear independence means that one of these vectors is not a scalar
multiple of the other.
Let K be a field and n a positive natural number. We define Kn to be the set of all mappings
from the index list (1, 2, . . . , n) to K. Such a mapping a ∈ Kn is just a formal way of speaking
of a list of field elements a1 , . . . , an ∈ K.
The above description is somewhat restrictive. A more flexible definition of a list vector is
the following. Let I be a finite list of indices 4 , I = (1, . . . , n) is one such possibility, and let
KI denote the set of all mappings from I to K. A list vector, an element of KI , is just such
a mapping. Conventionally, superscripts are used to denote the values of a list vector, i.e.
for u ∈ KI and i ∈ I, we write ui instead of u(i).
We add and scale list vectors point-wise, i.e. for u, v ∈ KI and k ∈ K, we define u + v ∈ KI
and ku ∈ KI , respectively, by

(u + v)i = ui + vi ,   i ∈ I,
(ku)i = kui ,   i ∈ I.

The zero vector 0 ∈ KI is given by

0i = 0,   i ∈ I.
The above operations give KI the structure of an (abstract) vector space over K.
219.11 nullity
The nullity of a linear mapping is the dimension of the mapping’s kernel. For a linear
mapping T : V → W , the nullity of T gives the number of linearly independent solutions to
the equation
T (v) = 0, v ∈ V.
The nullity is zero if and only if the linear mapping in question is injective.
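A minimal numerical sketch (numpy; the example matrix is our own): the nullity of a matrix is the number of columns minus the rank.

```python
import numpy as np

# T : R^4 -> R^3; the third row is the sum of the first two, so the
# rank is 2 and the nullity is 4 - 2 = 2.
T = np.array([[1., 2., 0., 0.],
              [0., 0., 1., 0.],
              [1., 2., 1., 0.]])
nullity = T.shape[1] - int(np.linalg.matrix_rank(T))
print(nullity)  # 2
```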
219.12 orthonormal basis
Let X be an inner product space over a field F and {xα }α∈J ⊂ X be a set of orthonormal
vectors in the space. If every vector in the space can be written as a sum of vectors from
the set multiplied by elements of the field, in symbols

∀x ∈ X ∃{aα }α∈J ⊂ F : x = Σα∈J aα xα ,

then we say that {xα } form an orthonormal basis for X.
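When the inner product is the standard one on Rⁿ, the coefficients aα can be read off as inner products against the basis vectors. A sketch (numpy; the names are our own):

```python
import numpy as np

# An orthonormal basis of R^4: the columns of Q from a QR factorization.
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(4, 4)))

x = np.array([1., 2., 3., 4.])
c = Q.T @ x                   # c_a = <x, x_a> for each basis vector x_a
print(np.allclose(x, Q @ c))  # True: x = sum_a c_a x_a
```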
Definition. Let L be a collection of labels α ∈ L. For each ordered pair of labels (α, β) ∈
L×L let Mβα be a non-singular n×n matrix, the collection of all such satisfying the following
functor-like consistency conditions:

Mαα = I,   Mγβ Mβα = Mγα ,   for all α, β, γ ∈ L.
We then impose an equivalence relation by stipulating that for all α, β ∈ L and u ∈ Rn , the
pair (α, u) is equivalent to the pair (β, Mβα u). Finally, we define a physical vector to be an
equivalence class of such pairs relative to the just-defined relation.
The idea behind this definition is that the α ∈ L are labels of various coordinate systems,
and that the matrices Mβα encode the corresponding changes of coordinates. For label α ∈ L
and list-vector u ∈ Rn we think of the pair (α, u) as the representation of a physical vector
relative to the coordinate system α.
Discussion. All scientific disciplines have a need for formalization. However, the extent to
which rigour is pursued varies from one discipline to the next. Physicists and engineers are
more likely to regard mathematics as a tool for modeling and prediction. As such they are
likely to blur the distinction between list vectors and physical vectors. Consider, for example,
the following excerpt from R. Feynman’s “Lectures on Physics” [1]:
All quantities that have a direction, like a step in space, are called vectors. A
vector is three numbers. In order to represent a step in space, . . ., we really need
three numbers, but we are going to invent a single mathematical symbol, r, which
is unlike any other mathematical symbols we have so far used. It is not a single
number, it represents three numbers: x, y, and z. It means three numbers, but
not only those three numbers, because if we were to use a different coordinate
system, the three numbers would be changed to x′ , y ′ , and z ′ . However, we
want to keep our mathematics simple and so we are going to use the same mark
to represent the three numbers (x, y, z) and the three numbers (x′ , y ′ , z ′ ). That
is, we use the same mark to represent the first set of three numbers for one
coordinate system, but the second set of three numbers if we are using the other
coordinate system. This has the advantage that when we change the coordinate
system, we do not have to change the letters of our equations.
Surely you are joking Mr. Feynman!? What are we supposed to make of this definition? We
learn that a vector is both a physical quantity, and a list of numbers. However we also learn
that it is not really a specific list of numbers, but rather any of a number of possible lists.
Furthermore, the choice of which list is being used is dependent on the context (choice of
coordinate system), but this is not really important because we just end up using the same
symbol r regardless.
What a muddle! Even at the informal level one can do better than Feynman. The central
weakness of his definition is that he is unwilling to distinguish between physical vectors
(quantities) and their representation (lists of numbers). Here is an alternative physical
definition from a book by R. Aris on fluid mechanics [2].
There are many physical quantities with which only a single magnitude can be
associated. For example, when suitable units of mass and length have been
adopted the density of a fluid may be measured. . . . There are other quantities
associated with a point that have not only a magnitude but also a direction. If
a force of 1 lb weight is said to act at a certain point, we can still ask in what
direction the force acts and it is not fully specified until this direction is given.
Such a physical quantity is a vector. . . . We distinguish therefore between the
vector as an entity and its components which allow us to reconstruct it in a
particular system of reference. The set of components is meaningless unless the
system of reference is also prescribed, just as the magnitude 62.427 is meaningless
as a density until the units are also prescribed. . . ..
Definition. A Cartesian vector, a, in three dimensions is a quantity with three
components a1 , a2 , a3 in the frame of reference O123, which, under rotation of
the coordinate frame to O 1̄2̄3̄, become components ā1 , ā2 , ā3 , where
āj = lij ai .
Here we see a carefully drawn distinction between physical quantities and the numerical
measurements that represent them. A system of measurement, i.e. a choice of units and/or
coordinate axes, turns physical quantities into numbers. However the correspondence is
not fixed, but varies according to the choice of measurement system. This point of view can
be formalized by representing physical vectors as labeled list vectors, the label specifying a
choice of measurement systems. The actual vector is then defined to be an equivalence class
of such labeled list vectors.
REFERENCES
1. R. Feynman, R. Leighton, and M. Sands, “Lectures on Physics”, 11-4, Vol. I, Addison-Wesley.
2. R. Aris, “Vectors, Tensors and the Basic Equations of Fluid Mechanics”, Dover.
wi = T (vi ),   i = 1, . . . , n.
Choose a basis u1 , . . . , uk of Ker T . The result will follow once we show that u1 , . . . , uk , v1 , . . . , vn
is a basis of V .
Suppose that

a1 u1 + . . . + ak uk + b1 v1 + . . . + bn vn = 0.

By applying T to both sides of this equation it follows that

b1 w1 + . . . + bn wn = 0,

and hence, since w1 , . . . , wn are linearly independent,

b1 = b2 = . . . = bn = 0.
Consequently
a1 u1 + . . . + ak uk = 0
as well, and since u1 , . . . , uk are also assumed to be linearly independent we conclude that

a1 = a2 = . . . = ak = 0. □
219.15 rank
The rank of a linear mapping is the dimension of the mapping’s image. For a linear mapping
T : V → W , the rank of T gives the number of independent linear constraints on v ∈ V
imposed by the equation
T (v) = 0.
The rank of a linear mapping is equal to the dimension of the codomain if and only if the
mapping in question is surjective.
The rank of a linear mapping is equal to the dimension of the domain if and only if the
mapping in question is injective.
The sum of the rank and the nullity of a linear mapping gives the dimension of the mapping’s
domain. More precisely, let T : V → W be a linear mapping. If V is finite-dimensional, then
then
dim V = dim Ker T + dim Img T.
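A sketch of this identity in numpy (the example mapping is our own; the kernel dimension is computed independently from the SVD rather than from the formula itself):

```python
import numpy as np

# T : R^4 -> R^3 with Ker T = span{(1, -1, 0, 0), (0, 0, 0, 1)}.
T = np.array([[1., 1., 0., 0.],
              [0., 0., 2., 0.],
              [1., 1., 2., 0.]])
rank = int(np.linalg.matrix_rank(T))          # dim Img T

# dim Ker T, computed independently: right-singular directions with
# (numerically) zero singular value span the kernel.
s = np.linalg.svd(T, compute_uv=False)
nullity = T.shape[1] - int(np.sum(s > 1e-10))

print(rank, nullity)  # 2 2
assert rank + nullity == T.shape[1]           # dim V = dim Ker T + dim Img T
```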
Every independent linear constraint takes away one degree of freedom.
The rank is just the number of independent linear constraints on v ∈ V imposed by the
equation
T (v) = 0.
The dimension of V is the number of unconstrained degrees of freedom, and the nullity is
the number of degrees of freedom in the resulting space of solutions.
Given S we can define R = S−1 and have A = RBR−1 , so whether the inverse comes
first or last does not matter.
Transformations of the form S−1 BS (or SBS−1 ) are called similarity transformations.
Discussion Similarity is useful for turning recalcitrant matrices into pliant ones. The
canonical example is that a diagonalizable matrix A is similar to the diagonal matrix of its
eigenvalues Λ, with the matrix of its eigenvectors acting as the similarity transformation.
That is,
A = TΛT−1   (219.17.2)
  = (v1 v2 · · · vn ) diag(λ1 , λ2 , . . . , λn ) (v1 v2 · · · vn )−1 .   (219.17.3)
This follows directly from the equation defining eigenvalues and eigenvectors,
AT = TΛ. (219.17.4)
Through this transformation, we’ve turned A into the product of an invertible matrix, a
diagonal matrix, and the inverse of that matrix (when A is symmetric, T may be taken
orthogonal). This can be very useful. As an application, see the solution for the
normalizing constant of a multidimensional Gaussian integral.
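A quick numerical confirmation of (219.17.2) for a symmetric example (numpy; the matrix is our own choice):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])            # symmetric, hence diagonalizable
lam, T = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors
Lam = np.diag(lam)
print(np.allclose(A, T @ Lam @ np.linalg.inv(T)))  # True: A = T Λ T^{-1}
```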
1. Similarity is reflexive. All square matrices A are similar to themselves via the similarity
transformation A = I−1 AI, where I is the identity matrix.
5. If A is similar to B, then their determinants are equal, |A| = |B|. This is easily
verified: |A| = |S−1 BS| = |S−1 ||B||S| = |B|.
6. Similar matrices represent the same linear transformation after a change of basis.
219.18 span
The span of a set of vectors ~v1 , . . . , v~n is the set of linear combinations a1~v1 + · · · + an~vn . It’s
denoted Sp(~v1 , . . . , ~vn ). The standard basis vectors î and ĵ span R2 because every vector of
R2 can be represented as a linear combination of î and ĵ. Sp(~v1 , . . . , ~vn ) is a subspace of Rn
and is the smallest subspace containing ~v1 , . . . , ~vn .
Span is both a noun and a verb; a set of vectors can span a vector space, and a vector can
be in the span of a set of vectors.
Checking span: To see whether a vector is in the span of other vectors, one can set up
an augmented matrix, since if ~u is in the span of ~v1 , ~v2 , then ~u = x1~v1 + x2~v2 . This is a
system of linear equations. Thus, if it has a solution, ~u is in the span of ~v1 , ~v2 . Note that
the solution does not have to be unique for ~u to be in the span.
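The consistency test implicit in this paragraph can be written out directly. A sketch (numpy; the vectors are our own examples): ~u lies in Sp(~v1 , ~v2 ) exactly when appending ~u to the spanning set does not increase the rank, i.e. when the system has a solution.

```python
import numpy as np

v1 = np.array([1., 0., 1.])
v2 = np.array([0., 1., 1.])
u = np.array([2., 3., 5.])     # u = 2 v1 + 3 v2

A = np.column_stack([v1, v2])
Au = np.column_stack([A, u])
# u ∈ Sp(v1, v2) iff appending u leaves the rank unchanged, i.e. the
# linear system A x = u is consistent.
in_span = np.linalg.matrix_rank(Au) == np.linalg.matrix_rank(A)
print(in_span)  # True
```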
To see whether a set of vectors spans a vector space, you need to check that there are at
least as many linearly independent vectors as the dimension of the space. For example, it
can be shown that in Rn , n + 1 vectors are never linearly independent, and n − 1 vectors
never span.
Theorem Let S and T be subspaces of a finite dimensional vector space V . Then V is the
direct sum of S and T , i.e., V = S ⊕ T , if and only if dim V = dim S + dim T and S ∩ T = {0}.

Proof. Suppose that V = S ⊕ T . Then, by definition, V = S + T and S ∩ T = {0}. The
dimension theorem for subspaces states that
dim(S + T ) + dim(S ∩ T ) = dim S + dim T.
Since the dimension of the zero vector space {0} is zero, we have that
dim V = dim S + dim T,
and the first direction of the claim follows.
For the other direction, suppose dim V = dim S + dim T and S ∩ T = {0}. Then the
dimension theorem for subspaces implies that

dim(S + T ) = dim V.

Now S + T is a subspace of V with the same dimension as V so, by Theorem 1 on this page,
V = S + T . This proves the second direction. □
219.20 vector
Overview. The word vector has several distinct, but interrelated meanings. The present
entry is an overview and discussion of these concepts, with links at the end to more detailed
definitions.
• A list vector (follow the link to the formal definition) is a finite list of numbers⁵ .
Most commonly, the vector is composed of real numbers, in which case a list vector is
just an element of Rn . Complex numbers are also quite common, and then we speak
of a complex vector, an element of Cn . Lists of ones and zeroes are also utilized, and
are referred to as binary vectors. More generally, one can use any field K, in which
case a list vector is just an element of Kn .

⁵ Infinite vectors arise in areas such as functional analysis and quantum mechanics, but require a much
more complicated and sophisticated theory.
• A physical vector (follow the link to a formal definition and in-depth discussion)
is a geometric quantity that corresponds to a linear displacement. Indeed, it is customary
to depict a physical vector as an arrow. By choosing a system of coordinates,
a physical vector v can be represented by a list vector (v 1 , . . . , v n )T . Physically, no
single system of measurement can be preferred to any other, and therefore such a
representation is not canonical. A linear change of coordinates induces a corresponding
linear transformation of the representing list vector.
In most physical applications vectors have a magnitude as well as a direction, and then
we speak of a euclidean vector. When lengths and angles can be measured, it is most
convenient to utilize an orthogonal system of coordinates. In this case, the magnitude
of a Euclidean vector v is given by the usual Euclidean norm of the corresponding list
vector,

‖v‖ = √( Σi (v i )2 ).
The difficulty is one of semantics. The term vector is ambiguous, but its various meanings
are interrelated. The different usages of vector call for different formal definitions, which
are similarly interrelated. List vectors are the most elementary and familiar kind of vectors.
They are easy to define, and are mathematically precise. However, saying that a vector is
just a list of numbers leads to conceptual difficulties.
A physicist needs to be able to say that velocities, forces, fluxes are vectors. A geometer,
and for that matter a pilot, will think of a vector as a kind of spatial displacement. Everyone
would agree that a choice of a vector involves multiple degrees of freedom, and that
vectors can be linearly superimposed. This description of “vector” evokes useful and intuitive
understanding, but is difficult to formalize.
The synthesis of these conflicting viewpoints is the modern mathematical notion of a vector
space. The key innovation of modern, formal mathematics is the pursuit of generality by
means of abstraction. To that end, we do not give an answer to “What is a vector?”,
but rather give a list of properties enjoyed by all objects that one may reasonably term a
“vector”. These properties are just the axioms of an abstract vector space, or as Forrest
Gump[1] might have put it, “A vector is as a vector does.”
The axiomatic approach afforded by vector space theory gives us maximum flexibility. We
can carry out an analysis of various physical vector spaces by employing propositions based
on vector space axioms, or we can choose a basis and perform the same analysis using list
vectors. This flexibility is obtained by means of abstraction. We are not obliged to say what
a vector is; all we have to do is say that these abstract vectors enjoy certain properties,
and make the appropriate deductions. This is similar to the idea of an abstract class in
object-oriented programming.
Surprisingly, the idea that a vector is an element of an abstract vector space has not made
great inroads in the physical sciences and engineering. The stumbling block seems to be a
poor understanding of formal, deductive mathematics and an unstated, but implicit, attitude
of distrust toward abstraction.
Great historical irony is at work here. The classical, Greek approach to geometry was purely
synthetic, based on idealized notions like point and line, and on various axioms. Analytic
geometry, à la Descartes, arose much later, but became the dominant mode of thought
in scientific applications and largely overshadowed the synthetic method. The pendulum
began to swing back at the end of the nineteenth century as mathematics became more
formal and important new axiomatic systems, such as vector spaces, fields, and topology,
were developed. The cost of increased abstraction in modern mathematics was more than
justified by the improvement in clarity and organization of mathematical knowledge.
Alas, to a large extent physical science and engineering continue to dwell in the 19th century.
The axioms and the formal theory of vector spaces allow one to manipulate formal geometric
entities, such as physical vectors, without first turning everything into numbers. The
increased level of abstraction, however, poses a formidable obstacle toward the acceptance
of this approach. Indeed, mainstream physicists and engineers do not seem in any great
hurry to accept the definition of vector as something that dwells in a vector space. Until this
attitude changes, vector will retain the ambiguous meaning of being both a list of numbers,
and a physical quantity that transforms with respect to matrix multiplication.
REFERENCES
1. R. Zemeckis, “Forrest Gump”, Paramount Pictures.
Chapter 220
220.1 admissibility
Let k be a field, V a vector space over k, and T a linear operator over V . We say that a
subspace W of V is T -admissible if
1. W is a T -invariant subspace;
2. for any polynomial f (X) and any x ∈ V , if f (T )x ∈ W then there exists a vector y ∈ W such that f (T )x = f (T )y.
220.3 cyclic decomposition theorem
Let k be a field, V a finite dimensional vector space over k and T a linear operator over V .
Let W0 be a proper T -admissible subspace of V . There are nonzero vectors x1 , . . . , xr in V
with respective annihilator polynomials p1 , ..., pr such that
1. V = W0 ⊕ Z(x1 , T ) ⊕ · · · ⊕ Z(xr , T ) (See the cyclic subspace definition)
Moreover, the integer r and the minimal polynomials p1 , . . . , pr are uniquely determined by (1), (2) and the fact that none of the xk is zero.
Let us next define the mapping T : V → W ∗ , a 7→ ω(a, ·). Applying the rank-nullity theorem to T yields
dim V = dim ker T + dim im T.     (220.5.1)
Now ker T = W ω and im T = W ∗ . To see the latter assertion, first note that from the
definition of T , we have im T ⊂ W ∗ . Since S is a linear isomorphism, we also have im T ⊃
W ∗ . Then, since dim W = dim W ∗ , the result follows from equation 220.5.1. □
α → α ◦ T, α ∈ V ∗.
We can also characterize T ∗ as the adjoint of T relative to the natural evaluation bracket
between linear forms and vectors:
If U and V are finite dimensional, we can also characterize the dualizing operation as the
composition of the following canonical isomorphisms:
Hom(U, V ) ≃ U ∨ ⊗ V ≃ (V ∗ )∗ ⊗ U ∗ ≃ Hom(V ∗ , U ∗ ).
Relation to the matrix transpose. The above properties closely mirror the algebraic
properties of the matrix transpose operation. Indeed, T ∗ is sometimes referred to as the
transpose of T , because at the level of matrices the dual homomorphism is calculated by
taking the transpose.
To be more precise, suppose that U and V are finite-dimensional, and let M ∈ Matn,m (K)
be the matrix of T relative to some fixed bases of U and V . Then, the dual homomorphism
T ∨ is represented as the transposed matrix M T ∈ Matm,n (K) relative to the corresponding
dual bases of U∨, V ∨.
Let Pn denote the vector space of real polynomials of degree n or less, and let Dn : Pn → Pn−1
denote the ordinary derivative. Linear forms on Pn can be given in terms of evaluations, and
so we introduce the following notation. For every scalar k ∈ R, let Ev_k^{(n)} ∈ (Pn )∨ denote the evaluation functional
Ev_k^{(n)} : p 7→ p(k),     p ∈ Pn .
Note: the degree superscript matters! For example:
Ev_2^{(1)} = 2 Ev_1^{(1)} − Ev_0^{(1)} ,
whereas Ev_0^{(2)} , Ev_1^{(2)} , Ev_2^{(2)} are linearly independent. Let us consider the dual homomorphism D2 ∨ , i.e. the adjoint of D2 . We have the following relations:
D2 ∨ Ev_0^{(1)} = −(3/2) Ev_0^{(2)} + 2 Ev_1^{(2)} − (1/2) Ev_2^{(2)} ,
D2 ∨ Ev_1^{(1)} = −(1/2) Ev_0^{(2)} + (1/2) Ev_2^{(2)} .
In other words, taking Ev_0^{(1)} , Ev_1^{(1)} as the basis of (P1 )∨ and Ev_0^{(2)} , Ev_1^{(2)} , Ev_2^{(2)} as the basis of (P2 )∨ , the matrix that represents D2 ∨ is just
[ −3/2  −1/2 ]
[   2     0  ]
[ −1/2   1/2 ]
Note the contravariant relationship between D2 and D2 ∨ . The former turns second degree polynomials into first degree polynomials, whereas the latter turns first degree evaluations
into second degree evaluations. The matrix of D2 ∨ has 2 columns and 3 rows precisely
because D2 ∨ is a homomorphism from a 2-dimensional vector space to a 3-dimensional
vector space.
and the dual basis of P2 is
(1/2)(x − 1)(x − 2),   x(2 − x),   (1/2)x(x − 1).
Relative to these bases, D2 is represented by the transpose of the matrix for D2 ∨ , namely
[ −3/2   2   −1/2 ]
[ −1/2   0    1/2 ]
This corresponds to the following three relations:
D2 [(1/2)(x − 1)(x − 2)] = −(3/2)(−x + 1) − (1/2)x,
D2 [x(2 − x)] = 2(−x + 1) + 0 · x,
D2 [(1/2)x(x − 1)] = −(1/2)(−x + 1) + (1/2)x.
Properties
If V and W are finite dimensional, the linear transformation T is invertible if and only if the
matrix of T is not singular.
220.10 kernel of a linear transformation
ker T = { x ∈ V | T (x) = 0 }.
The kernel is a vector subspace of V , and its dimension is called the nullity of T . Note that
T is injective if and only if ker T = {0}.
When the transformations are given by means of matrices, the kernel of the matrix A is
ker A = { x ∈ V | Ax = 0 }.
Let V and W be vector spaces over the same field F . A linear transformation is a function T : V → W such that:
1. T (u + v) = T (u) + T (v) for all u, v ∈ V ;
2. T (λv) = λT (v) for all λ ∈ F and v ∈ V .
Properties:
• T (0) = 0.
• If v ∈ V then T −1 (T (v)) = v + u, where u is any element of ker(T ).
See also:
The minimal polynomial MT (X) of T is the unique monic polynomial of minimal degree such that MT (T ) = 0. In other words, the matrix representing MT (T ) is the zero matrix.
Proof:
By the division algorithm for polynomials, P (X) = Q(X)MT (X) + R(X) with deg R < deg MT . We note that R(X) also annihilates T , and by the minimality of MT (X) it must be just 0. Thus we have shown MT (X) | P (X).
Now suppose P (X) was also both minimal and monic, then we have MT (X) = P (X), which
gives uniqueness.
3. The minimal polynomial of T splits into distinct linear factors of degree 1 if and only if T is diagonal with respect to some basis.
It is then a simple corollary of the fundamental theorem of algebra that every endomorphism
of a finite dimensional vector space over C may be upper-triangularized.
The minimal polynomial is intimately related to the characteristic polynomial for T . For let
χT (X) be the characteristic polynomial. Then as was shown earlier MT (X) | χT (X). It is
also a fact that the eigenvalues of T are exactly the roots of χT . So when split into linear
factors the only difference between MT (X) and χT (X) is the algebraic multiplicity of the
roots.
Definition [1, 2] Let (V, ω) be a symplectic vector space and let W be a vector subspace of V . Then the symplectic complement of W is
W ω = {v ∈ V | ω(v, w) = 0 for all w ∈ W }.
It is easy to see that W ω is also a vector subspace of V . Depending on the relation between
W and W ω , W is given different names.
Theorem [1, 2] Let (V, ω) be a symplectic vector space, and let W be a vector subspace of
V . Then
dim V = dim W ω + dim W.
REFERENCES
1. D. McDuff, D. Salamon, Introduction to Symplectic Topology, Clarendon Press, 1997.
2. R. Abraham, J.E. Marsden, Foundations of Mechanics, 2nd ed., Perseus Books, 1978.
220.14 trace
The trace Tr(A) of a square matrix A is defined to be the sum of the diagonal entries of A.
• Tr(AB) = Tr(BA)
The trace Tr(T ) of a linear transformation T : V −→ V from any finite dimensional vector space
V to itself is defined to be the trace of any matrix representation of T with respect to a basis
of V . This scalar is independent of the choice of basis of V , and in fact is equal to the sum
of the eigenvalues of T (over a splitting field of the characteristic polynomial), including
multiplicities.
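As a quick sanity check, the cyclic property Tr(AB) = Tr(BA) can be verified numerically. The sketch below uses plain Python lists of rows as matrices; the helper names trace and matmul are ours, not from any library:

```python
def trace(m):
    # sum of the diagonal entries of a square matrix (list of rows)
    return sum(m[i][i] for i in range(len(m)))

def matmul(a, b):
    # naive matrix product of two compatible matrices
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
assert trace(matmul(A, B)) == trace(matmul(B, A))  # both equal 55
```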
Chapter 221
The method consists of combining the coefficient matrix A with the right hand side b to get the “augmented” n × (m + 1) matrix
          [ a11  a12  · · ·  a1m  b1 ]
          [ a21  a22  · · ·  a2m  b2 ]
(A | b) = [  ..   ..   . .    ..  .. ]
          [ an1  an2  · · ·  anm  bn ]
Note that these operations are “legal” because x is a solution of the transformed system if
and only if it is a solution of the initial system.
If the number of equations equals the number of variables (m = n), and if the coefficient
matrix A is non-singular, then the algorithm will terminate when the augmented matrix has
the following form:
[ a′11  a′12  · · ·  a′1n  b′1 ]
[  0    a′22  · · ·  a′2n  b′2 ]
[  ..    ..    . .    ..    .. ]
[  0     0    · · ·  a′nn  b′n ]
With these assumptions, there exists a unique solution, which can be obtained from the
above matrix by back substitution.
For the general case, the termination procedure is somewhat more complicated. First recall
that a matrix is in echelon form if each row has more leading zeros than the rows above it.
A pivot is the leading non-zero entry of some row. We then have
• If there is a pivot in the last column, the system is inconsistent; there will be no
solutions.
• If that is not the case, then the general solution will have d degrees of freedom, where
d is the number of columns from 1 to m that have no pivot. To be more precise,
the general solution will have the form of one particular solution plus an arbitrary
linear combination of d linearly independent elements of K n .
In even more prosaic language, the variables in the non-pivot columns are to be consid-
ered “free variables” and should be “moved” to the right-hand side of the equation. The
general solution is then obtained by arbitrarily choosing values of the free variables, and
then solving for the remaining “non-free” variables that reside in the pivot columns.
In essence, Gauss-Jordan elimination performs the back substitution; the values of the un-
knowns can be read off directly from the terminal augmented matrix. Not surprisingly,
Gauss-Jordan elimination is slower than Gaussian elimination. It is useful, however, for
solving systems on paper.
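The elimination-and-back-substitution procedure described above can be sketched as follows. This is an illustrative implementation over exact rationals, assuming the system has a unique solution; the function name solve_augmented is ours:

```python
from fractions import Fraction

def solve_augmented(aug):
    # Gaussian elimination (choosing any row with a nonzero pivot),
    # then back substitution, on an n x (n + 1) augmented matrix;
    # assumes a unique solution exists
    a = [[Fraction(x) for x in row] for row in aug]
    n = len(a)
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x

# 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
assert solve_augmented([[2, 1, 5], [1, 3, 10]]) == [1, 3]
```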
L(u) = v
can be solved by applying the Gaussian elimination algorithm (a.k.a. row reduction). To do so, we need to have bases u1 , . . . , un and v1 , . . . , vm of U and V , respectively, so that we can determine a matrix and a column vector
    [ M^1_1  M^1_2  . . .  M^1_n ]       [ b1 ]
    [ M^2_1  M^2_2  . . .  M^2_n ]       [ b2 ]
M = [   ..     ..   . . .    ..  ] , b = [ .. ]
    [ M^m_1  M^m_2  . . .  M^m_n ]       [ bm ]
We are then able to re-express our linear equation as the following problem: find all n-tuples
of scalars x1 , . . . , xn such that
Mx = b,
Note that the dimension of the domain is the number of variables, while the dimension
of the codomain is the number of equations. The equation is called under-determined or
over-determined depending on whether the former is greater than the latter, or vice versa.
In general, over-determined systems are inconsistent, while under-determined ones have mul-
tiple solutions. However, this is a “rule of thumb” only, and exceptions are not hard to find.
A full understanding of consistency and multiple solutions relies on the notions of kernel, image, and rank, and is described by the rank-nullity theorem.
Notes. Elementary applications focus exclusively on the coefficient matrix and the right-
hand vector, and neglect to mention the underlying linear mapping. This is unfortunate,
because the concept of a linear equation is much more general than the traditional notion
of “variables and equations”, and relies in an essential way on the idea of a linear mapping.
The attached example regarding polynomial interpolation is a case in point. Polynomial
interpolation is a linear problem, but one that is specified abstractly, rather than in terms
of variables and equations.
221.4 linear problem
Let L : U → V be a linear mapping between vector spaces, and let v ∈ V . A linear problem is a constraint
L(u) = v,
imposed upon elements of the domain u ∈ U . The solution of a linear equation is the set of
u ∈ U that satisfy the above constraint, i.e. the pre-image L−1 (v). The equation is called
inconsistent if no solutions exist, i.e. if the pre-image is the empty set. It is otherwise called
consistent.
Every solution can be written in the form
u = up + uh ,    up , uh ∈ U,
where
L(up ) = v
is a particular solution and where
L(uh ) = 0
is any solution of the corresponding homogeneous problem, i.e. an element of the kernel of
L.
Notes. Elementary treatments of linear algebra focus almost exclusively on finite-dimensional linear problems. They neglect to mention the underlying mapping, preferring to focus instead on “variables
and equations.” However, the scope of the general concept is considerably wider, e.g. linear
differential equations such as
y 00 + y = 0.
For a matrix to be in reduced row echelon form it has to first satisfy the requirements to be in row echelon form and additionally satisfy the following requirements:
1. The first non-zero element in any row is 1.
2. The first element of value 1 in any row must be the only non-zero value in its column.
An example of a matrix in reduced row echelon form could be
[ 0 1 2 6 0 1 0 0 4 0 ]
[ 0 0 0 0 1 1 0 0 1 1 ]
[ 0 0 0 0 0 0 1 0 4 1 ]
[ 0 0 0 0 0 0 0 1 2 1 ]
[ 0 0 0 0 0 0 0 0 0 0 ]
A matrix is said to be in row echelon form when each row in the matrix starts with more zeros than the previous row. Rows which are composed completely of zeros will be grouped
at the bottom of the matrix.
[ 0 2 1 ]   [ 5 0 1 3 2 ]
[ 0 0 1 ] , [ 0 0 4 1 0 ]
[ 0 0 0 ]   [ 0 0 0 0 7 ]
If several rows have the same number of leading zeros then the matrix is not in row echelon
form unless the rows contain no non-zero values.
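The leading-zero condition just described can be checked mechanically; a minimal sketch (the helper names are ours):

```python
def leading_zeros(row):
    # number of zeros before the first nonzero entry (row length if all zero)
    for i, x in enumerate(row):
        if x != 0:
            return i
    return len(row)

def is_row_echelon(m):
    # each nonzero row must start with strictly more zeros than the row
    # above it, and all-zero rows must sit at the bottom
    counts = [leading_zeros(r) for r in m]
    width = len(m[0])
    for prev, cur in zip(counts, counts[1:]):
        if prev == width and cur != width:
            return False            # a nonzero row below an all-zero row
        if cur != width and cur <= prev:
            return False            # not strictly more leading zeros
    return True

assert is_row_echelon([[0, 2, 1], [0, 0, 1], [0, 0, 0]])
assert not is_row_echelon([[1, 2], [3, 4]])
```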
Consider the problem of finding a polynomial p of degree at most 3 such that
p(x1 ) = y1 ,   p(x2 ) = y2 ,
for given values x1 ≠ x2 and y1 , y2 .
This is a linear problem. Let P3 denote the vector space of cubic polynomials. The under-
lying linear mapping is the multi-evaluation mapping
E : P3 → R2 ,
given by
p 7→ (p(x1 ), p(x2 )),     p ∈ P3 .
The interpolation problem in question is represented by the equation
E(p) = (y1 , y2 ),
where p ∈ P3 is the unknown. One can recast the problem into the traditional form by
taking standard bases of P3 and R2 and then seeking all possible a, b, c, d ∈ R such that
                            [ a ]
[ (x1 )3  (x1 )2  x1  1 ]   [ b ]   [ y1 ]
[ (x2 )3  (x2 )2  x2  1 ]   [ c ] = [ y2 ]
                            [ d ]
However, it is best to treat this problem at an abstract level, rather than mucking about with
row reduction. The Lagrange Interpolation formula gives us a particular solution, namely the linear polynomial
p0 (x) = y1 (x − x2 )/(x1 − x2 ) + y2 (x − x1 )/(x2 − x1 ),      x ∈ R.
The general solution of our interpolation problem is therefore given as p0 + q, where q ∈ P3
is a solution of the homogeneous problem
E(q) = 0.
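The particular solution given by the Lagrange formula can be evaluated directly; a small sketch using exact rational arithmetic (the function name lagrange_p0 is ours):

```python
from fractions import Fraction

def lagrange_p0(x1, y1, x2, y2):
    # particular solution of the two-point problem: the linear polynomial
    # p0 with p0(x1) = y1 and p0(x2) = y2
    x1, y1, x2, y2 = map(Fraction, (x1, y1, x2, y2))
    return lambda x: y1 * (x - x2) / (x1 - x2) + y2 * (x - x1) / (x2 - x1)

p0 = lagrange_p0(0, 1, 2, 5)
assert p0(0) == 1 and p0(2) == 5
```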
Chapter 222
AA∗ = det(A)I
222.1.1 Property
be the characteristic polynomial of A, where p1 (A), p2 (A), . . . , pn (A) = det(A) are the fundamental invariant polynomials of A.
so we have
A∗ = pn−1 (A)I − pn−2 (A)A + pn−3 (A)A2 − · · · + (−1)n−2 p1 (A)An−2 + (−1)n−1 An−1
222.2.1 Basics
AA−1 = A−1 A = In
It should be stressed that only square matrices have inverses proper; however, a matrix of any dimension may have “left” and “right” inverses (which will not be discussed here).
A precondition for the existence of the matrix inverse A−1 is that det A ≠ 0 (the determinant is nonzero), the reason for which we will see shortly.
A−1 = (1/ det(A)) (A∗ )T
This general form also explains why the determinant must be nonzero for invertibility: we are dividing through by its value.
Method 1:
An easy way to calculate the inverse of a matrix by hand is to form an augmented matrix
[A|I] from A and In , then use Gaussian elimination to transform the left half into I. At the
end of this procedure, the right half of the augmented matrix will be A−1 (that is, you will
be left with [I|A−1 ]).
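Method 1 can be sketched as follows, using exact rational arithmetic; this is an illustrative implementation assuming A is invertible (the function name invert is ours):

```python
from fractions import Fraction

def invert(a):
    # invert a square matrix by reducing the augmented matrix [A | I]
    # to [I | A^-1] with Gauss-Jordan elimination (A assumed invertible)
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# invert([[1, 2], [3, 4]]) yields [[-2, 1], [3/2, -1/2]]
```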
Method 2:
One can calculate the (i, j)-th element of the inverse by using the formula
(A−1 )ji = cof ij (A)/ det A.
Note that the indices on the left-hand side are swapped relative to the right-hand side. This
has the effect of the transpose which appears in the general form of the inverse in the previous
section.
2-by-2 case:
For the 2 × 2 case, the general formula reduces to a memorable shortcut. For the 2 × 2 matrix
M = [ a  b ]
    [ c  d ]
the shortcut is
M −1 = (1/ det M ) [  d  −b ]
                   [ −c   a ]
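The 2 × 2 shortcut translates directly into code; a minimal sketch with exact rationals (the name inv2 is ours):

```python
from fractions import Fraction

def inv2(m):
    # inverse of a 2x2 matrix [[a, b], [c, d]] via the shortcut
    # (1/det) * [[d, -b], [-c, a]]; the determinant must be nonzero
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[4, 7], [2, 6]]   # det M = 10
Minv = inv2(M)
```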
Remarks
Some caveats: computing the matrix inverse for ill-conditioned matrices is error-prone; spe-
cial care must be taken and there are sometimes special algorithms to calculate the inverse
of certain classes of matrices (for example, Hilbert matrices).
222.2.3 Avoiding the Inverse and Numerical Calculation
The need to find the matrix inverse depends on the situation: whether it is done by hand or by computer, and whether the matrix is simply a part of some equation or expression or not.
Instead of computing the matrix A−1 as part of an equation or expression, it is nearly always
better to use a matrix factorization instead. For example, when solving the system Ax = b,
actually calculating A−1 to get x = A−1 b is discouraged. LU-factorization is typically used
instead.
We can even use this fact to speed up our calculation of the inverse by itself. We can cast
the problem as finding X in
AX = B
For n × n matrices A, X, and B (where X = A−1 and B = In ). To solve this, we first find
the LU decomposition of A, then iterate over the columns, solving Ly = P bk and Uxk = y
each time (k = 1 . . . n). The resulting values for xk are then the columns of A−1 .
Typically the matrix elements are members of a field when we are speaking of inverses (i.e.
the reals, the complex numbers). However, the matrix inverse may exist in the case of the
elements being members of a commutative ring, provided that the determinant of the matrix
is a unit in the ring.
222.2.5 References
• Golub and Van Loan, Matrix Computations, Johns Hopkins Univ. Press, 1996.
Chapter 223
223.1 singular
An n × n matrix A is called singular if its rows or columns are not linearly independent.
This is equivalent to the following conditions:
More generally, any endomorphism of a finite dimensional vector space with a non-trivial
kernel is a singular transformation.
Chapter 224
Let T be a linear operator on a finite-dimensional vector space V , and let c(t) be the
characteristic polynomial of T . Then c(T ) = T0 , where T0 is the zero transformation. In
other words, T satisfies its own characteristic equation.
Let Ax = b be the matrix form of a system of n linear equations in n unknowns, with x and
b as n × 1 column vectors and A an n × n matrix. If det(A) 6= 0, then this system has a
unique solution, and for each i (1 ≤ i ≤ n),
xi = det(Mi )/ det(A),
where Mi denotes the matrix obtained from A by replacing its i-th column with the column vector b.
224.3 cofactor expansion
Let M = (Mij ) be an n × n matrix with entries in some field of scalars. Let mij (M) denote
the (n − 1) × (n − 1) submatrix obtained by deleting row i and column j of M, and set
cof ij (M) = (−1)i+j det(mij (M)).
The matrices mij (M) are called the minors of M, and the scalars cof ij (M) the cofactors.
implies the following representation of the determinant in terms of cofactors. For every
j = 1, 2, . . . , n we have
det(M ) = Σ_{i=1}^{n} Mij cof ij (M ),
det(M ) = Σ_{i=1}^{n} Mji cof ji (M ).
These identities are called, respectively, the cofactor expansion of the determinant along the j-th column, and along the j-th row.
The above can equally well be expressed as a cofactor expansion along the first row:
| a1 a2 a3 |
| b1 b2 b3 | = a1 | b2 b3 | − a2 | b1 b3 | + a3 | b1 b2 |
| c1 c2 c3 |      | c2 c3 |      | c1 c3 |      | c1 c2 |
            = a1 (b2 c3 − b3 c2 ) − a2 (b1 c3 − b3 c1 ) + a3 (b1 c2 − b2 c1 );
or along the second column:
| a1 a2 a3 |
| b1 b2 b3 | = −a2 | b1 b3 | + b2 | a1 a3 | − c2 | a1 a3 |
| c1 c2 c3 |       | c1 c3 |      | c1 c3 |      | b1 b3 |
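Cofactor expansion along the first row yields a simple, if inefficient, recursive determinant; a sketch (function name ours):

```python
def det(m):
    # determinant by cofactor expansion along the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

assert det([[1, 2], [3, 4]]) == -2
```

This recursion performs on the order of n! multiplications, so it is only practical for very small matrices; Gaussian elimination is preferred numerically.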
224.4 determinant
Overview
The determinant is an algebraic operation that transforms a square matrix into a scalar.
This operation has many useful and important properties. For example, the determinant
is zero if and only if the corresponding system of homogeneous equations is singular. The
determinant also has important geometric applications, because it describes the area of a
parallelogram, and more generally the volume of a parallelepiped.
The notion of determinant predates matrices and linear transformations. Originally, the
determinant was a number associated to a system of n linear equations in n variables. This
number ”determined” whether the system was singular; i.e., possessed multiple solutions.
In this sense, two-by-two determinants were considered by Cardano at the end of the 16th
century and ones of arbitrary size (see the definition below) by Leibniz about 100 years later.
Definition
Let M = (Mij ) be an n × n matrix with entries in some field of scalars.1 The scalar
| M11 M12 . . . M1n |
| M21 M22 . . . M2n |
|  ..   ..  . .  .. | = Σ_{π∈Sn} sgn(π) M1π1 M2π2 · · · Mnπn        (224.4.1)
| Mn1 Mn2 . . . Mnn |
is called the determinant of M, or det(M) for short. The index π in the above sum varies
over all the permutations of {1, . . . , n} (i.e., the elements of the symmetric group Sn .) Hence,
there are n! terms in the defining sum of the determinant. The symbol sgn(π) denotes the
parity of the permutation; it is ±1 according to whether π is an even or odd permutation.
Accordingly, the 3 × 3 determinant is a sum of the following 6 terms:
| M11 M12 M13 |
| M21 M22 M23 | = M11 M22 M33 + M12 M23 M31 + M13 M21 M32
| M31 M32 M33 |   − M11 M23 M32 − M13 M22 M31 − M12 M21 M33
1 Most scientific and geometric applications deal with matrices made up of real or complex numbers. However, most properties of the determinant remain valid for matrices with entries in a commutative ring.
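The defining sum (224.4.1) can be evaluated literally, permutation by permutation; this brute-force sketch is exponential in n and meant only as an illustration (names ours):

```python
from itertools import permutations
from math import prod

def sign(p):
    # parity of a permutation, computed by counting inversions
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
              if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(m):
    # determinant as the signed sum over all n! permutations
    n = len(m)
    return sum(sign(p) * prod(m[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```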
Remarks and important properties
4. Similar matrices have the same determinant. To be more precise, let A and X be
square matrices with X invertible. Then,
det(XAX −1 ) = det(A).
In particular, if we let X be the matrix representing a change of basis, this shows that the determinant is independent of the basis. The same is true of the trace of a matrix.
In fact, the whole characteristic polynomial of an endomorphism is definable without
using a basis or a matrix, and it turns out that the determinant and trace are two of
its coefficients.
5. The determinant of a matrix A is zero if and only if A is singular; that is, if there
exists a non-trivial solution to the homogeneous equation
Ax = 0.
det AT = det A.
224.5 determinant as a multilinear mapping
Let M = (Mij ) be an n × n matrix with entries in a field K. The matrix M is really the
same thing as a list of n column vectors of size n. Consequently, the determinant operation
may be regarded as a mapping
det : K n × · · · × K n → K        (n times)
These three properties uniquely characterize the determinant, and indeed can — some would
say should — be used as the definition of the determinant operation.
The anti-symmetry assumption implies that the expressions det(ei1 , ei2 , . . . , ein ) vanish if
any two of the indices i1 , . . . , in coincide. If all n indices are distinct, then
det(ei1 , ei2 , . . . , ein ) = ± det(e1 , . . . , en ),
the sign in the above expression being determined by the number of transpositions required to rearrange the list (i1 , . . . , in ) into the list (1, . . . , n). The sign is therefore the parity of the permutation (i1 , . . . , in ). Since we also assume that
det(e1 , . . . , en ) = 1,
Suppose A is an n × n square matrix, u, v are two column n-vectors, and α is a scalar. Then
REFERENCES
1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical So-
ciety, 1994.
3x + 2y + z − 2w = 4
2x − y + 2z − 5w = 15
4x + 2y − w = 1
3x − 2z − 4w = 1.
The associated matrix is
[ 3   2   1  −2 ]
[ 2  −1   2  −5 ]
[ 4   2   0  −1 ]
[ 3   0  −2  −4 ]
whose determinant is ∆ = −65. Since the determinant is non-zero, we can use Cramer’s rule.
To obtain the value of the k-th variable, we replace the k-th column of the matrix above by
the column vector
[ 4  ]
[ 15 ]
[ 1  ]
[ 1  ] ,
the determinant of the obtained matrix is divided by ∆ and the resulting value is the wanted
solution.
So
     | 4   2   1  −2 |
∆1 = | 15 −1   2  −5 | = −65 ,      x = ∆1 /∆ = −65/(−65) = 1,
     | 1   2   0  −1 |
     | 1   0  −2  −4 |

     | 3   4   1  −2 |
∆2 = | 2  15   2  −5 | = 130 ,      y = ∆2 /∆ = 130/(−65) = −2,
     | 4   1   0  −1 |
     | 3   1  −2  −4 |

     | 3   2   4  −2 |
∆3 = | 2  −1  15  −5 | = −195 ,     z = ∆3 /∆ = −195/(−65) = 3,
     | 4   2   1  −1 |
     | 3   0   1  −4 |

     | 3   2   1   4 |
∆4 = | 2  −1   2  15 | = 65 ,       w = ∆4 /∆ = 65/(−65) = −1.
     | 4   2   0   1 |
     | 3   0  −2   1 |
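The determinants above can be recomputed mechanically; the following sketch (helper names ours) uses the Leibniz formula with exact integer arithmetic to confirm ∆ = −65 and the solution (x, y, z, w) = (1, −2, 3, −1):

```python
from itertools import permutations
from math import prod

def det(m):
    # Leibniz-formula determinant; fine for a 4 x 4 example
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        total += (-1) ** inv * prod(m[i][p[i]] for i in range(n))
    return total

A = [[3, 2, 1, -2], [2, -1, 2, -5], [4, 2, 0, -1], [3, 0, -2, -4]]
b = [4, 15, 1, 1]
delta = det(A)                      # -65
solution = []
for k in range(4):
    # replace column k of A by b, as Cramer's rule prescribes
    Ak = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]
    solution.append(det(Ak) // delta)   # exact: det(Ak) is a multiple of delta
```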
224.8 proof of Cramer’s rule
Since det(A) ≠ 0, the matrix A is invertible, and we claim that this implies that the equation Ax = b has a unique solution. Note that A−1 b is a solution since A(A−1 b) = (AA−1 )b = b, so we know that a solution exists.
For each integer i, 1 6 i 6 n, let ai denote the ith column of A, let ei denote the ith column
of the identity matrix In , and let Xi denote the matrix obtained from In by replacing column
i with the column vector x.
We know that for any matrices A, B that the kth column of the product AB is simply the
product of A and the kth column of B. Also observe that Aek = ak for k = 1, . . . , n. Thus,
by multiplication, we have:
Let M ∈ Matn (K) be an n × n matrix with entries from a commutative field K. Let e1 , . . . , en
denote the vectors of the canonical basis of K n . For the proof we need the following
lemma: Let Mij∗ be the matrix generated by replacing the i-th row of M by ej . Then
det Mij∗ = (−1)i+j det Mij ,
where Mij is the (n − 1) × (n − 1)-matrix obtained from M by removing its i-th row and j-th column.
Proof: By adding appropriate multiples of the i-th row of Mij∗ to its remaining rows we
obtain a matrix with 1 at position (i, j) and 0 at positions (k, j) (k 6= i). Now we apply the
permutation
(12) ◦ (23) ◦ · · · ◦ ((i − 1)i)
to rows and
(12) ◦ (23) ◦ · · · ◦ ((j − 1)j)
to columns of the matrix. The matrix now looks like this:
[ 1  ∗   ]
[ 0  Mij ]
Its determinant is det Mij , and the row and column permutations used above contribute a sign of (−1)(i−1)+(j−1) = (−1)i+j , which proves the lemma.
The resolvent matrix of A is RA (s) = (sI − A)−1 . Note: I is the identity matrix and s is a complex variable. Also note that RA (s) is undefined
on Sp(A) (the spectrum of A).
Chapter 225
[ J1  0   . . .  0  ]
[ 0   J2  . . .  0  ]
[ ..  ..   . .   .. ]
[ 0   0   . . .  Jk ]
where each block Ji is of the form
[ λ  1  0  . . .  0 ]
[ 0  λ  1  . . .  0 ]
[ 0  0  λ  . . .  0 ]
[ ..  ..  ..  . . 1 ]
[ 0  0  0  . . .  λ ]
with a constant value λ along the diagonal and 1’s on the superdiagonal. Some texts place
the 1’s on the subdiagonal instead.
225.2 Lagrange multiplier method
The Lagrange multiplier method is used when one needs to find the extreme values of a function whose argument is constrained to lie within a particular subset of the domain.
Method
Suppose that f (x) and gi (x), i = 1, . . . , m (x ∈ Rn ) are differentiable functions that map Rn → R, and we want to solve
min f (x) subject to gi (x) = 0, i = 1, . . . , m.
A necessary condition for an extremum is that the gradients satisfy
∇f = Σ_{i=1}^{m} λi ∇gi
for some scalars λ1 , . . . , λm , the Lagrange multipliers; equivalently, one seeks the stationary points of
f (x) − Σ_{i=1}^{m} λi gi (x).
Let A be a nonnegative matrix. Denote its spectrum by σ(A). Then the spectral radius
ρ(A) is an eigenvalue, that is, ρ(A) ∈ σ(A), and is associated to a nonnegative eigenvector.
If, in addition, A is an irreducible matrix, then ρ(A) > 0 is a simple eigenvalue associated to a positive eigenvector.
If, in addition, A is a primitive matrix, then ρ(A) > |λ| for all λ ∈ σ(A), λ ≠ ρ(A).
225.4 characteristic equation
              | a11 − λ   a12       · · ·   a1n     |
              | a21       a22 − λ   · · ·   a2n     |
det(A − λI) = |   ..        ..       . .      ..    | = 0
              | an1       an2       · · ·   ann − λ |
This forms an nth-degree polynomial in λ, the solutions to which are called the eigenvalues
of A.
225.5 eigenvalue
If V is finite dimensional, elementary linear algebra shows that there are several equivalent
definitions of an eigenvalue:
But if V is of infinite dimension, (5) has no meaning and the conditions (2), (3), and (4) are
not equivalent to (1). A scalar λ satisfying (2) (called a spectral value of A) need not be
an eigenvalue. Consider for example the complex vector space V of all sequences (xn )n≥1 of
complex numbers with the obvious operations, and the map A : V → V given by
A(x1 , x2 , x3 , . . . ) = (0, x1 , x2 , x3 , . . . ) .
Zero is a spectral value of A, but clearly not an eigenvalue.
If k is C or any other algebraically closed field, or if k = R and n is odd, then χ has at least
one zero, meaning that A has at least one eigenvalue. In no case does A have more than n
eigenvalues.
Although we didn’t need to do so here, one can compute the coefficients of χ by introducing a
basis of V and the corresponding matrix for B. Unfortunately, computing n×n determinants
and finding roots of polynomials of degree n are computationally messy procedures for even
moderately large n, so for most practical purposes variations on this naive scheme are needed.
See the eigenvalue problem for more information.
If k = C but the coefficients of χ are real (and in particular if V has a basis for which the
matrix of A has only real entries), then the non-real eigenvalues of A appear in conjugate
pairs. For example, if n = 2 and, for some basis, A has the matrix
A = [ 0  −1 ]
    [ 1   0 ]
then χ(λ) = λ2 + 1, with the two zeros ±i.
The word “eigenvalue” derives from the German “Eigenwert”, meaning “proper value”.
225.6 eigenvalue
This can also be expressed as follows: λ is an eigenvalue for T if the kernel of T − λI is non-trivial.
A linear operator can have several eigenvalues (but no more than the dimension of the space).
Eigenvectors corresponding to different eigenvalues are linearly independent.
Chapter 226
Regardless of which convention is used, the minimal polynomial of Cp(x) equals the characteristic polynomial of Cp(x) , and both are equal to p(x).
For property (2), note that A − λI = −λA(A−1 − (1/λ)I), where A and λ are as above. Taking the determinant of both sides, and using part (1) and the multiplicativity of the determinant, yields
det(A − λI) = ±λn det(A − (1/λ)I).
Property (2) follows. □
Theorem 1. Let V be a vector space and let A : V → V be a linear involution. Then the
eigenvalues of A are ±1. Further, if V is Cn , and A is a n × n complex matrix, then we have
that:
1. det A = ±1.
(proof.)
The next theorem gives a correspondence between involution operators and projection op-
erators.
Theorem 2. Let L and P be linear operators on a vector space V , and let I be the identity
operator on V . If L is an involution then the operators (1/2)(I ± L) are projection operators. Conversely, if P is a projection operator, then the operators ±(2P − I) are involutions.
Theorem 3. Let A be a complex n × n matrix. Then any two of the conditions below imply the third:
1. A is a Hermitian matrix.
2. A is a unitary matrix.
3. A is an involution (A2 = I).
The proofs of theorems 2 and 3 are straightforward calculations.
REFERENCES
1. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965
properties:
• Let A be a complex square matrix. Then A is a diagonal matrix if and only if A is a nor-
mal matrix and A is a triangular matrix (see theorem for normal triangular matrices).
examples:
• [ a   b ]  where a, b ∈ R
  [ −b  a ]
• [ 1   i ]
  [ −i  1 ]
see also:
• Wikipedia, normal matrix
226.5 projection
P 2 = P. (226.5.1)
Proposition 10. If P : V → V is a projection, then its image and kernel are complementary subspaces, namely
V = ker P ⊕ img P. (226.5.2)
Proof. Given v ∈ V , set
u = v − P v.
The projection condition (226.5.1) then implies that u ∈ ker P , and we can write v as the sum of a kernel vector and an image vector:
v = u + P v.
This decomposition is unique, because the intersection of the image and the kernel is the
trivial subspace. Indeed, suppose that v ∈ V is in both the image and the kernel of P . Then,
P v = v and P v = 0, and hence v = 0. QED
V = V1 ⊕ V2
Specializing somewhat, suppose that the ground field is R or C and that V is equipped with
a positive-definite inner product. In this setting we call an endomorphism P : V → V an
orthogonal projection if it is self-dual
P ? = P,
Proposition 11. The kernel and image of an orthogonal projection are orthogonal subspaces.
QED
Let U be a vector space over a field k, whose characteristic is not equal to 2. A mapping
Q : U → k is called a quadratic form if there exists a symmetric bilinear form B : U ×U → k
such that
Q(u) = B(u, u), u ∈ U.
Thus, every symmetric bilinear form determines a quadratic form. The reverse is also true.
Let B be a symmetric bilinear form and Q the corresponding quadratic form. A straightforward calculation shows that
2B(u, v) = Q(u + v) − Q(u) − Q(v).
The above relation is called the polarization identity. It shows that a quadratic form Q fully
determines the corresponding bilinear form B.
Given a basis u1 , . . . , un of U , set
Aij = B(ui , uj ),     i, j = 1, . . . , n.
Then, for
u = x1 u1 + . . . + xn un ,
we have
Q(u) = Σ_{i,j=1}^{n} Aij xi xj .
Writing x = (x1 , . . . , xn )T ∈ k n for the corresponding coordinate vector, we can write the
above simply as
Q(u) = xT Ax.
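Evaluating Q(u) = xT Ax, and recovering the bilinear form via the polarization identity, can be sketched as follows (the helper names are ours):

```python
def quad_form(A, x):
    # evaluate Q(u) = x^T A x for a matrix A and coordinate vector x
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def polarize(A, x, y):
    # recover B(u, v) from Q via 2B(u, v) = Q(u + v) - Q(u) - Q(v)
    s = [xi + yi for xi, yi in zip(x, y)]
    return (quad_form(A, s) - quad_form(A, x) - quad_form(A, y)) / 2

# symmetric example: A = [[2, 1], [1, 3]]
assert quad_form([[2, 1], [1, 3]], [1, 2]) == 18
```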
In the case where k is the field of real numbers, we say that a quadratic form is positive definite,
negative definite, or positive semidefinite if the same can be said of the corresponding bilinear
form.
Chapter 227
227.1 QR decomposition
      [ R ]
A = Q [ 0 ]
with the n × n upper triangular matrix R. One only has then to solve the triangular system Rx = P b, where P consists of the first n rows of QT .
The least squares problem Ax ≈ b is easy to solve with A = QR and Q an orthogonal matrix. The solution
x = (AT A)−1 AT b
then becomes
x = R−1 P b.
Many different methods exist for computing the QR decomposition, e.g. the Householder transformation, the Givens rotation, or Gram-Schmidt orthogonalization.
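As one concrete route, classical Gram-Schmidt applied to the columns of A produces Q and R directly. This is a numerically naive sketch (in practice modified Gram-Schmidt or Householder reflections are preferred); the function name is ours:

```python
import math

def qr_gram_schmidt(a):
    # classical Gram-Schmidt on the columns of a full-rank matrix A;
    # returns Q (stored column-wise) with orthonormal columns and an
    # upper triangular R such that A = Q R
    n, m = len(a), len(a[0])
    cols = [[a[i][j] for i in range(n)] for j in range(m)]
    q = []
    r = [[0.0] * m for _ in range(m)]
    for j, v in enumerate(cols):
        w = list(v)
        for k in range(j):
            r[k][j] = sum(q[k][i] * v[i] for i in range(n))
            w = [wi - r[k][j] * qi for wi, qi in zip(w, q[k])]
        r[j][j] = math.sqrt(sum(x * x for x in w))
        q.append([x / r[j][j] for x in w])
    return q, r
```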
Chapter 228
Let R be a ring with 1. Consider the ring Mn×n (R) of n × n-matrices with entries taken
from R.
It will be shown that there exists a one-to-one correspondence between the ideals of R and
the ideals of Mn×n (R).
For 1 ≤ i, j ≤ n, let Eij denote the n × n-matrix having entry 1 at position (i, j) and 0 in all other places. It can be easily checked that

Eij · Ekl = 0 if k ≠ j, and Eij · Ekl = Eil otherwise.     (228.1.1)

Given an ideal m of Mn×n (R), let

i = {x ∈ R | x is an entry of some A ∈ m};
thus, rx, xr ∈ i. This shows that i is an ideal in R. Furthermore, Mn×n (i) ⊆ m.
Chapter 229
A permutation matrix is any matrix which can be created by rearranging the rows and/or
columns of an identity matrix.
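A small sketch of this definition (the permutation chosen here is arbitrary): rearranging the rows of an identity matrix with numpy, and observing that multiplying by the result permutes the entries of a vector.

```python
import numpy as np

# a permutation of {0, 1, 2}
perm = [2, 0, 1]
P = np.eye(3)[perm]            # rearrange the rows of the identity matrix

# multiplying by P permutes the entries of a vector
v = np.array([10.0, 20.0, 30.0])
assert np.allclose(P @ v, v[perm])

# each row and column contains a single 1
assert np.allclose(P.sum(axis=0), 1) and np.allclose(P.sum(axis=1), 1)
```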
Chapter 230
Given an m × n matrix A and an n × 1 real column vector c, both with real coefficients, one
and only one of the following systems has a solution:
Remark. Here, Ax ≥ 0 means that every component of Ax is nonnegative, and similarly with the other expressions.
Chapter 231
Let A be a square complex matrix. Around every element aii on the diagonal of the matrix, we draw a circle with radius the sum of the norms of the other elements on the same row, Σ_{j≠i} |aij |. Such circles are called Gershgorin discs.

Theorem: Every eigenvalue of A lies in one of the Gershgorin discs.

Proof: Let λ be an eigenvalue of A and x its corresponding eigenvector. Choose i such that |xi | = maxj |xj |. Since x can’t be 0, |xi | > 0. Now Ax = λx, or, looking at the i-th component,

(λ − aii )xi = Σ_{j≠i} aij xj .

Taking absolute values and estimating each |xj | by |xi | gives |λ − aii | ≤ Σ_{j≠i} |aij |, so λ lies in the i-th disc. QED

Since the eigenvalues of A and AT are the same, one gets an additional set of discs which have the same centers aii , but radii calculated by column sums, Σ_{j≠i} |aji |. Every eigenvalue must lie in a disc from each of the two families. Hence, by comparing the row and column discs, the eigenvalues may be located more precisely.
Theorem (Schur’s inequality) Let A be a square n × n matrix with real (or possibly complex) entries. If λ1 , . . . , λn are the eigenvalues of A, and D is the diagonal matrix D = diag(λ1 , . . . , λn ), then

trace D∗ D ≤ trace A∗ A.

Here trace is the trace of a matrix, and ∗ is the conjugate transpose. Equality holds if and only if A is a normal matrix ([1], pp. 146).
REFERENCES
1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical So-
ciety, 1994.
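The Gershgorin disc theorem is easy to test numerically; the matrix below is an arbitrary example, and the assertion inside the loop is exactly the theorem's conclusion:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.5],
              [0.2, -3.0, 0.1],
              [0.1, 0.4, 1.0]])

# disc centers a_ii and radii sum_{j != i} |a_ij|
centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)

# every eigenvalue lies in at least one Gershgorin disc
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r for c, r in zip(centers, radii))
```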
Chapter 232
Let A be an n × n symmetric real square matrix. If for any non-zero vector x we have

xt Ax < 0,

we call A a negative definite matrix.

Let A be an n × n symmetric real square matrix. If for any non-zero vector x we have

xt Ax ≤ 0,

we call A a negative semidefinite matrix.
232.3 positive definite
Introduction
The definiteness of a matrix is an important property that has use in many areas of mathematics and even physics. Below are some examples:

1. In optimization problems, the definiteness of the Hessian matrix determines the quality of an extremal value. The full details can be found in the corresponding entry.

2. In electromagnetism, one can show that the definiteness of a certain media matrix determines the qualitative property of the medium: if the matrix is positive or negative definite, the medium is active or lossy, respectively [1].
Definition [2] Suppose A is an n × n square Hermitian matrix. If, for any non-zero vector x, we have that

x∗ Ax > 0,

then A is a positive definite matrix. (Here x∗ = x̄T , where x̄ is the complex conjugate of x, and x̄T is the transpose of x̄.)
One can show that a Hermitian matrix is positive definite if and only if all its eigenvalues are positive [2]. Thus the determinant of a positive definite matrix is positive, and a positive definite matrix is always invertible. The Cholesky decomposition provides an economical method for solving linear equations involving a positive definite matrix. Further conditions and properties for positive definite matrices are given in [3].
REFERENCES
1. I.V. Lindell, Methods for Electromagnetic Field Analysis, Clarendon Press, 1992.
2. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965
3. C.R. Johnson, Positive definite matrices, American Mathematical Monthly, Vol. 77, Issue
3 (March 1970) 259-264.
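These properties can be demonstrated in a few lines; the matrix A = MᵀM + I below is positive definite by construction (an arbitrary choice for illustration), and the Cholesky factor is used to solve a linear system by two triangular solves:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)        # positive definite by construction

assert np.all(np.linalg.eigvalsh(A) > 0)   # all eigenvalues positive
assert np.linalg.det(A) > 0                # hence positive determinant

L = np.linalg.cholesky(A)      # A = L L^T with L lower triangular
b = rng.standard_normal(4)
y = np.linalg.solve(L, b)      # forward substitution
x = np.linalg.solve(L.T, y)    # back substitution
assert np.allclose(A @ x, b)
```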
Let A be an n × n symmetric real square matrix. If for any non-zero vector x we have

xt Ax ≥ 0,

we call A a positive semidefinite matrix.
A nonnegative square matrix A is said to be a primitive matrix if there exists k such that Ak > 0 entrywise, i.e., if there exists a single k such that for all i, j, the (i, j) entry of Ak is positive. This is stronger than merely requiring that for each pair i, j there exists some k (depending on i and j) for which the (i, j) entry of Ak is positive.
Chapter 233
An n×n matrix is doubly stochastic if and only if it is a convex combination of permutation matrices.
Let A1 , . . . , Am be a collection of n × n doubly-stochastic matrices, and suppose λ1 , . . . , λm is a collection of scalars satisfying Σ_{i=1}^m λi = 1 and λi ≥ 0 for each i = 1, . . . , m. We claim that A = Σ_{i=1}^m λi Ai is doubly stochastic.
Take any i ∈ {1, . . . , m}. Since Ai is doubly stochastic, each of its rows and columns sum
to 1. Thus each of the rows and columns of λi Ai sum to λi .
By the definition of elementwise summation, given matrices N = M1 + M2 , the sum of the
entries in the ith column of N is clearly the sum of the sums of entries of the ith columns of
M1 and M2 respectively. A similar result holds for the jth row.
Hence the sum of the entries in the ith column of A is the sum of the sums of entries of the ith columns of λk Ak for each k, that is, Σ_{k=1}^m λk = 1. The sum of entries of the jth row of A is the same. Hence A is doubly stochastic.
Observe that since a permutation matrix has a single nonzero entry, equal to 1, in each row and column, the sum of entries in any row or column must be 1. So a permutation matrix is doubly stochastic, and on applying the lemma, we see that a convex combination of permutation matrices is doubly stochastic.
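This direction of the theorem is easy to check by computer; the weights below are an arbitrary convex combination over all 3 × 3 permutation matrices:

```python
import numpy as np
from itertools import permutations

def is_doubly_stochastic(A, tol=1e-12):
    # nonnegative entries, every row and column sums to 1
    return (A >= -tol).all() and np.allclose(A.sum(0), 1) and np.allclose(A.sum(1), 1)

# all 3 x 3 permutation matrices
perms = [np.eye(3)[list(p)] for p in permutations(range(3))]

# an arbitrary convex combination (weights sum to 1)
weights = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])
A = sum(w * P for w, P in zip(weights, perms))
assert is_doubly_stochastic(A)
```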
This provides one direction of our proof; we now prove the more difficult direction: suppose B is doubly-stochastic. Define a weighted graph G = (V, E) with vertex set V = {r1 , . . . , rn , c1 , . . . , cn }, edge set E, where eij = (ri , cj ) ∈ E if Bij ≠ 0, and edge weight ω, where ω(eij ) = Bij .
For any A ⊂ V define N(A), the neighbourhood of A, to be the set of vertices u ∈ V such
that there is some v ∈ A such that (u, v) ∈ E.
We claim that, for any v ∈ V , Σ_{u∈N({v})} ω(u, v) = 1. Take any v ∈ V ; either v ∈ R or v ∈ C. Since G is bipartite, v ∈ R implies N({v}) ⊂ C, and v ∈ C implies N({v}) ⊂ R. Now,

v = ri → Σ_{u∈N(ri)} ω(ri , u) = Σ_{j : eij ∈E} ω(ri , cj ) = Σ_{j : Bij ≠0} Bij = Σ_{j=1}^n Bij = 1

v = cj → Σ_{u∈N(cj)} ω(u, cj ) = Σ_{i : eij ∈E} ω(ri , cj ) = Σ_{i : Bij ≠0} Bij = Σ_{i=1}^n Bij = 1
So |N(A)| ≥ |A| for any A ⊂ V . We may therefore apply the graph-theoretic version of Hall’s marriage theorem to G to conclude that G has a perfect matching M.
Define the n × n matrix P by Pij = 1 if (ri , cj ) ∈ M and Pij = 0 otherwise. Note that Bij = 0 implies Pij = 0: if Bij = 0, then (ri , cj ) ∉ E, so (ri , cj ) ∉ M, which implies Pij = 0.

Since M is a perfect matching, each ri is an end of some edge e′ ∈ M. Let the other end be cj for some j; then Pij = 1.
Suppose i1 , i2 ∈ {1, . . . , n} with i1 ≠ i2 and Pi1 ,j = Pi2 ,j = 1 for some j. This implies (ri1 , cj ), (ri2 , cj ) ∈ M, but then the vertex cj is an end of two distinct edges, which contradicts the fact that M is a matching.
Hence, for each row and column of P , there is exactly one nonzero entry, whose value is 1.
So P is a permutation matrix.
Define λ = min_{i,j∈{1,...,n}} {Bij | Pij ≠ 0}. We see that λ > 0 since Bij ≥ 0, and Pij ≠ 0 → Bij ≠ 0. Further, λ = Bpq for some p, q.
Note that since every row and column of B and P sums to 1, every row and column of D = B − λP sums to 1 − λ. Define B′ = (1/(1 − λ)) D. Then every row and column of B′ sums to 1, so B′ is doubly stochastic.
Chapter 234
A square matrix A with complex entries is a Hermitian matrix if

A = ĀT = A∗ .
Note that a Hermitian matrix must have real diagonal elements, as the complex conjugate
of these elements must be equal to themselves.
Any real symmetric matrix is Hermitian; the real symmetric matrices are a subset of the
Hermitian matrices.
For example, the following matrix is Hermitian:

1         1 + i     1 + 2i    1 + 3i
1 − i     2         2 + 2i    2 + 3i
1 − 2i    2 − 2i    3         3 + 3i
1 − 3i    2 − 3i    3 − 3i    4
Hermitian matrices are named after Charles Hermite (1822-1901) [2], who proved in 1855
that the eigenvalues of these matrices are always real [1].
REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. The MacTutor History of Mathematics archive, Charles Hermite
In this example, we show that any square matrix with complex entries can uniquely be decomposed into the sum of one Hermitian matrix and one skew-Hermitian matrix. A fancy way to say this is that the space of complex square matrices is the direct sum of the Hermitian and skew-Hermitian matrices.
Let us denote the vector space (over C) of complex square n × n matrices by M. Further,
we denote by M+ respectively M− the vector subspaces of Hermitian and skew-Hermitian
matrices. We claim that
M = M+ ⊕ M− . (234.2.1)
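The decomposition is explicit: H = (M + M∗)/2 is Hermitian and S = (M − M∗)/2 is skew-Hermitian, and M = H + S. A quick numerical check with an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

H = (M + M.conj().T) / 2       # Hermitian part
S = (M - M.conj().T) / 2       # skew-Hermitian part

assert np.allclose(H, H.conj().T)          # H* = H
assert np.allclose(S, -S.conj().T)         # S* = -S
assert np.allclose(H + S, M)               # they recover M

# the diagonal of a Hermitian matrix is real
assert np.allclose(np.diag(H).imag, 0)
```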
Special cases
234.3 identity matrix
The n × n identity matrix I (or In ) over a ring R is the square matrix with coefficients in R given by

     1  0  · · ·  0
     0  1  · · ·  0
I =  .  .   . .   .
     0  0  · · ·  1

where the numerals “1” and “0” respectively represent the multiplicative and additive identities in R. The identity matrix In serves as the identity in the ring of n × n matrices over R. For any n × n matrix M, we have In M = MIn = M, and the identity matrix is uniquely defined by this property. In addition, for any n × m matrix A and m × n matrix B, we have IA = A and BI = B.
Properties
• For the determinant, we have det I = 1, and for the trace, we have tr I = n.
• The identity matrix has only one eigenvalue λ = 1 of multiplicity n. The corresponding
eigenvectors can be chosen to be v1 = (1, 0, . . . , 0), . . . , vn = (0, . . . , 0, 1).
A square complex matrix A is skew-Hermitian if

A = −A∗ .

Properties.
1. The trace of a skew-Hermitian matrix is purely imaginary or zero.

2. The eigenvalues of a skew-Hermitian matrix are purely imaginary or zero.

Proof. For property (1), let xij and yij be the real and imaginary parts, respectively, of the elements in A. Then the diagonal elements of A are of the form xkk + iykk , and the diagonal elements in −A∗ are of the form −xkk + iykk . Hence xkk , i.e., the real part of each diagonal element in A, must vanish, and property (1) follows. For property (2), suppose A is a skew-Hermitian matrix, and x an eigenvector corresponding to the eigenvalue λ, i.e.,
Ax = λx. (234.4.1)
Multiplying both sides from the left by x∗ yields

x∗ Ax = λx∗ x.

Since x is an eigenvector, x is not the zero vector, and x∗ x is a positive real number. Taking the conjugate transpose of the scalar x∗ Ax and using A∗ = −A gives

(x∗ Ax)∗ = x∗ A∗ x = −x∗ Ax,

so x∗ Ax is purely imaginary or zero. Hence the eigenvalue λ = (x∗ Ax)/(x∗ x) corresponding to x is purely imaginary or zero. □
REFERENCES
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons,
1978.
234.5 transpose
The transpose of a matrix A is the matrix formed by “flipping” A about the diagonal line from the upper left corner. It is usually denoted At , although sometimes it is written as AT or A′ . So if A is an m × n matrix and
     a11  a12  · · ·  a1n
     a21  a22  · · ·  a2n
A =   .    .    .      .
     am1  am2  · · ·  amn

then

      a11  a21  · · ·  am1
      a12  a22  · · ·  am2
At =   .    .    .      .
      a1n  a2n  · · ·  amn
Properties.
Let A and B be m × n matrices and c be a constant. Let x and y be column vectors with n
rows. Then
1. (At )t = A
2. (A + B)t = At + B t
3. (cA)t = cAt
4. (AB)t = B t At
5. The transpose is a linear mapping from the vector space of matrices to itself. That is, (αA + βB)t = αAt + βB t , for same-sized matrices A and B and scalars α and β.
The familiar vector dot product can also be defined using the matrix transpose. If x and y
are column vectors with n rows each,
xt y = x · y
which implies
xt x = x · x = ‖x‖2^2
which is another way of defining the square of the vector Euclidean norm.
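The listed identities are easy to spot-check numerically; the matrices below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))

assert np.allclose((A + B).T, A.T + B.T)     # (A + B)^t = A^t + B^t
assert np.allclose((A @ C).T, C.T @ A.T)     # (AC)^t = C^t A^t

# x^t x equals the squared Euclidean norm
x = rng.standard_normal(5)
assert np.isclose(x.T @ x, np.linalg.norm(x) ** 2)
```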
Chapter 235
Let R be an ordered ring with a valuation | · | and let M(R) denote the set of matrices over
R. The Frobenius norm function or Euclidean matrix norm is the norm function
||A||F : M(R) → R given by
‖A‖F = √( Σ_{i=1}^m Σ_{j=1}^n |aij |2 )
235.2 matrix p-norm
The matrix p-norm is defined for A ∈ Rm×n by

‖A‖p = sup_{x≠0} ‖Ax‖p / ‖x‖p ,   x ∈ Rn .

An alternate definition is

‖A‖p = max_{‖x‖p =1} ‖Ax‖p .
As with vector p-norms, the most important are the 1, 2, and ∞ norms. The 1 and ∞ norms are very easy to calculate for an arbitrary matrix:

‖A‖1 = max_{1≤j≤n} Σ_{i=1}^m |aij |

‖A‖∞ = max_{1≤i≤m} Σ_{j=1}^n |aij |.
The calculation of the 2-norm is more complicated. However, it can be shown that the 2-norm of A is the square root of the largest eigenvalue of AT A. There are also various inequalities that allow one to make estimates on the value of ‖A‖2 :

(1/√n) ‖A‖∞ ≤ ‖A‖2 ≤ √m ‖A‖∞ .

(1/√m) ‖A‖1 ≤ ‖A‖2 ≤ √n ‖A‖1 .

‖A‖2 ≤ ‖A‖F ≤ √n ‖A‖2 .
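The column-sum, row-sum, and eigenvalue characterizations can be verified against numpy's built-in norms (the sample matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0, 0.0, -1.0]])

one_norm = max(np.sum(np.abs(A), axis=0))   # maximum absolute column sum
inf_norm = max(np.sum(np.abs(A), axis=1))   # maximum absolute row sum
two_norm = np.sqrt(max(np.linalg.eigvalsh(A.T @ A)))  # sqrt of largest eigenvalue of A^T A

assert np.isclose(one_norm, np.linalg.norm(A, 1))
assert np.isclose(inf_norm, np.linalg.norm(A, np.inf))
assert np.isclose(two_norm, np.linalg.norm(A, 2))

# one of the bracketing inequalities: ||A||_2 <= ||A||_F <= sqrt(n) ||A||_2
fro = np.linalg.norm(A, "fro")
assert two_norm <= fro <= np.sqrt(A.shape[1]) * two_norm + 1e-12
```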
235.3 self consistent matrix norm
Chapter 236
Let V be a vector space where an inner product ⟨·, ·⟩ has been defined. Such spaces can also be given a norm by defining

‖x‖ = √⟨x, x⟩.

The Cauchy-Schwarz inequality states that for all x, y ∈ V ,

|⟨x, y⟩| ≤ ‖x‖ ‖y‖.

A very special case is when V = Rn and the inner product is the dot product, defined as ⟨v, w⟩ = v t w and usually denoted as v · w; the resulting norm is the Euclidean norm. Notice that in this case the inequality holds even if the modulus on the middle term (which is a real number) is not used.

The Cauchy-Schwarz inequality is also a special case of the Hölder inequality. The inequality arises in a lot of fields, so it is known under several other names, such as the Buniakovsky inequality or the Kantorovich inequality. Another form that arises often is “Cauchy-Schwartz inequality”, but this is a misspelling, since the inequality is named after Hermann Amandus Schwarz (1843–1921).
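For the dot product on Rⁿ the inequality, and the equality case for parallel vectors, can be checked directly (the vectors below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
v = rng.standard_normal(5)
w = rng.standard_normal(5)

# |<v, w>| <= ||v|| ||w||
assert abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w) + 1e-12

# equality holds for linearly dependent vectors, e.g. w = 2v
assert np.isclose(abs(v @ (2 * v)), np.linalg.norm(v) * np.linalg.norm(2 * v))
```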
236.2 adjoint endomorphism
Definition (the bilinear case). Let U be a finite-dimensional vector space over a field K, and B : U × U → K a symmetric, non-degenerate bilinear mapping, for example a real inner product. For an endomorphism T : U → U we define the adjoint of T relative to B to be the endomorphism T ⋆ : U → U characterized by

B(T u, v) = B(u, T ⋆ v),   u, v ∈ U.

Regarding B as a linear map U → U∗ via (Bu)(v) = B(u, v), we then have

T ⋆ = B −1 T ∨ B.

To put it another way, B gives an isomorphism between U and the dual U∗ , and the adjoint T ⋆ is the endomorphism of U that corresponds to the dual homomorphism T ∨ : U∗ → U∗ .
Here is a commutative diagram to illustrate this idea:

        T ⋆
   U  ------>  U
   |           |
 B |           | B
   v           v
   U∗ ------>  U∗
        T ∨
Let M ∈ Matn,n (K) denote the matrix of T relative to a basis u1 , . . . , un of U, and let P ∈ Matn,n (K) denote the matrix of the inner product relative to the same basis, i.e.

Pij = B(ui , uj ).

Then, the representing matrix of T ⋆ relative to the same basis is given by P −1 M T P. Specializing further, suppose that the basis in question is orthonormal, i.e. that
B(ui , uj ) = δij .
The Hermitian (sesqui-linear) case. Let T : U → U be an endomorphism of a unitary space (a complex vector space equipped with a Hermitian inner product). In this setting we define the Hermitian adjoint T ⋆ : U → U by means of the familiar adjointness condition

⟨u, T v⟩ = ⟨T ⋆ u, v⟩,   u, v ∈ U.

However, the analogous operation at the matrix level is the conjugate transpose. Thus, if M ∈ Matn,n (C) is the matrix of T relative to an orthonormal basis, then M̄ T is the matrix of T ⋆ relative to the same basis.
236.3 anti-symmetric
Antisymmetric is not the same thing as “not symmetric”, as it is possible for a relation to be both antisymmetric and symmetric at the same time. However, a relation R that is both antisymmetric and symmetric satisfies the condition that xRy ⇒ x = y; such a relation is a subset of the diagonal, so there are only 2^n such possible relations on an n-element set A.

An example of an antisymmetric relation on A = {◦, ×, ⋆} would be R = {(⋆, ⋆), (×, ◦), (◦, ⋆), (⋆, ×)}. One relation that isn’t antisymmetric is R = {(×, ◦), (⋆, ◦), (◦, ⋆)}, because we have both ⋆R◦ and ◦R⋆, but ◦ ≠ ⋆.
Bilinear forms. If U = V then B is a bilinear form. In this case further assumptions
are often made:
Left and Right Maps. We may regard the bilinear map as a left map or right map, as
follows:
BL ∈ L(U, V ∗ ) BR ∈ L(V, U ∗ )
BL (x)(y) = B(x, y)
BR (y)(x) = B(x, y)
The left is a linear map from U into the dual of V . So for example B is skew-symmetric iff BL = −BR .
Let B1′ and B2′ be new bases, and P and Q the corresponding change of basis matrices. Then the new matrix of the form is C ′ = P T CQ.
Rank. If U and V are finite dimensional it may be shown that rank BL = rank BR . We
call this simply the rank of B. We say that B is non-degenerate if the left and right map
are both linear isomorphisms.
Now applying the rank-nullity theorem on both left and right maps gives the following results:
dim U = dim ker BL + r,   dim V = dim ker BR + r.

T ⊥ = {v ∈ V | B(t, v) = 0 ∀t ∈ T }

⊥S = {u ∈ U | B(u, s) = 0 ∀s ∈ S}
The orthogonal complement of a subspace is itself a subspace. Further, if B is a symmetric or skew-symmetric bilinear form then ⊥A = A⊥ , and we may use the latter notation.

T is a non-degenerate subspace if T ∩ T ⊥ = {0}. Similarly, S is non-degenerate when ⊥S ∩ S = {0}.
We may also realise T ⊥ by considering the restriction B ′ = B|T ×V . It’s clear that T ⊥ = ker BR′ . Now if B is non-degenerate (or more generally T ∩ ⊥V = {0}) then ker BL′ = {0}, and we can use the rank-nullity equations to get dim V = dim T + dim T ⊥ . Similarly we may show that dim U = dim S + dim ⊥S.
For a symmetric bilinear form one may choose a basis in which the matrix of the form is diagonal,

diag(a1 , a2 , . . . , an ).
Adjoint. Suppose B : U × V → K is a non-degenerate bilinear map. If T ∈ L(U, U) then we define the adjoint of T , T ⋆ ∈ L(V, V ), to be the unique linear map such that

B(T u, v) = B(u, T ⋆ v)   for all u ∈ U, v ∈ V.
If U = V and we choose a canonical (orthogonal) basis for B, then the adjoint corresponds
to the matrix transpose.
T is then said to be a normal operator (with respect to this bilinear map) if it commutes
with its adjoint.
B :V ×V∗ →K
B(v, f ) = f (v)
Here the orthogonal is exactly the annihilator. This gives the result that dim U + dim U ◦ =
dim V .
If the matrix is the identity, I, then this gives the standard Euclidean inner product on K n .
An inner product on a vector space is a bilinear form if its field is real, but not if it is complex. In fact, the bilinear form associated with a real inner product space is non-degenerate, and every subspace is non-degenerate. So, as we may intuitively expect, V = U ⊕ U ⊥ for every subspace U.
Let u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) be two vectors on k n where k is a field (like R or C). Then we define the dot product of the two vectors as:

u · v = u1 v1 + u2 v2 + · · · + un vn .

Notice that u · v is NOT a vector but a scalar (an element from the field k).
If u, v are vectors in Rn and θ is the angle between them, then we also have

u · v = ‖u‖ ‖v‖ cos θ.
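The angle formula can be illustrated with a pair of concrete vectors in R² (chosen here so the angle is obviously 45 degrees):

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

dot = u @ v                      # a scalar, not a vector
theta = np.arccos(dot / (np.linalg.norm(u) * np.linalg.norm(v)))
assert np.isclose(theta, np.pi / 4)   # 45 degrees between these two vectors
```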
Theorem
Let S be a set of vectors from an inner product space L. If S is orthonormal, then S is
linearly independent.
Proof. We denote by ⟨·, ·⟩ the inner product of L. Let us first consider the case when S is finite, i.e., S = {e1 , . . . , en } for some n. Suppose

λ1 e1 + · · · + λn en = 0

for some scalars λi (belonging to the field of the underlying vector space of L). For a fixed k in 1, . . . , n, we then have

0 = ⟨ek , 0⟩
  = ⟨ek , λ1 e1 + · · · + λn en ⟩
  = λ1 ⟨ek , e1 ⟩ + · · · + λn ⟨ek , en ⟩
  = λk ,

so λk = 0 for each k, and S is linearly independent. For infinite S the same argument applies to any finite subset.
The present result with proof can be found in [1], page 153.
REFERENCES
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons,
1978.
236.7 inner product
An inner product on a vector space V over a field K (which must be either the field R of real numbers or the field C of complex numbers) is a function ( , ) : V × V −→ K such that, for all k1 , k2 ∈ K and v1 , v2 , v, w ∈ V , the following properties hold:

1. (k1 v1 + k2 v2 , w) = k1 (v1 , w) + k2 (v2 , w);

2. (v, w) = (w, v), where the bar denotes complex conjugation;

3. (v, v) ≥ 0, and (v, v) = 0 if and only if v = 0.

(Note: Rule 2 guarantees that (v, v) ∈ R, so the inequality (v, v) ≥ 0 in rule 3 makes sense even when K = C.)
Every inner product space is a normed vector space, with the norm being defined by ‖v‖ := √(v, v).
A vector space over R or C taken with a specific inner product (hx, yi) forms an inner product
space.
For example, Rn with the familiar dot product forms an inner product space.
The expression √⟨x, x⟩ is written ‖x‖ and is called the norm. This makes the inner product space also a normed vector space. That is, the inner product space also has the following properties:
where K is the underlying field of the vector space.
236.10 self-dual
Exactly the same definitions can be made for an endomorphism of a complex vector space
with a Hermitian inner product.
Relation to the matrix transpose. All of these definitions have their counterparts in
the matrix setting. Let M ∈ Matn,n (K) be the matrix of T relative to an orthogonal basis
of U. Then T is self-dual if and only if M is a symmetric matrix, and anti self-dual if and
only if M is a skew-symmetric matrix.
In the case of a Hermitian inner product we must replace the transpose with the conjugate transpose.
Thus T is self-dual if and only if M is a Hermitian matrix, i.e.

M = M̄ t .
MM ? = M ? M.
2. Letting
Λ = {λ ∈ C | M − λI is singular}

denote the spectrum (set of eigenvalues) of M, the corresponding eigenspaces give an orthogonal, direct sum decomposition of U, i.e.

U = ⊕_{λ∈Λ} Eλ ,
Remarks.
1. Here are some important classes of normal operators, distinguished by the nature of
their eigenvalues.
• Hermitian operators. Eigenvalues are real.
• unitary transformations. Eigenvalues lie on the unit circle, i.e. the set of complex
numbers of modulus 1.
• Orthogonal projections. Eigenvalues are either 0 or 1.
2. There is a well-known version of the spectral theorem for R, namely that a self-adjoint (symmetric) transformation of a real inner product space can be diagonalized, and that eigenvectors corresponding to different eigenvalues are orthogonal. An even more down-to-earth version of this theorem says that a symmetric, real matrix can always be diagonalized by an orthonormal basis of eigenvectors.
3. There are several versions, of increasing sophistication, of the spectral theorem that hold in the infinite-dimensional, Hilbert space setting. In such a context one must distinguish between the so-called discrete and continuous (no corresponding eigenspace) spectra, and replace the representing sum for the operator with some kind of integral. The definition of self-adjointness is also quite tricky for unbounded operators. Finally, there are versions of the spectral theorem, of importance in theoretical quantum mechanics, that can be applied to continuous 1-parameter groups of commuting, self-adjoint operators.
Version: 4 Owner: rmilson Author(s): rmilson, NeuRet
Chapter 237
Geometric algebra is a Clifford algebra which has been used with great success in the modeling of a wide variety of physical phenomena. Clifford algebra is considered a more general algebraic framework than geometric algebra; the primary distinction is that geometric algebra utilizes only real numbers as scalars to represent magnitudes.
Let Vn be an n–dimensional vector space over the real numbers. The geometric algebra
Gn = G(Vn ) is a graded algebra similar to Grassmann’s exterior algebra, except that the
exterior product is replaced by a more fundamental multiplication operation known as the
geometric product. For vectors a, b, c ∈ Vn and real scalars α, β ∈ R, the geometric product is associative, distributes over addition, and satisfies a contraction rule a2 ∈ R. For parallel vectors the geometric product is symmetric, ab = ba, while for orthogonal vectors it is antisymmetric, ab = −ba.
The parallelism of vectors is encoded as a symmetric property, while orthogonality of vectors
is encoded as an antisymmetric property.
The contraction rule specifies that the square of any vector is a scalar equal to the sum of
the square of the magnitudes of its components in each basis direction. Depending on the
contraction rule for each of the basis directions, the magnitude of the vector may be positive,
negative, or zero. A vector with a magnitude of zero is called a null vector.
The graded algebra Gn generated from Vn is defined over a 2^n -dimensional linear space. The basis entities for this space can be generated by successive application of the geometric product to the basis vectors of Vn until a closed set of basis entities is obtained. The basis entities for the space are known as blades. The following multiplication table illustrates the generation of basis blades from the basis vectors e1 , e2 ∈ Vn .

        e1        e2        e12
e1      ε1        e12       ε1 e2
e2      −e12      ε2        −ε2 e1
e12     −ε1 e2    ε2 e1     −ε1 ε2

Here, ε1 and ε2 represent the contraction rule for e1 and e2 respectively. Note that the basis vectors of Vn become blades themselves, in addition to the multiplicative identity 1 and the new bivector e12 ↔ e1 e2 . As the table demonstrates, this set of basis blades is closed under the geometric product.
The geometric product ab is related to the inner product a · b and the exterior product
a ∧ b by
ab = a · b + a ∧ b = b · a − b ∧ a = 2a · b − ba.
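The blade multiplication table can be turned into a small computational sketch. The function `gp` below (our own naming, not the entry's notation) implements the geometric product of G2 under the Euclidean contraction rule ε1 = ε2 = 1, with a multivector stored as its coefficients on the blades (1, e1, e2, e12):

```python
import numpy as np

def gp(a, b):
    # geometric product in G_2 with e1^2 = e2^2 = 1,
    # multivectors as coefficients on (1, e1, e2, e12)
    a0, a1, a2, a12 = a
    b0, b1, b2, b12 = b
    return np.array([
        a0*b0 + a1*b1 + a2*b2 - a12*b12,   # scalar part
        a0*b1 + a1*b0 - a2*b12 + a12*b2,   # e1 part
        a0*b2 + a2*b0 + a1*b12 - a12*b1,   # e2 part
        a0*b12 + a12*b0 + a1*b2 - a2*b1,   # e12 part
    ])

e1 = np.array([0.0, 1.0, 0.0, 0.0])
e2 = np.array([0.0, 0.0, 1.0, 0.0])

assert np.allclose(gp(e1, e1), [1, 0, 0, 0])     # contraction: e1^2 = 1
assert np.allclose(gp(e1, e2), -gp(e2, e1))      # orthogonal: e1 e2 = -e2 e1
assert np.allclose(gp(gp(e1, e2), gp(e1, e2)), [-1, 0, 0, 0])  # e12^2 = -1
```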
Chapter 238
The Einstein summation convention implies that when an index occurs more than once in the same expression, the expression is implicitly summed over all possible values for that index. Therefore, in order to use the summation convention, it must be clear from the context over what range the indices should be summed.
1. Let {ei }_{i=1}^n be an orthonormal basis in Rn . Then the inner product of the vectors u = u^i ei = Σ_{i=1}^n u^i ei and v = v^i ei = Σ_{i=1}^n v^i ei is

u · v = u^i v^j ei · ej = δij u^i v^j .
2. Let V be a vector space with basis {ei }ni=1 and a dual basis {ei }ni=1 . Then, for a vector
v = v i ei and dual vectors α = αi ei and β = βi ei , we have
(α + β)(v) = αi v i + βj v j
= (αi + βi )v i .
This example shows that the summation convention is “distributive” in a natural way.
3. Chain rule. Let F : Rm → Rn , x 7→ (F 1 (x), · · · , F n (x)), and G : Rn → Rp , y 7→ (G1 (y), · · · , Gp (y)) be smooth functions. Then

∂(G ◦ F )i /∂xj (x) = ∂Gi /∂y k (F (x)) · ∂F k /∂xj (x),

where the right hand side is summed over k = 1, . . . , n.
An index which is summed is called a dummy index or dummy variable. For instance, i is a dummy index in v^i ei . An expression does not depend on a dummy index, i.e., v^i ei = v^j ej . It is common that one must change the name of dummy indices. For instance, above, in Example 1 when we calculated u · v, it was necessary to change the index i in v = v^i ei to j so that it would not clash with u = u^i ei .
When using the Einstein summation convention, objects are usually indexed so that when
summing, one index will always be an “upper index” and the other a “lower index”. Then
summing should only take place over upper and lower indices. In the above examples, we
have followed this rule. Therefore we did not write δij uiv j = uiv i in the first example since
ui v i has two upper indices. This is consistent; it is not possible to take the inner product of
two vectors without a metric, which is here δij . The last example illustrates that when we consider k as a “lower index” in ∂Gi /∂y k , then the chain rule obeys this upper-lower rule for the indices.
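The summation convention maps directly onto `numpy.einsum`, where a repeated index letter is summed over; this mirrors Example 1 and the chain-rule-style contraction over k:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 2.0])

# u . v = delta_ij u^i v^j : the repeated index i is summed
assert np.isclose(np.einsum("i,i->", u, v), u @ v)

# a contraction over the repeated index k, C^i_j = A^i_k B^k_j,
# is exactly matrix multiplication
A = np.arange(9.0).reshape(3, 3)
B = np.arange(9.0, 18.0).reshape(3, 3)
assert np.allclose(np.einsum("ik,kj->ij", A, B), A @ B)
```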
The present entry employs the terminology and notation defined and described in the entry
on tensor arrays. To keep things reasonably self-contained we mention that the symbol Tp,q
refers to the vector space of type (p, q) tensor arrays, i.e. Maps
I p × I q → K,
We say that a tensor array is a characteristic array, a.k.a. a basic tensor, if all but one of its
values are 0, and the remaining non-zero value is equal to 1. For tuples A ∈ I p and B ∈ I q ,
we let
ε^{B}_{A} : I p × I q → K denote the characteristic array whose single nonzero value, 1, sits at the position indexed by (A, B).
The type (p, q) characteristic arrays form a natural basis for Tp,q .
Furthermore the outer multiplication of two characteristic arrays gives a characteristic array
of larger valence. In other words, for
A1 ∈ I p1 , B1 ∈ I q1 , A2 ∈ I p2 , B2 ∈ I q2 ,
we have that
ε^{B1}_{A1} ε^{B2}_{A2} = ε^{B1 B2}_{A1 A2} ,
where the product on the left-hand side is performed by outer multiplication, and where
A1 A2 on the right-hand side refers to the element of I p1 +p2 obtained by concatenating the
tuples A1 and A2 , and similarly for B1 B2 .
In this way we see that the type (1, 0) characteristic arrays ε_{(i)} , i ∈ I (the natural basis of K^I ), and the type (0, 1) characteristic arrays ε^{(i)} , i ∈ I (the natural basis of (K^I )∗ ) generate the tensor array algebra relative to the outer multiplication operation.
The just-mentioned fact gives us an alternate way of writing and thinking about tensor
arrays. We introduce the basic symbols
By way of illustration, suppose that I = (1, 2). We can now write down a type (1, 0) tensor, i.e. a column vector

u = (u^1 , u^2 )^T ∈ T1,0

as

u = u^1 ε_{(1)} + u^2 ε_{(2)} .

Similarly, a row-vector

φ = (φ1 , φ2 ) ∈ T0,1

can be written down as

φ = φ1 ε^{(1)} + φ2 ε^{(2)} .

In the case of a matrix

M = ( M^1_1  M^1_2 ; M^2_1  M^2_2 ) ∈ T1,1

we would write

M = M^1_1 ε^{(1)}_{(1)} + M^1_2 ε^{(2)}_{(1)} + M^2_1 ε^{(1)}_{(2)} + M^2_2 ε^{(2)}_{(2)} .
238.3 multi-linear
M : V1 × V2 × . . . × Vn → W
Notes.
Note: the present entry employs the terminology and notation defined and described in the
entry on tensor arrays. To keep things reasonably self contained we mention that the symbol
Tp,q refers to the vector space of type (p, q) tensor arrays, i.e. Maps
I p × I q → K,
that combines a type (p1 , q1 ) tensor array X and a type (p2 , q2 ) tensor array Y to produce a type (p1 + p2 , q1 + q2 ) tensor array XY (also written as X ⊗ Y ), defined by

(XY )^{i1 ...ip1 ip1+1 ...ip1+p2}_{j1 ...jq1 jq1+1 ...jq1+q2} = X^{i1 ...ip1}_{j1 ...jq1} Y^{ip1+1 ...ip1+p2}_{jq1+1 ...jq1+q2}
Speaking informally, what is going on above is that we multiply every value of the X array
by every possible value of the Y array, to create a new array, XY . Quite obviously then,
the size of XY is the size of X times the size of Y , and the index slots of the product XY
are just the union of the index slots of X and of Y .
Outer multiplication is a non-commutative, associative operation. The type (0, 0) arrays are
the scalars, i.e. elements of K; they commute with everything. Thus, we can embed K into
the direct sum

⊕_{p,q∈N} Tp,q ,
By way of illustration we mention that the outer product of a column vector, i.e. a type (1, 0) array, and a row vector, i.e. a type (0, 1) array, gives a matrix, i.e. a type (1, 1) tensor array. For instance:

a                  ax ay az
b  ⊗ (x y z)  =    bx by bz ,    a, b, c, x, y, z ∈ K
c                  cx cy cz
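The column-times-row display is exactly what `numpy.outer` computes; the values below are arbitrary stand-ins for (a, b, c) and (x, y, z):

```python
import numpy as np

col = np.array([1.0, 2.0, 3.0])    # plays the role of (a, b, c)
row = np.array([4.0, 5.0, 6.0])    # plays the role of (x, y, z)

# outer product: a type (1,0) array times a type (0,1) array gives a matrix
M = np.outer(col, row)
assert M.shape == (3, 3)
assert np.allclose(M, [[4, 5, 6], [8, 10, 12], [12, 15, 18]])
```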
238.5 tensor
Geometric and physical quantities may be categorized by considering the degrees of freedom
inherent in their description. The scalar quantities are those that can be represented by a
single number — speed, mass, temperature, for example. There are also vector-like quan-
tities, such as force, that require a list of numbers for their description. Finally, quantities
such as quadratic forms naturally require a multiply indexed array for their representation.
These latter quantities can only be conceived of as tensors.
Actually, the tensor notion is quite general, and applies to all of the above examples; scalars
and vectors are special kinds of tensors. The feature that distinguishes a scalar from a vector,
and distinguishes both of those from a more general tensor quantity is the number of indices
in the representing array. This number is called the rank of a tensor. Thus, scalars are rank
zero tensors (no indices at all), and vectors are rank one tensors.
1. We will not pursue this line of thought here, because the topic of algebra structure is best dealt with in a more abstract context. The same comment applies to the use of the tensor product sign ⊗ in denoting outer multiplication. These topics are dealt with in the entry pertaining to abstract tensor algebra.

2. “Ceci n’est pas une pipe,” as René Magritte put it. The image and the object represented by the image are not the same thing. The mass of a stone is not a number. Rather, the mass can be described by a number relative to some specified unit mass.
It is also necessary to distinguish between two types of indices, depending on whether the
corresponding numbers transform covariantly or contravariantly relative to a change in the
frame of reference. Contravariant indices are written as superscripts, while the covariant
indices are written as subscripts. The valence of a tensor is the pair (p, q), where p is the
number contravariant and q the number of covariant indices, respectively.
Finally, it must be mentioned that most physical and geometric applications are concerned
with tensor fields, that is to say tensor valued functions, rather than tensors themselves.
Some care is required, because it is common to see a tensor field called simply a tensor.
There is a difference, however; the entries of a tensor array T^{i1 ...ip}_{j1 ...jq} are numbers, whereas
the entries of a tensor field are functions. The present entry treats the purely algebraic
aspect of tensors. Tensor field concepts, which typically involved derivatives of some kind,
are discussed elsewhere.
Let U be an n-dimensional vector space, and choose a frame, i.e. a basis

e1 , . . . , en ∈ U.
Every vector in U can be “measured” relative to this basis, meaning that for every v ∈ U
there exist unique scalars $v^i$, such that (note the use of the Einstein summation convention)
$v = v^i e_i.$
These scalars are called the components of v relative to the frame in question.
Let $e^1, \ldots, e^n \in U^*$ be the corresponding dual basis, i.e.
$e^i(e_j) = \delta^i_j,$
where the latter is the Kronecker delta array. For every covector $\alpha \in U^*$ there exists a unique array of components $\alpha_i$ such that
$\alpha = \alpha_i e^i.$
More generally, every tensor $T \in U^{p,q}$ has a unique representation in terms of components. That is to say, there exists a unique array of scalars $T^{i_1\ldots i_p}_{j_1\ldots j_q}$ such that
$T = T^{i_1\ldots i_p}_{j_1\ldots j_q}\; e_{i_1} \otimes \cdots \otimes e_{i_p} \otimes e^{j_1} \otimes \cdots \otimes e^{j_q}.$
Transformation rule. Next, suppose that a change is made to a different frame of reference, say
$\hat e_1, \ldots, \hat e_n \in U.$
Any two frames are uniquely related by an invertible transition matrix $A^i{}_j$, having the property that for all values of $j$ we have
$\hat e_j = A^i{}_j\, e_i. \qquad (238.5.1)$
Let v ∈ U be a vector, and let v i and v̂ i denote the corresponding component arrays relative
to the two frames. From
$v = v^i e_i = \hat v^i \hat e_i,$
and from (238.5.1) we infer that
$\hat v^i = B^i{}_j\, v^j, \qquad (238.5.2)$
where $B^i{}_j$ is the matrix inverse of $A^i{}_j$, i.e.
$A^i{}_k B^k{}_j = \delta^i{}_j.$
Thus, the transformation rule for a vector’s components (238.5.2) is contravariant to the
transformation rule for the frame of reference (238.5.1). It is for this reason that the super-
script indices of a vector are called contravariant.
To establish (238.5.2), we note that the transformation rule for the dual basis takes the form
$\hat e^i = B^i{}_j\, e^j,$
and that
$v^i = e^i(v),$
while
$\hat v^i = \hat e^i(v).$
The transformation rule for covector components is covariant. Let α ∈ U∗ be a given
covector, and let αi and α̂i be the corresponding component arrays. Then
$\hat\alpha_j = A^i{}_j\, \alpha_i.$
The above relation is easily established. We need only remark that
αi = α(ei ),
and that
α̂j = α(êj ),
and then use (238.5.1).
In light of the above discussion, we see that the transformation rule for a general type (p, q)
tensor takes the form
$\hat T^{i_1\ldots i_p}_{j_1\ldots j_q} = B^{i_1}{}_{k_1} \cdots B^{i_p}{}_{k_p}\, A^{l_1}{}_{j_1} \cdots A^{l_q}{}_{j_q}\; T^{k_1\ldots k_p}_{l_1\ldots l_q}.$
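As a concrete numeric sketch (ours, not from the entry), for a valence (1, 1) tensor the rule reduces to the matrix product B·T·A, while contravariant components transform with B and covariant ones with A; frame-independent scalars such as α(v) and the trace are preserved. The sample matrices below are arbitrary assumptions.

```python
# Illustrative sketch: change of frame for valence (1,1), where the rule
# reads T-hat^i_j = B^i_k T^k_l A^l_j, i.e. the matrix product B*T*A.
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1.0, 1.0], [0.0, 1.0]]    # transition matrix A^i_j (sample data)
B = [[1.0, -1.0], [0.0, 1.0]]   # its inverse B^i_j, so A^i_k B^k_j = delta^i_j

v = [3.0, 4.0]                  # contravariant components: v-hat^i = B^i_j v^j
v_hat = [sum(B[i][j] * v[j] for j in range(2)) for i in range(2)]

alpha = [2.0, 5.0]              # covariant components: alpha-hat_j = A^i_j alpha_i
alpha_hat = [sum(A[i][j] * alpha[i] for i in range(2)) for j in range(2)]

T = [[1.0, 2.0], [3.0, 4.0]]    # a type (1,1) tensor array
T_hat = matmul(matmul(B, T), A)

# The scalar alpha_i v^i is frame-independent, as is the trace T^i_i.
pairing = sum(x * y for x, y in zip(alpha, v))
pairing_hat = sum(x * y for x, y in zip(alpha_hat, v_hat))
```

Note that the pairing and the trace come out the same in both frames, which is exactly the contravariant/covariant cancellation the rule encodes.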
Let R be a ring, and M an R-module. The tensor algebra T(M) is a graded R-algebra with
n-th graded component Tn (M) simply the nth tensor power M ⊗n of M, and multiplication
ab = a ⊗ b ∈ Tn+m (M) for a ∈ Tn (M), b ∈ Tm (M). Thus
$T(M) = \bigoplus_{n=0}^{\infty} T^n(M) = R \oplus M \oplus (M \otimes M) \oplus \cdots$
T is a functor, and is left adjoint to the forgetful functor from R-algebras to R-modules. That is, every module homomorphism M → S, where S is an R-algebra, extends to a unique R-algebra homomorphism T(M) → S.
Introduction. Tensor arrays, or tensors for short,3 are multidimensional arrays with two
types of (covariant and contravariant) indices. Tensors are widely used in science and math-
ematics, because these data structures are the natural choice of representation for a variety
of important physical and geometric quantities.
3. The word tensor has other meanings; cf. the tensor entry.
In this entry we give the definition of a tensor array and establish some related terminology
and notation. The theory of tensor arrays incorporates a number of other essential topics:
basic tensors, tensor transformations, outer multiplication, contraction, inner multiplication,
and generalized transposition. These are fully described in their separate entries.
Valences and the space of tensor arrays. Let K be a field and let I be a finite list of indices,5 such as (1, 2, . . . , n). A tensor array of type
(p, q), p, q ∈ N
is a mapping
I p × I q → K.
The set of all such mappings will be denoted by Tp,q (I, K), or when I and K are clear from
the context, simply as Tp,q . The numbers p and q are called, respectively, the contravariant
and the covariant valence of the tensor array.
Point-wise addition and scaling give Tp,q the structure of a vector space of dimension $n^{p+q}$, where n is the cardinality of I. We will interpret $I^0$ as signifying a singleton set.
Consequently Tp,0 and T0,q are just the maps from, respectively, $I^p$ and $I^q$ to $K$. It is also customary to identify T1,0 with $K^I$, the vector space of list vectors indexed by $I$, and to identify T0,1 with the dual space $(K^I)^*$ of linear forms on $K^I$.6 Finally, T0,0 can be identified with $K$ itself. In other words, scalars are tensor arrays of zero valence.
We also mention that it is customary to use columns to represent contravariant index dimen-
sions, and rows to represent the covariant index dimensions. Thus column vectors are type
(1, 0) tensor arrays, row vectors are type (0, 1) tensor arrays, and matrices, in as much as
they can be regarded either as rows of columns or as columns of rows, are type (1, 1) tensor
arrays.7
4. In physics and differential geometry, K is typically R or C.
5. It is advantageous to allow general indexing sets, because one can indicate the use of multiple frames of reference by employing multiple, disjoint sets of indices.
6. Curiously, the latter notation is preferred by some authors. See H. Weyl's books and papers, for example.
7. It is also customary to use matrices to represent type (2, 0) and type (0, 2) tensor arrays (the latter are used to represent quadratic forms). Speaking idealistically, such objects should be typeset, respectively, as a column of column vectors and as a row of row vectors. However, typographical constraints and notational convenience dictate that they be displayed as matrices.
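The definition above, a type (p, q) tensor array as a literal mapping $I^p \times I^q \to K$, can be modeled directly in code. The following Python sketch is ours (the helper name `zero_tensor` is an assumption, not from the entry):

```python
from itertools import product

# A type (p, q) tensor array over an index list I, modeled literally as a
# mapping I^p x I^q -> K (here K = float), stored as a dict keyed by tuples.
def zero_tensor(I, p, q):
    return {key: 0.0 for key in product(I, repeat=p + q)}

I = (1, 2, 3)
column = zero_tensor(I, 1, 0)   # type (1,0): a column vector, 3 entries
row    = zero_tensor(I, 0, 1)   # type (0,1): a row vector,    3 entries
matrix = zero_tensor(I, 1, 1)   # type (1,1): a matrix,        9 entries
scalar = zero_tensor(I, 0, 0)   # type (0,0): a single scalar entry
```

The entry counts illustrate the dimension formula $n^{p+q}$ with $n = |I|$; for $p = q = 0$ the empty index tuple gives exactly one entry, matching the identification of T0,0 with K.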
Notes. It must be noted that our usage of the term tensor array is non-standard. The traditionally inclined authors simply call these data structures tensors. We bother to make
the distinction because the traditional nomenclature is ambiguous and doesn’t include the
modern mathematical understanding of the tensor concept. (This is explained more fully
in the tensor entry.) Precise and meaningful definitions can only be given by treating the
concept of a tensor array as distinct from the concept of a geometric/abstract tensor.
We also mention that the term tensor is often applied to objects that should more appro-
priately be termed a tensor field. The latter are tensor-valued functions, or more generally
sections of a tensor bundle. A tensor is what one gets by evaluating a tensor field at one
point. Informally, one can also think of a tensor field as a tensor whose values are functions,
rather than constants.
Definition. The classical conception of the tensor product operation involved finite dimensional
vector spaces A, B, say over a field K. To describe the tensor product A ⊗ B one was obliged
to choose bases
ai ∈ A, i ∈ I, bj ∈ B, j ∈ J
of A and B indexed by finite sets I and J, respectively, and represent elements a ∈ A and b ∈ B by their coordinates relative to these bases, i.e. as mappings a : I → K and b : J → K such that
$a = \sum_{i \in I} a^i a_i, \qquad b = \sum_{j \in J} b^j b_j.$
One then represented A ⊗ B relative to this particular choice of bases as the vector space of
mappings c : I × J → K. These mappings were called “second-order contravariant tensors”
and their values were customarily denoted by superscripts, a.k.a. contravariant indices:
cij ∈ K, i ∈ I, j ∈ J.
The canonical bilinear multiplication (also known as outer multiplication)
⊗:A×B →A⊗B
was defined by representing a ⊗ b, relative to the chosen bases, as the tensor
cij = ai bj , i ∈ I, j ∈ J.
In this system, the products
ai ⊗ bj , i ∈ I, j ∈ J
were represented by basic tensors, specified in terms of the Kronecker deltas as the mappings
$(i', j') \mapsto \delta^i_{\,i'}\, \delta^j_{\,j'}, \qquad i' \in I,\ j' \in J.$
These gave a basis of A ⊗ B.
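In code, the classical outer multiplication is just a pointwise product of coordinate mappings. This Python sketch is ours; the coordinate dictionaries and the helper name `outer` are illustrative assumptions:

```python
# Classical outer multiplication: representing a in A and b in B by coordinate
# mappings a : I -> K and b : J -> K, the tensor a (x) b is the mapping
# (i, j) |-> a^i * b^j.
def outer(a, b):
    return {(i, j): a[i] * b[j] for i in a for j in b}

a = {1: 2.0, 2: 3.0}    # coordinates of a relative to a basis of A
b = {1: 5.0, 2: 7.0}    # coordinates of b relative to a basis of B
c = outer(a, b)         # c[i, j] = a^i * b^j, a "second-order contravariant tensor"
```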
The construction is independent of the choice of bases in the following sense. Let
$a'_i \in A,\ i \in I', \qquad b'_j \in B,\ j \in J'$
be alternate bases of A and B, indexed by finite sets I′ and J′, and let
$r : I \times I' \to K, \qquad s : J \times J' \to K$
be the corresponding change-of-basis matrices. A tensor represented by $c : I \times J \to K$ relative to the first choice of bases is then represented by the mapping $c' : I' \times J' \to K$ satisfying
$c^{ij} = \sum_{i' \in I',\, j' \in J'} r^i_{\,i'}\, s^j_{\,j'}\, c'^{\,i'j'} \qquad (238.8.1)$
for all $i \in I$, $j \in J$. This relation corresponds to the fact that the products
$a'_{i'} \otimes b'_{j'}, \qquad i' \in I',\ j' \in J'$
constitute an alternate basis of A ⊗ B, and that the change of basis relations are
$a'_{i'} \otimes b'_{j'} = \sum_{i \in I,\, j \in J} r^i_{\,i'}\, s^j_{\,j'}\; a_i \otimes b_j, \qquad i' \in I',\ j' \in J'. \qquad (238.8.2)$
Notes. Historically, the tensor product was called the outer product, and has its origins in
the absolute differential calculus (the theory of manifolds). The old-time tensor calculus is
difficult to understand because it is afflicted with a particularly lethal notation that makes
coherent comprehension all but impossible. Instead of talking about an element a of a vector
space, one was obliged to contemplate a symbol ai , which signified a list of real numbers
indexed by 1, 2, . . . , n, and which was understood to represent a relative to some specified,
but unnamed basis.
What makes this notation truly lethal is the fact that a symbol aj was taken to signify an alternate
list of real numbers, also indexed by 1, . . . , n, and also representing a, albeit relative to a
different, but equally unspecified, basis. Note that the choice of dummy variable makes all the difference. Any sane system of notation would regard the expression
$a_i, \quad i = 1, \ldots, n,$
as interchangeable with the explicit list $a_1, a_2, \ldots, a_n$.
However, in the classical system, one was strictly forbidden from using
a1 , a2 , . . . , an
because where, after all, is the all-important dummy variable to indicate the choice of basis?
Thankfully, it is possible to shed some light on this confusion (I have read that this is
credited to Roger Penrose) by interpreting the symbol ai as a mapping from some finite
index set I to R, whereas aj is interpreted as a mapping from another finite index set J (of
equal cardinality) to R.
My own surmise is that the source of this notational difficulty stems from the reluctance of
the ancients to deal with geometric objects directly. The prevalent superstition of the age
held that in order to have meaning, a geometric entity had to be measured relative to some
basis. Of course, it was understood that geometrically no one basis could be preferred to any
other, and this leads directly to the definition of geometric entities as lists of measurements
modulo the equivalence engendered by changing the basis.
It is also worth remarking on the contravariant nature of the relationship between the actual
elements of A⊗B and the corresponding representation by tensors relative to a basis — com-
pare equations (1) and (2). This relationship is the source of the terminology “contravariant
tensor” and “contravariant index”, and I surmise that it is this very medieval pit of darkness
and confusion that spawned the present-day notion of “contravariant functor”.
The present entry employs the terminology and notation defined and described in the entry
on tensor arrays and basic tensors. To keep things reasonably self contained we mention
that the symbol Tp,q refers to the vector space of type (p, q) tensor arrays, i.e. maps
I p × I q → K,
where I is some finite list of index labels, and where K is a field. The symbols $\varepsilon_{(i)}, \varepsilon^{(i)},\ i \in I$, refer to the column and row vectors giving the natural basis of T1,0 and T0,1, respectively.
Let
$T : K^I \to K^J$
be a linear isomorphism. Every such isomorphism is uniquely represented by an invertible
matrix
M :J ×I →K
with entries given by
$M^j{}_i = \big(T \varepsilon_{(i)}\big)^j, \qquad i \in I,\ j \in J.$
In other words, the action of T is described by the following substitutions
$\varepsilon_{(i)} \mapsto \sum_{j \in J} M^j{}_i\, \varepsilon_{(j)}, \qquad i \in I. \qquad (238.9.1)$
The corresponding substitution relations for the type (0, 1) tensors involve the inverse matrix $M^{-1} : I \times J \to K$, and take the form8
$\varepsilon^{(i)} \mapsto \sum_{j \in J} (M^{-1})^i{}_j\, \varepsilon^{(j)}, \qquad i \in I. \qquad (238.9.2)$
The rules for type (0, 1) substitutions are what they are because of the requirement that the $\varepsilon_{(i)}$ and $\varepsilon^{(i)}$ remain dual bases even after the substitution. In other words, we want the substitutions to preserve the relations
$\big\langle \varepsilon^{(i)}, \varepsilon_{(j)} \big\rangle = \delta^i{}_j, \qquad i, j \in I, \qquad (238.9.3)$
where the left-hand side of the above equation features the inner product and the right-hand side the Kronecker delta. Given that the vector basis transforms as in (238.9.1) and given the above constraint, the substitution rules for the linear form basis, shown in (238.9.2), are the only ones possible.
Thus, we see that the “transformation rule” for indices is contravariant to the substitution
rule (238.9.1) for basis vectors.
8. The above relations describe the action of the dual homomorphism of the inverse transformation,
$\left(T^{-1}\right)^* : \left(K^I\right)^* \to \left(K^J\right)^*.$
In modern terms, this contravariance is best described by saying that the dual space construction is a contravariant functor.9 In other words, the substitution rule for the linear forms, i.e. the type (0, 1) tensors, is contravariant to the substitution rule for vectors:
$\varepsilon^{(j)} \mapsto \sum_{i \in I} M^j{}_i\, \varepsilon^{(i)}, \qquad j \in J, \qquad (238.9.4)$
in full agreement with the relation shown in (238.9.2). Everything comes together, and equations (238.9.3) and (238.9.4) are seen to be one and the same, once we remark that tensor array values can be obtained by contracting with characteristic arrays.
Finally, we must remark that the transformation rule for covariant indices involves the inverse matrix $M^{-1}$. Thus, if $\alpha \in T_{0,1}(I)$ is transformed to $\beta \in T_{0,1}(J)$, the indices will be related by
$\beta_j = \sum_{i \in I} (M^{-1})^i{}_j\, \alpha_i, \qquad j \in J.$
The most general transformation rule for tensor array indices is therefore the following: the
indexed values of a tensor array X ∈ Tp,q (I) and the values of the transformed tensor array
Y ∈ Tp,q (J) are related by
$Y^{j_1\ldots j_p}_{l_1\ldots l_q} = \sum_{\substack{i_1,\ldots,i_p \in I \\ k_1,\ldots,k_q \in I}} M^{j_1}{}_{i_1} \cdots M^{j_p}{}_{i_p}\, (M^{-1})^{k_1}{}_{l_1} \cdots (M^{-1})^{k_q}{}_{l_q}\; X^{i_1\ldots i_p}_{k_1\ldots k_q}.$
9. See the entry on the dual homomorphism.
Chapter 239
The bac-cab rule states that for vectors A, B, and C (that can be either real or complex)
in R3 , we have
A × (B × C) = B(A · C) − C(A · B).
Here × is the cross product, and · is the real inner product.
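The identity is easy to check numerically. The following Python sketch is ours; the sample vectors are arbitrary:

```python
# Numeric check of the bac-cab rule A x (B x C) = B(A.C) - C(A.B) in R^3.
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

A, B, C = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
lhs = cross(A, cross(B, C))                                     # A x (B x C)
rhs = [B[i] * dot(A, C) - C[i] * dot(A, B) for i in range(3)]   # B(A.C) - C(A.B)
```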
The cross product of two vectors is a vector orthogonal to the plane of the two vectors
being crossed, whose magnitude is equal to the area of the parallelogram defined by the two
vectors. Notice that there are two such vectors; the cross product produces the one that forms a right-handed coordinate system with the plane. It is exclusively for use in R3, as you can see from the definition. We write the cross product of the vectors ~a and ~b as
$\vec a \times \vec b = \det\begin{pmatrix} \hat\imath & \hat\jmath & \hat k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix} = \det\begin{pmatrix} a_2 & a_3 \\ b_2 & b_3 \end{pmatrix}\hat\imath - \det\begin{pmatrix} a_1 & a_3 \\ b_1 & b_3 \end{pmatrix}\hat\jmath + \det\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}\hat k$
with ~a = a1 î + a2 ĵ + a3 k̂ and ~b = b1 î + b2 ĵ + b3 k̂, where {î, ĵ, k̂} is any right-handed basis
for R3 .
Note that ~a × ~b = −~b × ~a.
Properties of the Cross Product:
• From the expression for the area of a parallelogram we obtain |~a ×~b| = |~a||~b| sin θ where
θ is the angle between ~a and ~b.
• From the above, you can see that the cross product of two parallel vectors, or of a vector with itself, is ~0 (assuming neither vector is ~0), since sin 0 = 0.
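The properties above can be illustrated numerically. This Python sketch is ours, with arbitrary sample vectors:

```python
import math

# Illustration of the cross product properties: orthogonality of the result,
# |a x b| = |a||b| sin(theta), anti-commutativity, and a x a = 0.
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(a):
    return math.sqrt(sum(x * x for x in a))

a, b = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
c = cross(a, b)                 # [0, 0, 1]: orthogonal to both a and b

# |a x b| should equal |a||b| sin(theta); here theta = 45 degrees
area = norm(a) * norm(b) * math.sin(math.pi / 4)

anti = cross(b, a)              # equals -(a x b)
```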
A Euclidean vector is a geometric entity that has the properties of magnitude and direction. For example, a vector in R2 can be represented by its components, written (3, 4), or as the column vector with entries 3 and 4.
In Rn, a vector can be easily constructed as the line segment between points whose differences in each coordinate are the components of the vector. A vector constructed at the origin (the vector (3, 4) drawn from the point (0, 0) to (3, 4)) is called a position vector. Note that a vector that is not a position vector is independent of position: it remains the same vector unless one changes its magnitude or direction.
Magnitude: The magnitude of a vector is the distance from one end of the line segment to the other. The magnitude of the vector comes from the metric of the space it is embedded in; it is also referred to as the vector norm. In Euclidean space, it can be obtained using the Pythagorean theorem in Rn, such that for a vector $\vec v \in \mathbb{R}^n$ the length $|\vec v|$ is
$|\vec v| = \sqrt{\sum_{i=1}^{n} v_i^2}.$
This can also be found via the dot product, as $\sqrt{\vec a \cdot \vec a}$.
Direction: A vector basically is a direction. However, you may want to know the angle it makes with another vector. In this case, one can use the dot product for the simplest computation, since for two vectors $\vec a$, $\vec b$,
$\vec a \cdot \vec b = |\vec a||\vec b| \cos\theta.$
Since θ is the angle between the vectors, you just solve for θ. Note that if you want to find the angle the vector makes with one of the axes, you can use trigonometry: the length of the vector is just the hypotenuse of a right triangle, and the angle θ with the x-axis is just arctan(y/x), the arctangent of the ratio of its components.
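A small Python sketch (ours) of computing magnitude and angles as just described:

```python
import math

# Magnitude via the Pythagorean formula, and angles recovered from the dot
# product and from the arctangent of the component ratio.
def norm(v):
    return math.sqrt(sum(x * x for x in v))

def angle(a, b):
    """Angle between two vectors, from a.b = |a||b| cos(theta)."""
    d = sum(x * y for x, y in zip(a, b))
    return math.acos(d / (norm(a) * norm(b)))

v = [3.0, 4.0]
length = norm(v)                   # hypotenuse of the right triangle
theta_x = math.atan2(v[1], v[0])   # angle with the x-axis, arctan(y/x)
```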
Projection, resolving a vector into its components: Say all we had was the magnitude of the vector and its angle to the x-axis, and we needed the components (maybe we need to do a cross product or an addition). Look at figure a. again. The x component of the vector can be likened to a "plumb line" dropped from the arrowhead of the vector to the x-axis so that it is perpendicular to the x-axis. Thus, to get the x-component, all we would have to do is
multiply the magnitude of the vector by the cosine of θ. This is called resolving the vector
into its components.
Now say we had another vector at an angle θ2 to our vector. We construct a line between the arrowhead of our vector and the other vector, intercepting it in the same way, perpendicularly. The length from the tail of the vector to the interception point is called the projection of our vector onto the other. Note that the equation remains the same: the projection is still d cos θ2. This formula is valid in all higher dimensions as well, since the angle between two vectors is taken in the plane they span.
Miscellaneous Properties:
· Two vectors that are parallel to each other are called ”collinear”, as they can be drawn
onto the same line. Collinear vectors are scalar multiples of each other.
· Two non-collinear vectors are coplanar.
· There are two main types of products between vectors. The dot product (scalar product),
and the cross product (vector product). There is also a commonly used combination called
the triple scalar product.
Theorem
Let R be a rotational 3 × 3 matrix, i.e., a real matrix with det R = 1 and R−1 = RT . Then
for all vectors u, v in R3 ,
R · (u × v) = (R · u) × (R · v).
Proof. Let us first fix some right-handed orthonormal basis in R3. Further, let
{u1 , u2, u3 } and {v 1 , v 2 , v 3 } be the components of u and v in that basis. Also, in the chosen
basis, we denote the entries of R by Rij . Since R is rotational, we have Rij Rkj = δik where
δik is the Kronecker delta symbol. Here we use the Einstein summation convention. Thus, in
the previous expression, on the left hand side, j should be summed over 1, 2, 3. We shall use
the Levi-Civita permutation symbol ε to write the cross product. Then the i:th coordinate
of $u \times v$ equals $(u \times v)^i = \varepsilon_{ijk} u^j v^k$. For the $k$th component of $(R \cdot u) \times (R \cdot v)$ we then have
$\big((R \cdot u) \times (R \cdot v)\big)_k = \varepsilon_{kmn} R_{mi} R_{nj}\, u^i v^j = R_{kl}\, \varepsilon_{lij}\, u^i v^j = R_{kl}\, (u \times v)^l.$
The middle equality follows since $\varepsilon_{kmn} R_{mi} R_{nj} = R_{kl}\, \varepsilon_{lij}$: contracting both sides with $R_{kr}$ and using $R_{kr} R_{kl} = \delta_{rl}$ reduces this to the identity $\varepsilon_{kmn} R_{kr} R_{mi} R_{nj} = \varepsilon_{rij} \det R$. Since $\det R = 1$, it follows that
$R \cdot (u \times v) = (R \cdot u) \times (R \cdot v),$
as claimed. □
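The theorem can also be spot-checked numerically. The following Python sketch is ours; it uses a sample rotation about the z-axis:

```python
import math

# Check R(u x v) = (Ru) x (Rv) for a rotation matrix R about the z-axis.
def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def apply(R, u):
    return [sum(R[i][j] * u[j] for j in range(3)) for i in range(3)]

t = 0.7                                   # arbitrary rotation angle
R = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t),  math.cos(t), 0.0],
     [0.0,          0.0,         1.0]]    # det R = 1, R^{-1} = R^T

u, v = [1.0, 2.0, 3.0], [-1.0, 0.5, 2.0]
lhs = apply(R, cross(u, v))               # R . (u x v)
rhs = cross(apply(R, u), apply(R, v))     # (R . u) x (R . v)
```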
Chapter 240
240.1 contraction
REFERENCES
1. T. Frankel, The Geometry of Physics, Cambridge University Press, 1997.
240.2 exterior algebra
Let V be a vector space over a field K. The exterior algebra over V , commonly denoted by
Λ(V ), is an associative K-algebra with unit, together with a linear injection
λ : V → Λ(V ), characterized by the following properties:
• The exterior product is anti-symmetric in the sense that for all v ∈ V we have
v ∧ v = 0.
• Λ(V ) is universal in the sense that every linear mapping
φ : V → A,
where A is an associative K-algebra with unit, such that
φ(v)φ(v) = 0, v ∈ V,
lifts to a unique K-algebra homomorphism
φ̃ : Λ(V ) → A
such that
φ(v) = φ̃(λ(v)), v ∈ V.
Diagrammatically: the triangle formed by λ : V → Λ(V ), the given φ : V → A, and the unique lift φ̃ : Λ(V ) → A commutes, i.e. φ = φ̃ ∘ λ.
So much for the abstract definition. It is concise, but does little to illuminate the nature of
the elements of Λ(V ). To that end, let us say that an α ∈ Λ(V ) is k-primitive if there exist
v1 , . . . , vk ∈ V such that
α = v1 ∧ . . . ∧ vk ,
and say that an element of Λ(V ) is k-homogeneous if it is the linear combination of k-
primitive elements. We then define Λ0 (V ) to be the span of the unit element, define Λ1 (V ) to be the image of the canonical injection λ, and for k ≥ 2 define Λk (V ) to be the vector space of all k-homogeneous elements of Λ(V ).
Proposition 12. The above grading gives Λ(V ) the structure of an anti-symmetrically N-graded algebra. To be more precise,
$\Lambda(V) = \bigoplus_{k=0}^{\infty} \Lambda^k(V),$
and for α ∈ Λj (V ), β ∈ Λk (V ) we have
$\alpha \wedge \beta \in \Lambda^{j+k}(V)$
with
$\beta \wedge \alpha = (-1)^{jk}\, \alpha \wedge \beta.$
Proof. The essence of the proof is that we construct a model of the exterior algebra, and
then use the universality properties of Λ(V ) to show that it is isomorphic to that model. To
that end, for k ∈ N let V ⊗k denote the k-fold tensor product of V with itself. Note: it must be understood that V ⊗0 = K and that V ⊗1 = V . For k ≥ 2 we let Rk be the span of all elements
$v_1 \otimes \cdots \otimes v_k \in V^{\otimes k}$
such that
$v_j = v_{j+1}$
for some j = 1, . . . , k − 1. We then set
$E^k = V^{\otimes k}/R^k$
and set
$E = \bigoplus_{k=0}^{\infty} E^k.$
The tensor multiplication
$\otimes : V^{\otimes j} \times V^{\otimes k} \to V^{\otimes(j+k)}$
factors through to the quotient and gives a well-defined associative, anti-symmetric product
on E. From there, the universality properties of the tensor product and the universality
properties of Λ(V ) imply that E and Λ(V ) are isomorphic algebras. Q.E.D.
In the case that V is a finite-dimensional vector space one can give some more “down-to-
earth” definitions that may be helpful in understanding the nature of the exterior product.
Suppose then, that V is n-dimensional, and let V ∗ denote the dual space of linear forms.
Note that (V ∗ )⊗k is naturally identified with the vector space of multi-linear mappings V k → K. It therefore makes sense to identify Λk (V ∗ ) with the vector space of anti-symmetric, multi-linear mappings V k → K. Next, we define the alternation operator
Ak : (V ∗ )⊗k → Λk (V ∗ )
as follows. For α ∈ (V ∗ )⊗k we define Ak (α) ∈ Λk (V ∗ ) by
$A^k(\alpha)(v_1, \ldots, v_k) = \frac{1}{k!} \sum_{\pi} \operatorname{sgn}(\pi)\, \alpha(v_{\pi_1}, \ldots, v_{\pi_k}), \qquad v_1, \ldots, v_k \in V,$
where the right-hand sum is taken over all permutations of {1, . . . , k}, and sgn(π) = ±1
according to the parity of the permutation. Let us also note that Ak restricts to the identity
on Λk (V ∗ ). Finally for α ∈ Λj (V ∗ ), and β ∈ Λk (V ∗ ) we define
α ∧ β = Aj+k (α ⊗ β),
A description of the exterior algebra in terms of a basis may also be useful. Therefore, let
e1 , . . . , en be a basis of V . For every sequence I = (i1 , . . . , ik ) of natural numbers between 1
and n let eI denote the primitive element ei1 ∧ . . . ∧ eik . If I is the empty sequence, we use
eI to denote the unit element of the exterior algebra. Note that
eI = 0
if and only if I contains duplicate entries. For a permutation π of {1, . . . , k} let π(I) denote
the sequence (πi1 , . . . , πik ), and note that
eπ(I) = sgn(π)eI .
Proposition 14. The exterior algebra Λ(V ) is a $2^n$-dimensional vector space with basis {eI },
where the index I runs over all strictly increasing sequences — including the empty sequence
— of natural numbers between 1 and n.
The upshot of all this is that for finite-dimensional vector spaces we have another way to
construct a model of the exterior algebra. Namely, we choose a basis e1 , . . . , en and define
formal bi-vector symbols ei ∧ ej subject to the anti-symmetric relations
ei ∧ ei = 0, and ei ∧ ej = −ej ∧ ei .
We then define tri-vector symbols, and more generally k-vector symbols subject to the ob-
vious k-place anti-symmetric relations. The exterior algebra Λ(V ) is then defined to be the
vector space of all possible linear combinations of the k-vector symbols, and the algebra
product is defined by linearly extending the wedge product to all of Λ(V ). Also note that
for k > n all k-vector symbols are identified with zero, and hence that Λk (V ) = {0} for all
k > n.
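A minimal Python sketch (ours, under the finite-basis model just described) of reducing a product of basis symbols to the normalized ±eI form, tracking the sign of the sorting permutation and killing repeated indices:

```python
# Reduce e_{i1} ^ ... ^ e_{ik} to (I, sign) with I strictly increasing,
# using the relations e_i ^ e_i = 0 and e_i ^ e_j = -e_j ^ e_i.
def wedge_symbols(indices):
    idx, sign = list(indices), 1
    for a in range(len(idx)):                  # bubble sort, counting swaps
        for b in range(len(idx) - 1 - a):
            if idx[b] > idx[b + 1]:
                idx[b], idx[b + 1] = idx[b + 1], idx[b]
                sign = -sign                   # each swap flips the sign
    if len(set(idx)) < len(idx):               # repeated index: the symbol is 0
        return tuple(idx), 0
    return tuple(idx), sign
```

For example, e2 ∧ e1 normalizes to −e(1,2), and any symbol with a repeated index collapses to zero, as the anti-symmetric relations require.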
Notes. The exterior algebra is also known as the Grassmann algebra, after its inventor Hermann Grassmann, who created it in order to give an algebraic treatment of linear geometry. Grassmann was also one of the first people to talk about the geometry of an n-dimensional space with n an arbitrary natural number. The axiomatics of the exterior product are needed to define differential forms and therefore play an essential role in the theory of integration on manifolds. Exterior algebra is also an essential prerequisite to understanding de Rham's theory of differential cohomology.
Chapter 241
The Kronecker delta δij is defined as having value 1 when i = j and 0 otherwise (i and j are
integers). It may also be written as δ ij or δji . It is a special case of the generalized Kronecker delta symbol.
Example.
The n × n identity matrix I can be written in terms of the Kronecker delta as simply the
matrix of the delta, Iij = δij , or simply I = (δij ).
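A tiny Python sketch (ours) of the example, also showing the delta acting as an index selector under contraction:

```python
# The identity matrix as the array of Kronecker deltas, I[i][j] = delta_ij.
def delta(i, j):
    return 1 if i == j else 0

n = 3
I = [[delta(i, j) for j in range(n)] for i in range(n)]

# Contracting delta_ij with v_j just picks out v_i.
v = [4.0, 5.0, 6.0]
picked = [sum(delta(i, j) * v[j] for j in range(n)) for i in range(n)]
```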
REFERENCES
1. N. Higham, Handbook of Writing for the Mathematical Sciences, Society for Industrial and Applied Mathematics, 1998.
Let V be a vector space over a field k. The dual of V , denoted by V ? , is the vector space of
linear forms on V , i.e. linear mappings V → k. The operations in V ? are defined pointwise:
$(\varphi + \psi)(v) = \varphi(v) + \psi(v), \qquad (\lambda\varphi)(v) = \lambda\,\varphi(v),$
for λ ∈ k, v ∈ V and ϕ, ψ ∈ V ⋆.
V is isomorphic to V ⋆ if and only if the dimension of V is finite. If not, then V ⋆ has a larger (infinite) dimension than V ; in other words, the cardinality of any basis of V ⋆ is strictly greater than the cardinality of any basis of V .
Assume now that V has finite dimension n, and let B = {b1 , . . . , bn } be a basis of V . For each i, define
$\beta_i : V \to k$
by
$\beta_i(x_1 b_1 + \cdots + x_n b_n) = x_i.$
It is easy to see that the βi are nonzero elements of V ? and are independent. Thus
{β1 , . . . , βn } is a basis of V ? , called the dual basis of B.
The dual of V ? is called the second dual or bidual of V . There is a very simple canonical
injection V → V ?? , and it is an isomorphism if the dimension of V is finite. To see it, let x
be any element of V and define a mapping x0 : V ? → k simply by
x0 (φ) = φ(x) .
Remarks
Another way in which a linear mapping V → V ? can arise is via a bilinear form
V ×V →k .
The notions of duality extend, in part, from vector spaces to modules, especially free modules
over commutative rings. A related notion is the duality in projective spaces.
241.3 example of trace of a matrix
Let
$A = \begin{pmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \\ 14 & 16 & 18 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 9 & 8 & 7 \\ 6 & 5 & 4 \\ 3 & 2 & 1 \end{pmatrix};$
then:
• trace(A + B)
= trace(A) + trace(B)
= (2 + 10 + 18) + (9 + 5 + 1)
= 45
• trace(A) = trace(2A′ ) = 2 · trace(A′ ), where
$A' = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix},$
so that
trace(A) = 2 · (1 + 5 + 9) = 30.
Let l and n be natural numbers such that 1 ≤ l ≤ n. Further, let ik and jk be natural numbers
in {1, · · · , n} for all k in {1, · · · , l}. Then the generalized Kronecker delta symbol,
denoted by $\delta^{i_1\cdots i_l}_{j_1\cdots j_l}$, is zero if $i_r = i_s$ or $j_r = j_s$ for some $r \neq s$, or if $\{i_1, \cdots, i_l\} \neq \{j_1, \cdots, j_l\}$ as sets. If none of the above conditions are met, then $\delta^{i_1\cdots i_l}_{j_1\cdots j_l}$ is defined as the sign of the permutation that maps $i_1 \cdots i_l$ to $j_1 \cdots j_l$.
From the definition, it follows that when $l = 1$, the generalized Kronecker delta symbol reduces to the traditional delta symbol $\delta^i_j$. Also, for $l = n$, we obtain
$\delta^{i_1\cdots i_n}_{j_1\cdots j_n} = \varepsilon^{i_1\cdots i_n}\,\varepsilon_{j_1\cdots j_n}, \qquad \delta^{1\cdots n}_{j_1\cdots j_n} = \varepsilon_{j_1\cdots j_n}.$
For any l we can write the generalized delta function as a determinant of traditional delta
symbols. Indeed, if S(l) is the permutation group of l elements, then
$\delta^{i_1\cdots i_l}_{j_1\cdots j_l} = \sum_{\tau \in S(l)} \operatorname{sign}\tau\ \delta^{i_{\tau(1)}}_{j_1} \cdots \delta^{i_{\tau(l)}}_{j_l} = \det\begin{pmatrix} \delta^{i_1}_{j_1} & \cdots & \delta^{i_l}_{j_1} \\ \vdots & \ddots & \vdots \\ \delta^{i_1}_{j_l} & \cdots & \delta^{i_l}_{j_l} \end{pmatrix}.$
The first equality follows since the sum on the first line has only one non-zero term: the term for which iτ (k) = jk . The second equality follows from the definition of the determinant.
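The determinant formula lends itself to a direct Python sketch (ours; the helper names are hypothetical), expanding the determinant over permutations:

```python
from itertools import permutations

# Generalized Kronecker delta via the determinant-of-deltas formula.
def delta(i, j):
    return 1 if i == j else 0

def perm_sign(p):
    """Sign of a permutation (given as a tuple) via its inversion count."""
    inversions = sum(1 for a in range(len(p)) for b in range(a + 1, len(p))
                     if p[a] > p[b])
    return -1 if inversions % 2 else 1

def gen_delta(I, J):
    """delta^{i_1...i_l}_{j_1...j_l} for index tuples I (upper) and J (lower)."""
    l = len(I)
    total = 0
    for tau in permutations(range(l)):
        term = perm_sign(tau)
        for r in range(l):
            term *= delta(I[r], J[tau[r]])
        total += term
    return total
```

The spot-checks below confirm the defining cases: value ±1 when one index list is a permutation of the other, and 0 when indices repeat or the lists differ as sets.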
The collection of all linear functionals on V can be made into a vector space by defining
addition and scalar multiplication pointwise; it is called the dual space of V.
A module is the natural generalization of a vector space, in fact, when working over a field
it is just another word for a vector space.
Note as mentioned in the beginning, if R is a field, these properties are the defining properties
for a linear transformation.
Similarly in vector space terminology the image Imf := {f (x) : x ∈ M} and kernel Kerf :=
{x ∈ M : f (x) = 0N } are called the range and null-space respectively.
241.7 proof of properties of trace of a matrix
Proof of properties:
1. For the scaling part of the first property,
$\operatorname{trace}(cA) = \sum_{i=1}^{n} c\, a_{i,i} = c \sum_{i=1}^{n} a_{i,i} = c \cdot \operatorname{trace}(A),$
using, respectively, the definition of matrix scalar multiplication and a property of sums.
2. The second property follows since the transpose does not alter the entries on the main
diagonal.
3. The proof of the third property follows by exchanging the summation order. Suppose
A is an n × m matrix and B is an m × n matrix. Then
$\operatorname{trace}(AB) = \sum_{i=1}^{n} \sum_{j=1}^{m} A_{i,j} B_{j,i} = \sum_{j=1}^{m} \sum_{i=1}^{n} B_{j,i} A_{i,j} = \operatorname{trace}(BA),$
where the middle equality changes the summation order.
4. The last property is a consequence of Property 3 and the fact that matrix multiplication
is associative;
$\operatorname{trace}(B^{-1}AB) = \operatorname{trace}\big((B^{-1}A)B\big) = \operatorname{trace}\big(B(B^{-1}A)\big) = \operatorname{trace}\big((BB^{-1})A\big) = \operatorname{trace}(A).$
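These properties are easy to spot-check numerically. The following Python sketch is ours, using small sample matrices to verify linearity and trace(AB) = trace(BA):

```python
# Numeric spot-checks of the trace properties proved above.
def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2.0, 4.0], [8.0, 10.0]]
B = [[9.0, 8.0], [6.0, 5.0]]
C = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]   # A + B

linearity = (trace(C) == trace(A) + trace(B))                   # property 1
cyclic = (trace(matmul(A, B)) == trace(matmul(B, A)))           # property 3
```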
241.8 trace of a matrix
Definition
Let A = (ai,j ) be a (real or possibly complex) square matrix of dimension n. The trace of
the matrix is the sum of the main diagonal:
$\operatorname{trace}(A) = \sum_{i=1}^{n} a_{i,i}.$
Properties:
1. The trace is a linear transformation from the space of square matrices to the scalars. In other words, if A and B are square matrices of the same dimension and c is a scalar, then
$\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B), \qquad \operatorname{trace}(cA) = c \cdot \operatorname{trace}(A).$
2. For the transpose and conjugate transpose, we have for any square matrix A,
trace(At ) = trace(A),
trace(A∗ ) = trace(A).
3. For an n × m matrix A and an m × n matrix B,
$\operatorname{trace}(AB) = \operatorname{trace}(BA).$
4. If B is invertible and has the same dimension as A, then
$\operatorname{trace}(B^{-1}AB) = \operatorname{trace}(A).$
5. Let A be a square n × n matrix with real (or complex) entries aij . Then
REFERENCES
1. The Trace of a Square Matrix. Paul Ehrlich, [online]
http://www.math.ufl.edu/ ehrlich/trace.html
2. Z.P. Yang, X.X. Feng, A note on the trace inequality for products of Hermitian matrix
power, Journal of Inequalities in Pure and Applied Mathematics, Volume 3, Issue 5, 2002,
Article 78, online.