Inverted index
Query optimization
2015-02-18
1 / 69
Introduction
Inverted index
Query optimization
Take-away
Teachers
1998: google.stanford.edu
Boolean retrieval
            Julius   The
            Caesar   Tempest  Hamlet  Othello  Macbeth  ...
Anthony     1        0        0       0        1
Brutus      1        0        1       0        0
Caesar      1        0        1       1        1
Calpurnia   1        0        0       0        0
Cleopatra   0        0        0       0        0
mercy       0        1        1       1        1
worser      0        1        1       1        0
...
Entry is 1 if the term occurs. Example: Calpurnia occurs in Julius Caesar.
Entry is 0 if the term doesn't occur. Example: Calpurnia doesn't occur in The
Tempest.
Incidence vectors
            Anthony
            and        Julius   The
            Cleopatra  Caesar   Tempest  Hamlet  Othello  Macbeth  ...
Anthony     1          1        0        0       0        1
Brutus      1          1        0        1       0        0
Caesar      1          1        0        1       1        1
Calpurnia   0          1        0        0       0        0
Cleopatra   1          0        0        0       0        0
mercy       1          0        1        1       1        1
worser      1          0        1        1       1        0
...
result:
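Each row of the incidence matrix can be read as a bit vector, so a Boolean query reduces to bitwise operations on rows. The sketch below is only an illustration under stated assumptions: the query Brutus AND Caesar AND NOT Calpurnia is picked here as an example, and the helper names `to_bits` and `answer` are hypothetical.

```python
# Plays in the column order of the incidence matrix.
PLAYS = ["Anthony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]

# One 0/1 incidence vector per term, copied from the matrix rows.
INCIDENCE = {
    "Brutus":    [1, 1, 0, 1, 0, 0],
    "Caesar":    [1, 1, 0, 1, 1, 1],
    "Calpurnia": [0, 1, 0, 0, 0, 0],
}

def to_bits(vector):
    """Pack a 0/1 list into an int, leftmost entry as the highest bit."""
    bits = 0
    for entry in vector:
        bits = (bits << 1) | entry
    return bits

def answer(bits):
    """Return the plays whose bit is set in the result vector."""
    n = len(PLAYS)
    return [PLAYS[i] for i in range(n) if (bits >> (n - 1 - i)) & 1]

mask = (1 << len(PLAYS)) - 1  # keeps the complement within six bits
brutus = to_bits(INCIDENCE["Brutus"])
caesar = to_bits(INCIDENCE["Caesar"])
calpurnia = to_bits(INCIDENCE["Calpurnia"])

# Illustrative query (an assumption): Brutus AND Caesar AND NOT Calpurnia
result = brutus & caesar & (~calpurnia & mask)
print(format(result, "06b"), answer(result))
```

With these three rows the result vector is 100100, which selects Anthony and Cleopatra and Hamlet.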
Answers to query
Bigger collections
Inverted Index
For each term t, we store a list of all documents that contain t.
dictionary   postings
Brutus       11  31  45  173  174
Caesar       16  57  132
Calpurnia    31  54  101
...
Generate postings
term       docID
i          1
did        1
enact      1
julius     1
caesar     1
i          1
was        1
killed     1
i          1
the        1
capitol    1
brutus     1
killed     1
me         1
so         2
let        2
it         2
be         2
with       2
caesar     2
the        2
noble      2
brutus     2
hath       2
told       2
you        2
caesar     2
was        2
ambitious  2
Sort postings
term       docID         term       docID
i          1             ambitious  2
did        1             be         2
enact      1             brutus     1
julius     1             brutus     2
caesar     1             caesar     1
i          1             caesar     2
was        1             caesar     2
killed     1             capitol    1
i          1      =>     did        1
the        1             enact      1
capitol    1             hath       2
brutus     1             i          1
killed     1             i          1
me         1             i          1
so         2             it         2
let        2             julius     1
it         2             killed     1
be         2             killed     1
with       2             let        2
caesar     2             me         1
the        2             noble      2
noble      2             so         2
brutus     2             the        1
hath       2             the        2
told       2             told       2
you        2             was        1
caesar     2             was        2
was        2             with       2
ambitious  2             you        2
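term       postings lists
ambitious  2
be         2
brutus     1 2
caesar     1 2
capitol    1
did        1
enact      1
hath       2
i          1
it         2
julius     1
killed     1
let        2
me         1
noble      2
so         2
the        1 2
told       2
was        1 2
with       2
you        2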
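The three construction steps (generate postings, sort postings, build postings lists) can be sketched in a few lines. This is a minimal sketch, not the lecture's implementation: the token strings are taken from the "Generate postings" table, and the function names are hypothetical.

```python
from itertools import groupby

# Token sequences as listed in the "Generate postings" table.
DOCS = {
    1: "i did enact julius caesar i was killed i the capitol brutus killed me",
    2: "so let it be with caesar the noble brutus hath told you caesar was ambitious",
}

def generate_postings(docs):
    """Step 1: emit one (term, docID) pair per token occurrence."""
    return [(term, doc_id) for doc_id, text in docs.items() for term in text.split()]

def sort_postings(pairs):
    """Step 2: sort by term, then by docID."""
    return sorted(pairs)

def build_index(sorted_pairs):
    """Step 3: merge duplicate pairs and group docIDs into postings lists."""
    index = {}
    for term, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        index[term] = sorted({doc_id for _, doc_id in group})
    return index

pairs = generate_postings(DOCS)
index = build_index(sort_postings(pairs))
for term, postings in index.items():
    print(f"{term:<10} {postings}")
```

Because the pairs are sorted before grouping, the dictionary comes out in alphabetical order and every postings list is sorted by docID.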
dictionary   postings file
Brutus       11  31  45  173  174
Caesar       16  57  132
Calpurnia    31  54  101
...
Brutus        1  2  4  11  31  45  173  174
Calpurnia     2  31  54  101
Intersection  2  31
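One common way to compute such an intersection is a linear merge that walks both sorted lists in step. The sketch below assumes both postings lists are sorted by docID; the function name `intersect` is chosen here for illustration.

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by docID in O(len(p1) + len(p2))."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID occurs in both lists: keep it and advance both pointers.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # the smaller docID cannot match anything further on
        else:
            j += 1
    return answer

brutus = [1, 2, 4, 11, 31, 45, 173, 174]
calpurnia = [2, 31, 54, 101]
print(intersect(brutus, calpurnia))  # [2, 31]
```

The merge only works because the lists are sorted; on unsorted lists the pointer advance would skip matches.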
          1 2 3 4 5 7 8 9 11 12 13 14 15
paris     2 6 10 12 14
lear      12 15
Boolean queries
The Boolean retrieval model can answer any query that is a
Boolean expression.
Boolean queries are queries that use AND, OR, and NOT to join
query terms.
The model views each document as a set of terms.
The model is precise: a document either matches the condition or it does not.
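Since the model treats documents as sets of terms, the three operators map directly onto set operations over docIDs. The sketch below is illustrative only: the postings are the Brutus and Calpurnia lists from the intersection example, and the collection size of 174 documents is an assumption made here so that NOT has a universe to complement against.

```python
# Postings from the Brutus/Calpurnia example, treated as sets of docIDs.
INDEX = {
    "Brutus":    {1, 2, 4, 11, 31, 45, 173, 174},
    "Calpurnia": {2, 31, 54, 101},
}

# Assumption: the collection holds documents 1..174 (needed to evaluate NOT).
ALL_DOCS = set(range(1, 175))

def docs(term):
    """DocIDs containing the term; an unknown term matches nothing."""
    return INDEX.get(term, set())

both = docs("Brutus") & docs("Calpurnia")                       # Brutus AND Calpurnia
either = docs("Brutus") | docs("Calpurnia")                     # Brutus OR Calpurnia
only_brutus = docs("Brutus") & (ALL_DOCS - docs("Calpurnia"))   # Brutus AND NOT Calpurnia

print(sorted(both))
print(sorted(either))
print(sorted(only_brutus))
```

Every document is either in a result set or not, which is the "precise" matching behaviour the model describes; there is no ranking among the matches.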
Westlaw: Comments
Query optimization
Query optimization
Brutus     1  2  4  11  31  45  173  174
Calpurnia  2  31  54  101
Caesar     5  31
Intersect(⟨t1, ..., tn⟩)
  terms ← SortByIncreasingFrequency(⟨t1, ..., tn⟩)
  result ← postings(first(terms))
  terms ← rest(terms)
  while terms ≠ nil and result ≠ nil
  do result ← Intersect(result, postings(first(terms)))
     terms ← rest(terms)
  return result
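The pseudocode can be tried out on the three postings lists of the query optimization example. This is a sketch under assumptions: frequency is taken to be the length of a term's postings list, the eight-entry list is assumed to belong to Brutus as in the earlier intersection example, and the names `intersect_pair` and `intersect_terms` are hypothetical.

```python
def intersect_pair(p1, p2):
    """Merge-intersect two postings lists sorted by docID."""
    result, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i, j = i + 1, j + 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def intersect_terms(terms, index):
    """Intersect the postings of all terms, starting with the rarest term."""
    if not terms:
        return []
    # SortByIncreasingFrequency: shortest postings list first (an assumption).
    ordered = sorted(terms, key=lambda term: len(index[term]))
    result = list(index[ordered[0]])
    for term in ordered[1:]:
        if not result:  # an empty intermediate result can never grow again
            break
        result = intersect_pair(result, index[term])
    return result

INDEX = {
    "Brutus":    [1, 2, 4, 11, 31, 45, 173, 174],
    "Calpurnia": [2, 31, 54, 101],
    "Caesar":    [5, 31],
}
print(intersect_terms(["Brutus", "Calpurnia", "Caesar"], INDEX))  # [31]
```

Processing Caesar first keeps every intermediate result at most two entries long, so the long Brutus list is scanned against a very short list.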
bo    aboard  about   boardroom  border
or    border  lord    morbid     sordid
rd    aboard  ardent  boardroom  border
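A k-gram index of this shape maps each character k-gram to the vocabulary terms that contain it. The sketch below builds a bigram index over the eight words in the figure; it is a minimal illustration that assumes no word-boundary markers, and the function names are hypothetical. The full lists can be longer than the excerpts shown in the figure (for example, lord and sordid also contain "rd").

```python
from collections import defaultdict

def kgrams(term, k=2):
    """All contiguous character k-grams of a term (no boundary markers)."""
    return {term[i:i + k] for i in range(len(term) - k + 1)}

def build_kgram_index(vocabulary, k=2):
    """Map each k-gram to the alphabetically sorted terms containing it."""
    index = defaultdict(list)
    for term in sorted(vocabulary):
        for gram in kgrams(term, k):
            index[gram].append(term)
    return index

# The words that appear in the figure.
VOCABULARY = ["aboard", "about", "ardent", "boardroom", "border",
              "lord", "morbid", "sordid"]

index = build_kgram_index(VOCABULARY)
for gram in ("bo", "or", "rd"):
    print(gram, index[gram])
```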
[Figure: a master assigns splits to parsers (map phase); the parsers write segment files partitioned by term range (a-f, g-p, q-z); a master assigns each range to an inverter, which produces the postings (reduce phase).]
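The data flow in the figure can be imitated sequentially in a single process. The sketch below is only a toy model under assumptions: each split holds one document, the documents are the two token strings from the "Generate postings" table, the "master" is a plain loop, and all function names are hypothetical.

```python
from collections import defaultdict

# Term ranges of the three inverters shown in the figure.
PARTITIONS = [("a", "f"), ("g", "p"), ("q", "z")]

def partition_of(term):
    """Index of the term range that a term's first letter falls into."""
    for number, (low, high) in enumerate(PARTITIONS):
        if low <= term[0] <= high:
            return number
    raise ValueError(f"no partition for {term!r}")

def parse(split):
    """Map phase: one parser turns its split into per-range segment files."""
    segments = defaultdict(list)
    for doc_id, text in split:
        for term in text.split():
            segments[partition_of(term)].append((term, doc_id))
    return segments

def invert(pairs):
    """Reduce phase: one inverter groups its (term, docID) pairs into postings."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in sorted(postings.items())}

# Assumption: one document per split, taken from the "Generate postings" table.
splits = [
    [(1, "i did enact julius caesar i was killed i the capitol brutus killed me")],
    [(2, "so let it be with caesar the noble brutus hath told you caesar was ambitious")],
]

# The "master" here is just a loop that hands every split to a parser.
segment_files = [parse(split) for split in splits]

index = {}
for number in range(len(PARTITIONS)):
    # Each inverter reads its own range from every parser's segment files.
    pairs = [pair for segments in segment_files for pair in segments[number]]
    index.update(invert(pairs))

print(index["caesar"], index["killed"], index["was"])
```

Partitioning by term range means each inverter sees all pairs for its terms, so the three partial indexes can simply be combined without further merging.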
Zipf's law
[Figure: log10 cf plotted against log10 rank]
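Zipf's law is usually stated as: the collection frequency cf of the i-th most frequent term is proportional to 1/i. Under that statement, taking logarithms explains why a plot of log10 cf against log10 rank is expected to be roughly a straight line with slope -1. The constant c below is a collection-dependent constant introduced here only for illustration.

```latex
\mathrm{cf}_i \propto \frac{1}{i}
\quad\Longleftrightarrow\quad
\mathrm{cf}_i = c \, i^{-1}
\quad\Longrightarrow\quad
\log_{10} \mathrm{cf}_i = \log_{10} c - \log_{10} i
```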
[Figure: system diagram with the components: documents, parsing and linguistics, document cache, indexes (metadata in zone and field indexes, tiered inverted positional index, k-gram), user query, spell correction, inexact top K retrieval, scoring parameters, MLR, training set, results page.]
q1
w      P(w|q1)    w      P(w|q1)
STOP   0.2        toad   0.01
the    0.2        said   0.03
a      0.1        likes  0.02
frog   0.01       that   0.04
                  ...    ...
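Reading the table as a unigram model, the probability of a string is the product of the probabilities of its words. The sketch below is a hedged illustration: the example string is hypothetical, and only the words listed in the table are covered.

```python
import math

# P(w | q1) as listed in the table; words not shown there are not modelled.
MODEL_Q1 = {
    "STOP": 0.2, "the": 0.2, "a": 0.1, "frog": 0.01,
    "toad": 0.01, "said": 0.03, "likes": 0.02, "that": 0.04,
}

def string_probability(tokens, model):
    """Multiply per-word probabilities (unigram independence assumption)."""
    return math.prod(model[token] for token in tokens)

# Hypothetical example string, ending with the STOP symbol.
tokens = "frog said that toad likes frog STOP".split()
probability = string_probability(tokens, MODEL_Q1)
print(probability)  # approximately 4.8e-12
```

Each factor comes straight from the table: 0.01 · 0.03 · 0.04 · 0.01 · 0.02 · 0.01 · 0.2.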
http://news.google.com
Take-away
Resources
Chapter 1 of IIR
Resources at http://www.fi.muni.cz/~sojka/PV211/ and
http://cislmu.org, materials in MU IS and FI MU library
course schedule and overview
information retrieval links
Shakespeare search engine
http://www.rhymezone.com/shakespeare/