Introduction to Information Retrieval (CS 121 / Inf 141)

Quiz #3 Permutation A - 05/23/2017 WITH ANSWERS

Topics: Boolean Retrieval, Ranked Retrieval and Vector Space Model

Name __________________________________________________________________________________

Student ID______________________________________________________________________________

This exam is individual, closed-book and closed-notes.


If taking the online version: during the quiz, you are not allowed to use other
programs or visit sites other than the quiz page on your Canvas session.
If taking the paper version: you may use only this sheet (both sides) and
must return it with your answers. No scratch paper is allowed.

Multiple Choice Questions: Please choose only one answer per question

Q1 Imagine you have a collection of a million documents (N) with an average of
1,000 words per document and a total of M=500,000 terms (unique words). Which
of the following statements is false regarding its Term-Document Incidence Matrix?
The matrix would be extremely sparse (most entries would be 0).
The matrix would consist of 0s and 1s and have dimension M by N.
The matrix shows the term frequency (tf) of each term in each document.
Each column (vector) shows which terms are present in each document.
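
The sparsity statement in Q1 can be checked with quick arithmetic. The sketch below uses only the numbers given in the question; the variable names are illustrative, and the bound assumes the worst case in which all 1,000 words of every document are distinct.

```python
# Numbers taken from Q1; variable names are illustrative.
N = 1_000_000        # documents in the collection
M = 500_000          # distinct terms
AVG_WORDS = 1_000    # average words per document

cells = M * N                  # entries in the M-by-N incidence matrix
max_ones = N * AVG_WORDS       # upper bound: every word in a document is distinct
min_zero_fraction = 1 - max_ones / cells

print(f"matrix entries: {cells:,}")
print(f"entries equal to 1: at most {max_ones:,}")
print(f"entries equal to 0: at least {min_zero_fraction:.1%}")
```

Under these assumptions at least 99.8% of the entries are 0, which is why storing only the non-zero positions is usually preferred over the full matrix.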

Q2 Which of the following statements is false with regard to the Boolean
Retrieval model?
It answers queries based on Boolean expressions (AND, OR and NOT).
It views documents as a set of terms.
It is very precise, as its queries need to meet a very specific condition.
It cannot combine two operators, such as AND NOT or OR NOT.

Q3 Select the most efficient processing order for the Boolean query Q.
Q: trees AND marmalade AND eyes.

Term         Doc. Freq
eyes         213,312
marmalade    107,913
trees        316,812

(marmalade AND eyes) first, then merge with trees.
(marmalade AND trees) first, then merge with eyes.
(trees AND eyes) first, then merge with marmalade.
Any combination would result in the same number of operations.
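
A common heuristic for conjunctive queries is to intersect postings lists in order of increasing document frequency, so that intermediate results stay as small as possible. The sketch below illustrates that idea; the function names and the toy index are assumptions made for illustration, and only the document frequencies come from Q3.

```python
def intersect(p1, p2):
    """Linear merge of two sorted postings lists; returns doc IDs present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def processing_order(doc_freq):
    """Order terms by increasing document frequency (rarest term first)."""
    return sorted(doc_freq, key=doc_freq.get)

def intersect_terms(terms, index):
    """AND together the postings of all terms, shortest list first."""
    ordered = sorted(terms, key=lambda t: len(index[t]))
    result = index[ordered[0]]
    for term in ordered[1:]:
        result = intersect(result, index[term])
    return result

# Document frequencies from the Q3 table.
doc_freq = {"eyes": 213_312, "marmalade": 107_913, "trees": 316_812}
print(processing_order(doc_freq))

# Hypothetical toy postings lists (sorted doc IDs), for illustration only.
toy_index = {
    "eyes": [1, 3, 5, 7],
    "marmalade": [3, 7],
    "trees": [1, 2, 3, 4, 7, 9],
}
print(intersect_terms(["trees", "marmalade", "eyes"], toy_index))
```

With the Q3 frequencies the heuristic starts from marmalade, then eyes, then trees.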

Q4 Which of the following statements is false regarding the Boolean Retrieval
model?
It does not perform query spell checking.
It does not capture information about term position in the documents.
It does not consider document structure (zones in documents such as headers).
It considers term frequency information to rank results.

Q5 Which of the following statements is false regarding the Ranked Retrieval
model?
It returns an ordering over the (top) documents in the collection for a query.
It accepts free text queries as input (one or more words in a human language).
It works better (easier to use) than Boolean models for most users.
Large result sets are an issue in Ranked Retrieval as we overwhelm users.

Q6 Find the Jaccard coefficient (Jc) for the query and documents below.
Query: top university (set q)
Doc 1: university of California (set d1)
Doc 2: best university in USA (set d2)
Jc(q,d1)=1/4, Jc(q,d2)=1/5
Jc(q,d1)=0, Jc(q,d2)=1/6
Jc(q,d1)=1/5, Jc(q,d2)=1/6
Jc(q,d1)=1/5, Jc(q,d2)=0
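
The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below applies that definition to the Q6 strings; the tokenizer (lowercasing and splitting on whitespace) and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def tokens(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard coefficient |A intersect B| / |A union B| of two sets."""
    union = a | b
    if not union:
        return Fraction(0)  # convention chosen here for two empty sets
    return Fraction(len(a & b), len(union))

q = tokens("top university")
d1 = tokens("university of California")
d2 = tokens("best university in USA")

print(jaccard(q, d1), jaccard(q, d2))  # prints: 1/4 1/5
```

In both cases the only shared term is "university"; the unions contain 4 and 5 distinct terms respectively.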

Q7 Which of the following statements is false with regard to the Term-document
Count Matrix of a set of M terms in a collection of N documents?
Each document is a count vector of dimension M consisting of natural numbers.
The term-document Count Matrix considers term frequency.
The term-document Count Matrix considers the position of terms in a document.
The term-document Count Matrix is also known as the bag of words model.
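
To make the count-vector representation in Q7 concrete, the sketch below builds count vectors for two short example sentences that contain the same words in a different order. The sentences, the whitespace tokenizer, and the function name are illustrative assumptions, not part of the quiz.

```python
from collections import Counter

def count_vector(text, vocabulary):
    """Return the term counts of text, one entry per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Two illustrative sentences with the same words in a different order.
doc_a = "Mary is quicker than John"
doc_b = "John is quicker than Mary"

vocabulary = sorted(set(doc_a.lower().split()) | set(doc_b.lower().split()))
vec_a = count_vector(doc_a, vocabulary)
vec_b = count_vector(doc_b, vocabulary)

print(vocabulary)
print(vec_a)
print(vec_b)
print("identical vectors:", vec_a == vec_b)
```

The two vectors are identical, which shows that a count vector records how often each term occurs but not where it occurs.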

Q8 Mark the false statement with regard to term frequency (tf).
The tf is the number of times that a term occurs in a document.
Relevance of a term in a document increases proportionally with its tf.
The tf of a query is the sum of the tf of each of the terms in the query.
The tf of a query is 0 if none of the query terms is present in the document.

Q9 Mark the false statement with regard to document frequency (df).
Rare terms are more informative than frequent terms.
The df of a term t can be found as the length of the posting list of t.
Frequent terms are more informative than rare terms.
The df of a term t refers to the number of documents that contain t.
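
The link between df and postings lists in Q9 can be sketched as follows. The toy index and function names are hypothetical. The inverse document frequency, log10(N/df), is not mentioned in the quiz; it is included here only as one common way to give rarer terms a higher weight.

```python
import math

# Hypothetical toy inverted index: term -> sorted list of doc IDs.
index = {
    "marmalade": [2, 7],
    "eyes": [1, 2, 5, 7],
    "the": [1, 2, 3, 4, 5, 6, 7, 8],
}
N = 8  # number of documents in the toy collection

def df(term):
    """Document frequency: the length of the term's postings list."""
    return len(index.get(term, []))

def idf(term):
    """Inverse document frequency log10(N / df); 0.0 for unseen terms (a convention chosen here)."""
    d = df(term)
    if d == 0:
        return 0.0
    return math.log10(N / d)

for term in index:
    print(f"{term}: df={df(term)}, idf={idf(term):.3f}")
```

In this toy collection "the" occurs in every document and receives a weight of 0, while the rarer "marmalade" receives the highest weight.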

Q10 Which of the following statements is false with regard to Vector Space
Similarity?
Terms are axes of the space, which results in a high-dimensional space.
Documents and queries can be presented as points or vectors in the space.
The Euclidean distance between query and document is a good way to rank their similarity.
Documents can be ranked according to their proximity to the query in the space.
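
The difference between ranking by Euclidean distance and ranking by angle can be illustrated with a small numeric sketch. The vectors below are made up for illustration: d1 points in the same direction as the query but is much longer, while d2 is short and points elsewhere. Cosine similarity is used here as one standard angle-based proximity measure.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Illustrative term-count vectors over a two-term vocabulary.
q = [1, 1]
d1 = [10, 10]   # same term distribution as q, but a much longer document
d2 = [1, 0]     # short document containing only one of the query terms

print("euclidean:", euclidean(q, d1), euclidean(q, d2))
print("cosine:   ", cosine(q, d1), cosine(q, d2))
```

Euclidean distance places d2 closer to the query simply because it is short, whereas cosine similarity ranks d1 first because its term distribution matches the query.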
