
Project 4

Orakle
Time due: 9:00 PM Thursday, March 13


Introduction
Anatomy of a Database
What Do You Need to Do?
What Will We Provide?
  The Tokenizer Class
  The HTTP Class (aka But I don't know how to use C++ to access the Internet!)
Details: The Classes You Must Write
  MultiMap and MultiMap::Iterator
    The MultiMap Class
    The MultiMap::Iterator Class
    MultiMap Implementation Notes
  Database
    The Data Structures Used in a Database
Our Testing Framework
  The file Command and the url Command
  The schema Command
  The add Command
  Issuing a Query: the qparam, sparam and execute commands
Requirements and Other Thoughts
What to Turn In
Grading

Introduction

The NachenSmall Software Corporation has decided to get into the database business and
create their own database offering to compete against Oracle and Microsoft. Given that
the NachenSmall leadership team consists entirely of UCLA alums, they've decided to
offer the job to build this new database to the students of CS32. Lucky you!

So, in your last project for CS32 Winter 2014, your goal is to build a simple set of C++
classes that can be used to store and search through (aka query) large amounts of data.
If you're able to prove to NachenSmall's reclusive and bizarre CEO, Carey Nachenberg,
that you have the programming skills to build the simple database described in this
specification, he'll hire you to build the complete project, and you'll be rich and famous.

Anatomy of a Database

A database is a piece of software that stores one or more data records. Each data record
contains all of the known information about a single entity (e.g., a student, a customer,
etc.). The database lets a user search through its many records efficiently in order to find
records that match specific criteria (e.g., "show me all students with a last name of Smith
and a GPA of 3.0 or higher"). Other software applications (e.g., a website like
my.ucla.edu) typically use such a database to hold their data.

So what is a data record? A data record is a group of related fields about a single entity.
For example, if we want our database to store student data records, each student data
record might contain the following fields: a first name field, a last name field, a student
ID number field, a phone number field, and a GPA field. Here are some examples of
student data records:

Record #1: Carey,Nachenberg,102030405,310-825-4321,3.71
Record #2: David,Smallberg,304454123,818-666-2323,1.25
Record #3: David,Copperfield,987654321,424-750-7519,3.99

As you can see, each data record (also called a row) holds a first name field, a last name
field, a student ID number field, a phone number field, and a GPA field. So we can say
that our database holds multiple rows of data, and each row is composed of the same five
fields (first name, last name, student ID, phone number, GPA).

When you create a new database from scratch, you must specify a schema that describes
what types of fields you will be storing in each data record of the database. For example,
if you wanted to create a database that holds student data records, you would provide the
following schema to the database:

firstName: not indexed
lastName: indexed
studentID: not indexed
phoneNum: not indexed
GPA: indexed

The schema above states that we're going to store five fields of information in each data
record/row of the database: a first name, last name, student ID, phone number and GPA.
The schema also describes whether each field needs to be indexed or not. If a field is
indexed, it means that the database enables the user to efficiently search for records based
on the contents of that field across all of the data records. So, given the schema above
which specifies that the lastName and GPA fields must be indexed, the database would
let the user efficiently search for all students with a lastName of "Ziggy". However, the
database would not let the user search for students by their phoneNum field, since this field
was not designated as indexed.

Once you have specified a schema for a database, you may then add one or more data
records to the database that match that schema. So, for example, after specifying the
schema above, we could insert the following rows into our database.

Row #1: Corey,Wang,100200300,818-555-1212,3.62
Row #2: David,Smallberg,304454123,310-666-2323,1.25
Row #3: David,Copperfield,987654321,424-750-7519,1.46
Row #4: Jill,Bachelor,3453453356,626-999-1111,3.30
Row #5: Cindy,Wang,005393222,310-555-4545,2.00
Row #6: Buford,Wang,656999332,909-678-4567,1.80
Row #7: Rick,Ronzoni,676767545,310-666-2323,1.75
Row #8: Abel,Salfo,404932223,202-342-2342,1.25
Row #9: Joe,Smith,000000001,452-332-9492,1.99
Row #10: Bill,Smith,003004005,818-885-6735,1.99


Notice that different data records may contain the same field value: records
#2 and #3 both have firstName field values of David, for example, and records #1, #5
and #6 have the same value of Wang for their lastName field. This makes sense: you
wouldn't expect people to have unique first names (or last names or GPAs).

Of course, a database doesn't just hold lots of data. It also enables the user to search
through that data in order to find data records that match the user's criteria. In database
lingo, this is called querying the database. So once we have added one or more data
records to the database, we may then query the database about these data records.

A database query, for the purposes of this assignment, has two parts:

1. Each query specifies a list of field names followed by minimum and maximum
acceptable values for each field name.
2. Each query specifies how to order/sort the data records that match your query's
criteria when those matching data records are returned to the user.

Here is an example query (written in pseudo-code for clarity):

Fields to match:
lastName,Ronson,Wang
GPA,1.0,2.0
Ordering criteria:
GPA, descending
lastName, ascending
firstName, ascending

The above query indicates that the user wants to find all users whose last names are
between Ronson and Wang, inclusive (i.e., [Ronson,Wang]), *and* whose GPA is
between 1.0 and 2.0, inclusive (i.e., [1.0,2.0]).

Further, the query specifies that any matching data records must be returned to the user in
a specific order. The results must first be ordered by the GPA field (in descending order,
so 2.0 would come before 1.5, etc.). For those data records with exactly matching GPA
values, the results must secondarily be ordered by their lastName field in ascending order
(e.g., Branson comes before Coldwell). Finally, for those data records with the same GPA
and the same last name, they should further be ordered by the firstName field in
ascending order. So for the above query, the database would output the following data
records in the following order:

Row #5: Cindy,Wang,005393222,310-555-4545,2.00
Row #10: Bill,Smith,003004005,818-885-6735,1.99
Row #9: Joe,Smith,000000001,452-332-9492,1.99
Row #6: Buford,Wang,656999332,909-678-4567,1.80
Row #7: Rick,Ronzoni,676767545,310-666-2323,1.75
Row #8: Abel,Salfo,404932223,202-342-2342,1.25
Row #2: David,Smallberg,304454123,310-666-2323,1.25

As you can see, these seven returned rows all have a lastName field value between
Ronson and Wang, inclusive, and have GPA values between 1.0 and 2.0, inclusive.
Data records that don't meet BOTH of these criteria are absent from our query results.
For example, David Copperfield has a GPA of 1.46, which is between 1.0 and 2.0.
However, David's last name of Copperfield does not fall between Ronson and Wang, so
David's record has been omitted from the results.

Notice that the rows have been ordered primarily by their GPA field, in descending order
(with the highest GPA of 2.0 at the top, and the lowest GPA of 1.25 at the bottom).
Secondarily, these records have been ordered by their lastName field, in ascending order.
Notice that Abel Salfo and David Smallberg have matching GPAs of 1.25;
therefore, they were secondarily ordered by their last name, with Salfo coming before
Smallberg in the results. Finally, notice that two of our users have the same GPA of 1.99.
These two data records also have the same value for the lastName field (Smith).
Therefore, these records have been tertiarily ordered by their firstName field, in
ascending order, with Bill coming before Joe.
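
The filtering and three-level ordering described above can be sketched in a few lines of C++. This is only an illustration of the example query, not the required Database design: the Student struct, the function names, the hard-coded criteria, storing the GPA as a double, and the use of std::vector and std::sort are all assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type for illustration; the spec does not prescribe one.
struct Student {
    std::string firstName;
    std::string lastName;
    std::string studentID;
    std::string phoneNum;
    double gpa;  // assumption: GPA compared numerically in this sketch
};

// True if the record satisfies BOTH inclusive range criteria of the example query.
bool matchesQuery(const Student& s) {
    return s.lastName >= "Ronson" && s.lastName <= "Wang"
        && s.gpa >= 1.0 && s.gpa <= 2.0;
}

// Ordering criteria: GPA descending, then lastName ascending, then firstName ascending.
bool comesBefore(const Student& a, const Student& b) {
    if (a.gpa != b.gpa)
        return a.gpa > b.gpa;            // higher GPA first
    if (a.lastName != b.lastName)
        return a.lastName < b.lastName;  // tie on GPA: alphabetical last name
    return a.firstName < b.firstName;    // tie on both: alphabetical first name
}

// Keeps only the matching rows, then sorts them by the ordering criteria.
std::vector<Student> runExampleQuery(const std::vector<Student>& rows) {
    std::vector<Student> result;
    for (const Student& s : rows)
        if (matchesQuery(s))
            result.push_back(s);
    std::sort(result.begin(), result.end(), comesBefore);
    return result;
}
```

Applied to the ten rows listed earlier, this sketch keeps seven rows and orders them as in the listing above, starting with Cindy Wang and ending with David Smallberg.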

For the purposes of this assignment, the user may specify search criteria (e.g., find data
records with a GPA of between 1.0 and 2.0) only on fields that have been designated as
indexed in the schema. So in the above example, the user could specify search criteria
referencing only the lastName and the GPA fields.

The user may, however, order their search results (i.e., matching records) by any field(s)
they like, regardless of whether the fields have been indexed.

What Do You Need to Do?

So, at a high level, what do you need to build?

You need to build a class called Database:
1. You need to be able to provide a schema to a database.
2. You need to be able to add one or more data records to a database, either
retrieving these rows of data from a website on the Internet, or letting the user add
them one at a time locally.
3. You need to be able to issue a query to the database and obtain a collection of
records that match the query's search criteria, ordered in a manner consistent with
the query's sorting criteria.

You need to build a class called MultiMap, representing a collection of key/value
associations; this class must be implemented using a binary search tree.
1. You can add a new item to a MultiMap
2. You can search for items in a MultiMap, getting a MultiMap::Iterator indicating a
matching item.

You need to build an iterator class called MultiMap::Iterator:
1. You can access the key/value association that is indicated by a
MultiMap::Iterator.
2. You may advance an iterator forward or backward through its MultiMap.
3. You may check if an iterator is valid.

What Will We Provide?

We'll provide a simple main.cpp file and a test.h file that brings your entire program
together. The test.h file includes a test framework that will help you test your classes as
you build them.

We'll provide an HTTP class that can be used to download a web page from a web server
on the Internet (e.g., from http://reddit.com). If you specify the URL for a page, it will
download the contents of the page and place them into a string. The section on the HTTP
class below has details. You will use this class to import record data (e.g., a list of student
data records) from a remote website over the Internet. You must NOT modify this class
in any way.

Finally, we'll provide you with a class called Tokenizer to help you break apart strings
and separate them (e.g., break "aaa,bbb,ccc" into "aaa", "bbb", and "ccc"). You may use
this class anywhere you like in your project, but you must NOT modify it.

The Tokenizer Class

We provide a Tokenizer class for you to use in your program to simplify the process of
tokenizing strings. Tokenizing is the process of breaking up a string that is divided by a
set of delimiters into a succession of smaller strings. A delimiter is typically a dividing
character like a space, period, comma or other punctuation mark.

The Tokenizer class can be used to chop up a provided string into parts, with each part
separated by delimiters that you specify.

Here is the class declaration:

class Tokenizer
{
public:
    Tokenizer(const std::string& text, std::string delimiters);
    bool getNextToken(std::string& token);
};

Here's how you might use the class:

void bar()
{
    std::string delimiters = " ,.?"; // space, comma, period, question
    std::string tokenizeMe = "This isn't a test, is it? Really!";

    Tokenizer t(tokenizeMe, delimiters);
    string w;
    while (t.getNextToken(w))
    {
        cout << "token: " << w << endl;
    }
}

This function would write:

token: This
token: isn't
token: a
token: test
token: is
token: it
token: Really!

You may use this class anywhere in your program where tokenization is required. This
class is defined in the Tokenizer.h file (which we provide for your use).
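
To make the tokenizing behavior concrete, here is a minimal stand-in with the same two-function interface, followed by a helper that splits a comma-separated data record into its fields. SketchTokenizer and splitRecord are hypothetical names, and this is not the provided Tokenizer implementation; it is only a guess at equivalent behavior based on the description and sample output above. In particular, this stand-in treats a run of adjacent delimiters as a single separator, so it never produces empty tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in with the same public interface as the provided Tokenizer.
// NOT the provided implementation; behavior is assumed from the spec's example.
class SketchTokenizer
{
public:
    SketchTokenizer(const std::string& text, std::string delimiters)
        : m_text(text), m_delims(delimiters), m_pos(0) {}

    // Stores the next token in 'token' and returns true, or returns false
    // when no tokens remain.
    bool getNextToken(std::string& token)
    {
        // Skip any run of delimiter characters.
        std::size_t start = m_text.find_first_not_of(m_delims, m_pos);
        if (start == std::string::npos)
        {
            m_pos = m_text.size();
            return false;
        }
        // The token extends up to the next delimiter (or the end of the text).
        std::size_t end = m_text.find_first_of(m_delims, start);
        if (end == std::string::npos)
            end = m_text.size();
        token = m_text.substr(start, end - start);
        m_pos = end;
        return true;
    }

private:
    std::string m_text;
    std::string m_delims;
    std::size_t m_pos;  // index where the next search begins
};

// Splits one comma-separated data record into its field values.
std::vector<std::string> splitRecord(const std::string& line)
{
    SketchTokenizer t(line, ",");
    std::vector<std::string> fields;
    std::string field;
    while (t.getNextToken(field))
        fields.push_back(field);
    return fields;
}
```

With this sketch, splitRecord("Carey,Nachenberg,102030405,310-825-4321,3.71") yields the five field values of the record in order. In your project you would use the provided Tokenizer class rather than a stand-in like this.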

The HTTP Class (aka But I don't know how to use C++ to access the Internet!)

Oh, we knew you were going to say that! Such a whiner! But wouldn't you like to learn
how to write a program that interacts with other computers over the Internet? We thought
so. So we're going to provide you with a reasonably functional Internet HTTP interface
that lets you download pages from the Internet. HTTP is the protocol used by web
browsers to download web pages from servers on the Internet into your browser.

When you use our interface, you don't have to worry about the details of how to
communicate over the Internet yourself. Of course, if you want to see how our interface
works, you're welcome to do so, and before you know it, you'll be forming your own
start-up Internet company to compete against Google [1]. Our HTTP interface's primary
public function (get) is as easy to use as this:

#include "http.h"

int main()
{
    string url = "http://en.wikipedia.org/wiki/Bald";
    string page; // to hold the contents of the web page

    // The next line downloads a web page for you. So easy!
    if (HTTP().get(url, page))
        cout << page; // prints the page's data out
    else
        cout << "Error fetching content from URL " << url << endl;
    ...
}

Note that you don't need to declare an HTTP variable. The call above looks as if it calls a
function named HTTP, then calls a get() member function on what it returns.

A challenge when testing a program that analyzes the contents of real web pages is that
you often have no control over those contents. Our HTTP interface lets you set up a
pseudo-web of pages with URLs and contents of your choosing:


[1] By agreeing to use our HTTP code for Project 4, this license entitles NachenSmall to a 20% cut of all
profits.

int main()
{
    HTTP().set("http://a.com", "This is a test page.");
    HTTP().set("http://b.com", "Here is another.");
    HTTP().set("http://c.com", "<html>Everyone loves CS 32</html>");
    string page;
    if (HTTP().get("http://b.com", page))
        cout << page << endl; // writes Here is another.
}

You call set() to associate a URL with a string. From that point on, calling get() with that
URL will retrieve that string. (Once you call set(), get() will no longer retrieve pages
from the real web; it will instead consult only the pages that you installed with set().)

Details: The Classes You Must Write

You must write correct versions of the following classes to obtain full credit on this
project. Your classes must work correctly with our other provided classes and you must
NOT modify our provided classes to make them work with your code.

MultiMap and MultiMap::Iterator

You must implement a class named MultiMap that represents the concept of a multimap
from string to unsigned int, and a class MultiMap::Iterator that is an iterator for
MultiMaps. A multimap is very much like a map, except that duplicate keys are allowed.

A key/value pair (also called an association) consists of two items: a key (which for
MultiMap will be a string), and a value (which for MultiMap will be an unsigned int).
A multimap is a collection of associations. Unlike a map, more than one association may
have the same key. In fact, there can be more than one association with the same key and
value. As an example, a multimap could contain these ten associations:

Joe → 1
Bill → 2
Carey → 5
Bill → 3
James → 4
Larry → 9
Andrea → 6
Larry → 7
Bill → 8
Bill → 2

Notice that the key "Andrea" is associated with the value 6, the key "Larry" is associated
with the values 7 and 9 (the order of values associated with the same key doesn't matter
to us), and the key "Bill" is associated with 8, 2, 2, and 3.

Consider a sequence consisting of associations in a multimap. We say that sequence is
ordered if the keys of each association in the sequence are in order according to the less
than relation (the < operator): If k1 → v1 and k2 → v2 are associations in the sequence,
then if k1 < k2, the association k1 → v1 appears earlier in the sequence than k2 → v2.
Notice that unlike the keys, the values in the associations play no role in the ordering.
This definition says nothing about the relative ordering of associations with the same key,
so they may appear in any order. As an example, here are some ordered sequences of the
associations from the example above, along with one that is not ordered:

Ordered:        Ordered:        Ordered:        Not ordered:

Andrea → 6      Andrea → 6      Andrea → 6      Bill → 2
Bill → 8        Bill → 2        Bill → 2        Andrea → 6
Bill → 2        Bill → 3        Bill → 2        Bill → 3
Bill → 2        Bill → 2        Bill → 3        Bill → 8
Bill → 3        Bill → 8        Bill → 8        Bill → 2
Carey → 5       Carey → 5       Carey → 5       Carey → 5
James → 4       James → 4       James → 4       Joe → 1
Joe → 1         Joe → 1         Joe → 1         James → 4
Larry → 7       Larry → 9       Larry → 7       Larry → 7
Larry → 9       Larry → 7       Larry → 9       Larry → 9

The last sequence above is not ordered because "Andrea" must precede "Bill", and
"James" must precede "Joe".

An iterator for a multimap is either invalid, or is valid and indicates one of the
associations in the multimap. Given a valid iterator, you can retrieve the key and the
value of that association. You can tell the iterator to advance to the next association or
back up to the previous association in that multimap, where next and previous are in
terms of an ordered sequence of the multimap's associations.
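
Because next and previous are defined in terms of an ordered sequence, a small checker for the ordering definition given earlier can be handy when testing what your iterators produce. The sketch below is one way to write it; the Association typedef and the function names are hypothetical, and std::vector is used here only for test code (it must not appear inside your MultiMap implementation).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for a key/value pair, used only by this test helper.
typedef std::pair<std::string, unsigned int> Association;

// A sequence is ordered if no association is immediately followed by one
// with a smaller key. Values are ignored, and equal keys may appear in any
// relative order.
bool isOrdered(const std::vector<Association>& seq)
{
    for (std::size_t i = 1; i < seq.size(); ++i)
        if (seq[i].first < seq[i - 1].first)
            return false;
    return true;
}

// Self-check using fragments of the example sequences above.
void checkOrderingExamples()
{
    std::vector<Association> good;
    good.push_back(Association("Andrea", 6));
    good.push_back(Association("Bill", 8));
    good.push_back(Association("Bill", 2));
    good.push_back(Association("Carey", 5));
    assert(isOrdered(good));

    std::vector<Association> bad;
    bad.push_back(Association("Bill", 2));
    bad.push_back(Association("Andrea", 6));  // "Andrea" must precede "Bill"
    assert(!isOrdered(bad));
}
```

Checking only adjacent pairs is enough: if every key is no smaller than the one before it, then any association with a smaller key necessarily appears earlier than any association with a larger key.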

Here is an example of using the MultiMap and MultiMap::Iterator types you will
implement. This code creates a MultiMap, inserts some associations into it, and writes all
the associations whose key is greater than or equal to "Bill", in order of the keys:

void foo()
{
    MultiMap myMultiMap;
    myMultiMap.insert("Andrea", 6);
    myMultiMap.insert("Bill", 2);
    myMultiMap.insert("Carey", 5);
    myMultiMap.insert("Bill", 8);
    myMultiMap.insert("Batia", 4);
    myMultiMap.insert("Larry", 7);
    myMultiMap.insert("Larry", 9);
    myMultiMap.insert("Bill", 3);

    // Start at the earliest-occurring association with key "Bill"
    MultiMap::Iterator it = myMultiMap.findEqual("Bill");
    while (it.valid())
    {
        cout << it.getKey() << " " << it.getValue() << endl;
        it.next(); // advance to the next association
    }
}

One possible output produced by this function is

Bill 3
Bill 8
Bill 2
Carey 5
Larry 9
Larry 7

The output must contain the three "Bill" lines, followed by the "Carey" line, followed by
the two "Larry" lines. The order of the "Bill" lines among themselves is allowed to be
different, and the "Larry" lines may be in the other order. The specification below of
MultiMap's findEqual member function clarifies why all three "Bill" lines must appear.

The MultiMap Class

Your MultiMap class must have the following public interface. You must NOT change
or add to the public interface, with two exceptions: (1) if you wish, you may add
MultiMap::Iterator constructor(s) with whatever parameters you like, and (2) if the
compiler-generated destructor, copy constructor, and assignment operator for
MultiMap::Iterator don't behave correctly, you may declare and implement them.

class MultiMap
{
public:
    // You must implement this public nested MultiMap::Iterator class
    class Iterator
    {
    public:
        Iterator(); // You must have a default constructor
        Iterator(/* you may have any parameters you like here */);
        bool valid() const;
        std::string getKey() const;
        unsigned int getValue() const;
        bool next();
        bool prev();
    };

    MultiMap();
    ~MultiMap();
    void clear();
    void insert(std::string key, unsigned int value);
    Iterator findEqual(std::string key) const;
    Iterator findEqualOrSuccessor(std::string key) const;
    Iterator findEqualOrPredecessor(std::string key) const;

private:
    // To prevent MultiMaps from being copied or assigned, declare these members
    // private and do not implement them.
    MultiMap(const MultiMap& other);
    MultiMap& operator=(const MultiMap& rhs);
};

Here are the general requirements for your MultiMap class:

1. Your MultiMap class must use the public interface documented above. You may
add only private members to the MultiMap class; you must not add other public
members to MultiMap. Doing so will result in a score of ZERO for this part of
the project.
2. The keys of the associations are case-sensitive, so "D'Oyly Carte" and
"d'oyly carte" are different keys. This makes your implementation task easier.
3. Your MultiMap class does not need to implement any member function for
removing an individual association. (You will, though, have to remove all
associations in clear() and the destructor.)
4. You must not use any STL containers to implement MultiMap (e.g., no map, set,
multimap, unordered_map, vector, list, etc.).
5. As detailed later, you must implement MultiMap using a binary search tree that
you build yourself (defining your own private node type, maintaining a root pointer,
etc.). The tree does not need to use a balancing algorithm (unless you're
masochistic and want to implement one).

For the descriptions below, we define a valid Iterator as one for which calling valid() on
it returns true; an invalid Iterator is one for which it returns false.

Requirements for MultiMap()

The default constructor must create a MultiMap containing no associations. This
constructor must run in O(1) time.

Requirements for ~MultiMap()

The destructor must release all resources held by the MultiMap. For a MultiMap
containing N associations, the destructor must run in O(N) time.

Requirements for void clear()

The clear method must remove all associations from the MultiMap, resulting in it
containing no associations. (You might later insert some new associations.) For a
MultiMap containing N associations, clear must run in O(N) time.

Requirements for void insert(std::string key, unsigned int value)

The insert method must add to the MultiMap a new association with the indicated key
and value. Because this is a multimap, it is allowable for this operation to result in
more than one association having the same key, or even the same key and value. For
a MultiMap containing N associations, insert must run in average case O(log N) time,
worst case O(N) time. (Because you are not required to keep the binary search tree
you use to implement MultiMap balanced, some insertion orders may result in a
terribly unbalanced tree.)
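
As a rough illustration of insertion into an unbalanced binary search tree that allows duplicate keys, consider the sketch below. The Node type, its parent pointer, the function names, and the policy of sending equal keys to the right subtree are all assumptions of this sketch, not requirements of the spec; your own private node type may look quite different.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type; in your project this would be a private detail
// of MultiMap.
struct Node
{
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
    Node* parent;  // assumption: kept so an iterator could walk back up the tree

    Node(const std::string& k, unsigned int v, Node* p)
        : key(k), value(v), left(nullptr), right(nullptr), parent(p) {}
};

// Inserts a new association and returns the root of the tree.
// Keys smaller than the current node go left; equal or greater keys go right,
// so duplicates are always accepted. Each step descends one level, which gives
// O(log N) on average and O(N) in the worst case (a degenerate tree).
Node* insertNode(Node* root, const std::string& key, unsigned int value)
{
    if (root == nullptr)
        return new Node(key, value, nullptr);

    Node* cur = root;
    for (;;)
    {
        if (key < cur->key)
        {
            if (cur->left == nullptr)
            {
                cur->left = new Node(key, value, cur);
                break;
            }
            cur = cur->left;
        }
        else
        {
            if (cur->right == nullptr)
            {
                cur->right = new Node(key, value, cur);
                break;
            }
            cur = cur->right;
        }
    }
    return root;
}

// Frees every node once (post-order), which is the O(N) work that clear()
// and the destructor must perform.
void freeTree(Node* root)
{
    if (root == nullptr)
        return;
    freeTree(root->left);
    freeTree(root->right);
    delete root;
}
```

Inserting keys in already-sorted order with this sketch produces a tree that is effectively a linked list, which is exactly the worst case the requirement above allows for.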

Requirements for Iterator findEqual(std::string key) const

If no association in the MultiMap has a key equal to the key parameter, the findEqual
method must return an invalid Iterator; otherwise, it must return a valid Iterator.

If at least one association in the MultiMap has a key equal to the key parameter, the
findEqual method must return a valid Iterator indicating the earliest such association.
By "earliest", we mean that in an ordered sequence of all the associations in the
MultiMap, the association indicated by the returned Iterator comes before any others
with the same key or a greater key. This "earliest" requirement is why the example
code above prints all three "Bill" lines, not two or one of them.

For a MultiMap containing N associations, findEqual must run in average case O(log
N) time, worst case O(N) time. (Because you are not required to keep the binary
search tree you use to implement MultiMap balanced, a terribly unbalanced tree may
result in the worst case search time.)

Requirements for Iterator findEqualOrSuccessor(std::string key) const

If no association in the MultiMap has a key greater than or equal to the key
parameter, the findEqualOrSuccessor method must return an invalid Iterator;
otherwise, it must return a valid Iterator.

If at least one association in the MultiMap has a key greater than or equal to the key
parameter, the findEqualOrSuccessor method must return a valid Iterator indicating
the earliest such association. By "earliest", we mean that in an ordered sequence of
all the associations in the MultiMap, the association indicated by the returned Iterator
comes before any others with the same key or a greater key.

For a MultiMap containing N associations, findEqualOrSuccessor must run in
average case O(log N) time, worst case O(N) time. (Because you are not required to
keep the binary search tree you use to implement MultiMap balanced, a terribly
unbalanced tree may result in the worst case search time.)

Requirements for Iterator findEqualOrPredecessor(std::string key) const

If no association in the MultiMap has a key less than or equal to the key parameter,
the findEqualOrPredecessor method must return an invalid Iterator; otherwise, it
must return a valid Iterator.

If at least one association in the MultiMap has a key less than or equal to the key
parameter, the findEqualOrPredecessor method must return a valid Iterator
indicating the latest such association. By "latest", we mean that in an ordered
sequence of all the associations in the MultiMap, the association indicated by the
returned Iterator comes after any others with the same key or a lesser key.

For a MultiMap containing N associations, findEqualOrPredecessor must run in
average case O(log N) time, worst case O(N) time. (Because you are not required to
keep the binary search tree you use to implement MultiMap balanced, a terribly
unbalanced tree may result in the worst case search time.)
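
The three search functions above share one idea: walk down from the root, remembering the best candidate seen so far. The sketch below shows that idea on raw nodes. The Node type and function names are hypothetical, and the functions return node pointers rather than Iterators (a real implementation would wrap the result in a MultiMap::Iterator, using an invalid Iterator where this sketch returns nullptr).

```cpp
#include <cassert>
#include <string>

// Hypothetical node type for this sketch.
struct Node
{
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
};

// Earliest node, in ordered sequence, whose key is >= target; nullptr if none.
const Node* findEqualOrSuccessorNode(const Node* root, const std::string& target)
{
    const Node* best = nullptr;
    const Node* cur = root;
    while (cur != nullptr)
    {
        if (cur->key < target)
            cur = cur->right;   // too small: the answer can only be to the right
        else
        {
            best = cur;         // candidate; an earlier one may exist on the left
            cur = cur->left;
        }
    }
    return best;
}

// Latest node, in ordered sequence, whose key is <= target; nullptr if none.
const Node* findEqualOrPredecessorNode(const Node* root, const std::string& target)
{
    const Node* best = nullptr;
    const Node* cur = root;
    while (cur != nullptr)
    {
        if (target < cur->key)
            cur = cur->left;    // too large: the answer can only be to the left
        else
        {
            best = cur;         // candidate; a later one may exist on the right
            cur = cur->right;
        }
    }
    return best;
}

// Earliest node whose key equals target; nullptr if no such key exists.
const Node* findEqualNode(const Node* root, const std::string& target)
{
    const Node* n = findEqualOrSuccessorNode(root, target);
    return (n != nullptr && n->key == target) ? n : nullptr;
}
```

Each loop iteration moves one level down the tree, so the running time is proportional to the tree's height: O(log N) on average and O(N) for a badly unbalanced tree. The descent relies only on an in-order walk of the tree being an ordered sequence, so it should work whether your insert places duplicate keys to the left or to the right.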

The MultiMap::Iterator Class

Here are the general requirements for your nested MultiMap::Iterator class:

1. Your Iterator class must use the public interface documented above. You may
add only private members to the Iterator class; you must not add other public
members to Iterator, with two exceptions: (1) if you wish, you may add Iterator
constructor(s) with whatever parameters you like, and (2) if the compiler-
generated destructor, copy constructor, and assignment operator for Iterator don't
behave correctly, you may declare and implement them. Adding any other public
members will result in a score of ZERO for this part of the project.
2. If an Iterator is created as a result of an operation on a MultiMap, then after
insert(), clear(), or the destructor is called on that MultiMap, the behavior of
further operations on that Iterator, except assigning to it or destroying it, is not
defined by this spec. Roughly speaking, if a MultiMap's contents change, you
can't assume any Iterators currently being used with it are still reliable to use.
Notice that since this spec leaves the behavior undefined in this case, your
implementation may do whatever it likes in this case, even crashing. Typically,
you don't write any special code to detect such a situation (which is often
impossible or expensive to do), so you just allow your normal code to do what it
does, letting the chips fall where they may.

For the descriptions below, we talk about an Iterator being in a valid or an invalid state.
An Iterator is in a valid state if it indicates an association in a MultiMap; otherwise, it is in
an invalid state. (As an example, an Iterator in a valid state that indicates the last
association in an ordered sequence of a MultiMap's associations goes into an invalid state
when you call next() on it, since there is no association after the last one.)

Requirements for MultiMap::Iterator::Iterator()

The default constructor must create an Iterator in an invalid state. This constructor
must run in O(1) time.

Requirements for other MultiMap::Iterator constructors

You may write other Iterator constructors with whatever parameters you like. It is
your choice whether the Iterator created by any such constructor is in a valid or
invalid state. Any such constructor must run in O(1) time.

Requirements for the MultiMap::Iterator destructor, copy constructor and
assignment operator

The Iterator class must have a public destructor, copy constructor and assignment
operator, either declared and implemented by you or left unmentioned so that the
compiler will generate them for you. If you design your class well, the compiler-
generated versions of these operations will do the right thing. Each of these
operations must run in O(1) time.

Requirements for bool MultiMap::Iterator::valid() const

The valid method must return true if the Iterator is in a valid state, and false
otherwise. The valid method must run in O(1) time.

Requirements for std::string MultiMap::Iterator::getKey() const

If the Iterator is in a valid state, the getKey method must return the key from the
association indicated by the iterator. This method must run in O(1) time. Notice that
this spec does not define the behavior of getKey if the Iterator is in an invalid state, so
your implementation may do whatever it likes in that case, even crashing.

Requirements for unsigned int MultiMap::Iterator::getValue() const

If the Iterator is in a valid state, the getValue method must return the value from the
association indicated by the iterator. This method must run in O(1) time. Notice that
this spec does not define the behavior of getValue if the Iterator is in an invalid state,
so your implementation may do whatever it likes in that case, even crashing.

Requirements for bool MultiMap::Iterator::next()

If the Iterator is in an invalid state, the next method does nothing and returns false.
Otherwise, the Iterator is in a valid state, so it indicates an association in a MultiMap.
Consider an ordered sequence of the associations contained by that MultiMap. If the
association indicated by the Iterator is the last one in that sequence, then the next
method puts the Iterator into an invalid state and returns false; otherwise, next makes
the Iterator indicate the association in the sequence that comes immediately after the
one it currently indicates, and returns true.

For a MultiMap containing N associations, next must run in average case O(log N)
time, worst case O(N) time. (Because you are not required to keep the binary search
tree you use to implement MultiMap balanced, a terribly unbalanced tree may result
in the worst case time. On the other hand, whether a tree is balanced or not, it can be
proved that for most of the nodes, next can run in O(1) time!)
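
One common way to meet the next() requirement is to give each tree node a parent pointer and compute the in-order successor directly. The sketch below shows that step on raw nodes; the Node type and the successor function are hypothetical names for illustration, and a real Iterator would store such a node pointer privately (with a null pointer standing for the invalid state).

```cpp
#include <cassert>
#include <string>

// Hypothetical node type with a parent pointer.
struct Node
{
    std::string key;
    unsigned int value;
    Node* left;
    Node* right;
    Node* parent;
};

// Returns the node that immediately follows n in an in-order walk of the
// tree, or nullptr if n is the last node.
const Node* successor(const Node* n)
{
    assert(n != nullptr);  // precondition: n indicates an actual node

    if (n->right != nullptr)
    {
        // The successor is the leftmost node of the right subtree.
        n = n->right;
        while (n->left != nullptr)
            n = n->left;
        return n;
    }

    // Otherwise climb until we arrive at a parent from its left child;
    // that parent is the successor. Reaching the root from the right
    // means n was the last node.
    const Node* p = n->parent;
    while (p != nullptr && n == p->right)
    {
        n = p;
        p = p->parent;
    }
    return p;
}
```

A single call may walk the height of the tree, but over a full traversal each link between a node and its child is followed at most once downward and once upward, which is why visiting all N associations takes O(N) time in total. The step for prev() is the mirror image, with left and right swapped.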

Requirements for bool MultiMap::Iterator::prev()

If the Iterator is in an invalid state, the prev method does nothing and returns false.
Otherwise, the Iterator is in a valid state, so it indicates an association in a MultiMap.
Consider an ordered sequence of the associations contained by that MultiMap. If the
association indicated by the Iterator is the first one in that sequence, then the prev
method puts the Iterator into an invalid state and returns false; otherwise, prev makes
the Iterator indicate the association in the sequence that comes immediately before
the one it currently indicates, and returns true.

For a MultiMap containing N associations, prev must run in average case O(log N)
time, worst case O(N) time. (Because you are not required to keep the binary search
tree you use to implement MultiMap balanced, a terribly unbalanced tree may result
in the worst case time. On the other hand, whether a tree is balanced or not, it can be
proved that for most of the nodes, prev can run in O(1) time!)

There is a further behavioral and performance requirement. Suppose that it is a valid
Iterator indicating an association in a MultiMap containing N associations. Consider the
following code:

assert(it.valid());
MultiMap::Iterator p;
for ( ; it.valid(); it.prev())
    p = it;
for ( ; p.valid(); p.next())
    cout << p.getKey() << " " << p.getValue() << endl;

This code must write the key and value of every association in the MultiMap, in the same
order as an ordered sequence of those associations. This code must run in O(N) time.
Notice that while a particular call to the prev or next method is allowed to run in O(log N)
or even O(N) time, enough other calls must run in O(1) time so that visiting every
association from first to last runs in O(N) time. If your code for next and prev behaves
correctly, this performance result will probably happen naturally without your doing
anything special.

MultiMap Implementation Notes

If it weren't for one of the performance requirements, you could implement a MultiMap
using a dynamically allocated array (or a vector if we allowed you to use any STL
container) in which you kept all the associations as an ordered sequence. Your Iterator
would contain simply a pointer into that array, its next method would use the ++ operator
on that pointer, etc. MultiMap's findEqual method would do a binary search through the
array. But alas, the insert method's requirement of average case O(log N) time prevents
this; inserting an association while preserving the sorted order requires moving O(N)
associations to make room.
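
To make that cost concrete, here is a minimal sketch of the array idea using a sorted std::vector. It is only an illustration of why the approach fails the requirement, not something you may submit; the names Assoc and insertSorted are made up for this example. Finding the slot takes O(log N), but the vector then has to shift everything after it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one association: (key, value).
typedef std::pair<std::string, unsigned int> Assoc;

// Insert into a vector that is kept sorted by key. Locating the slot is a
// binary search, O(log N), but vector::insert must then move every later
// element, which is O(N). Returns how many elements had to be moved.
std::size_t insertSorted(std::vector<Assoc>& v, const std::string& key, unsigned int value)
{
    std::vector<Assoc>::iterator pos =
        std::lower_bound(v.begin(), v.end(), key,
                         [](const Assoc& a, const std::string& k) { return a.first < k; });
    std::size_t moved = static_cast<std::size_t>(v.end() - pos);
    v.insert(pos, Assoc(key, value));
    return moved;
}
```

Inserting a key smaller than everything already stored moves all N existing elements, which is exactly the case the average-case O(log N) requirement rules out.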

The simplest implementation that meets the performance requirements uses a binary
search tree, and this spec indeed requires your MultiMap class to be implemented using
a binary search tree. Each node in the tree should contain a key, and the tree should be
organized based on comparing keys. If the class we're implementing were a Map, not a
MultiMap, so that each key is unique, then it would be obvious to have each node also
contain the (only) value corresponding to the key.

But you are to implement a MultiMap, which allows multiple associations that have the
same key. There are several ways to implement a binary search tree that allows duplicate
keys. One way is to have each node contain the key and value of a single association; if
more than one association has the same key, the tree will contain more than one node
with the same key. Let's call this the single-value-per-node approach, and see what
happens when we execute this code:

MultiMap mm;
mm.insert("joe", 5);
mm.insert("joe", 2);
mm.insert("bill", 1);
mm.insert("zoey", 4);
mm.insert("joe", 7);

The empty MultiMap would be represented by the empty tree, and inserting the "joe" → 5
association would result in a tree that has the key "joe" and the value 5 in its one node.
Let's assume we're not going to implement any tree-balancing algorithm (since the spec
doesn't require us to, and we don't want to spend forever on this project!). Then this first-
inserted node will always be the root of the tree.

Now what happens when we insert the "joe" → 2 association? By the definition of a binary
search tree, the node containing that association can be inserted in either the left or the
right subtree of a node with the same key. A simple approach would be to always insert
the node in, say, the left subtree of a node with an equal key. If we take this approach,
the tree that results from the five insert operations above is



An Iterator would presumably contain a pointer to the node representing the association
that the Iterator indicates. The natural implementation of Iterator's next method would
make the Iterator's pointer point to the next node in an inorder traversal of the tree, and
MultiMap's findEqual method would return an Iterator pointing to the node with the
desired key that comes earliest in an inorder traversal of the tree. If we're not balancing
the tree, that would be the deepest node with that key in the tree.
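
As a rough sketch of the single-value-per-node idea (assuming no balancing and the "equal keys go left" rule described above), insertion and the search for the earliest duplicate might look like the following. The names Node, insert, findEarliest and destroy are illustrative only; they are not part of the required MultiMap interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical node layout: one association per node.
struct Node
{
    std::string  key;
    unsigned int value;
    Node*        left;
    Node*        right;
    Node(const std::string& k, unsigned int v)
        : key(k), value(v), left(nullptr), right(nullptr) {}
};

// Insert without balancing. A key equal to a node's key is sent into that
// node's left subtree, so later duplicates end up deeper in the tree.
void insert(Node*& root, const std::string& key, unsigned int value)
{
    Node** link = &root;
    while (*link != nullptr)
        link = (key <= (*link)->key) ? &(*link)->left : &(*link)->right;
    *link = new Node(key, value);
}

// Return the node with the given key that comes earliest in an inorder
// traversal, or nullptr if no node has that key.
const Node* findEarliest(const Node* root, const std::string& key)
{
    const Node* best = nullptr;
    while (root != nullptr)
    {
        if (key == root->key)
        {
            best = root;          // remember it, then look for an earlier duplicate
            root = root->left;
        }
        else if (key < root->key)
            root = root->left;
        else
            root = root->right;
    }
    return best;
}

// Free every node in the tree.
void destroy(Node* n)
{
    if (n == nullptr)
        return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

With the five insertions shown earlier, findEarliest for "joe" ends at the node holding 7, the most recently inserted and deepest "joe" node.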

Another way to implement a MultiMap using a binary search tree uses what we'll call the
multiple-value-per-node approach. Each node in the tree contains a unique key and a
pointer to a linked list of list nodes containing the value parts of all the associations with
that key. Using this approach, the tree resulting from this code

MultiMap mm;
mm.insert("joe", 7);
mm.insert("joe", 2);
mm.insert("bill", 1);
mm.insert("zoey", 2);
mm.insert("joe", 5);

might look like this:



With this approach, the representation of an Iterator is a little more complicated than a
simple pointer to a tree node, because we have to be able to retrieve through the Iterator
both the key and the value of the association it indicates. If we repeatedly call the next
method on an Iterator, we have to be able to visit all the values associated with a given
key, and then proceed to visit the values associated with the next key in an inorder
traversal of the tree.
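
Here is one way the multiple-value-per-node layout could be sketched. Everything in it (TreeNode, ValueNode, Cursor, and the decision to push new values onto the front of a key's list) is an assumption made for illustration; the spec does not dictate any of these names or choices.

```cpp
#include <cassert>
#include <string>

struct ValueNode              // one value in a key's linked list
{
    unsigned int value;
    ValueNode*   next;
};

struct TreeNode               // one unique key plus the list of its values
{
    std::string key;
    ValueNode*  values;
    TreeNode*   left;
    TreeNode*   right;
};

// Find or create the tree node for key, then push value onto the front of
// that node's list (front insertion is this sketch's choice).
void insert(TreeNode*& root, const std::string& key, unsigned int value)
{
    TreeNode** link = &root;
    while (*link != nullptr && (*link)->key != key)
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    if (*link == nullptr)
        *link = new TreeNode{key, nullptr, nullptr, nullptr};
    (*link)->values = new ValueNode{value, (*link)->values};
}

// What an Iterator might need to remember: which tree node it is on, and
// which value within that node's list.
struct Cursor
{
    const TreeNode*  tree;
    const ValueNode* item;
    bool valid() const { return tree != nullptr && item != nullptr; }
    const std::string& key() const { assert(valid()); return tree->key; }
    unsigned int value() const { assert(valid()); return item->value; }
};

// Free every tree node and every list node hanging off it.
void destroy(TreeNode* t)
{
    if (t == nullptr)
        return;
    destroy(t->left);
    destroy(t->right);
    while (t->values != nullptr)
    {
        ValueNode* dead = t->values;
        t->values = dead->next;
        delete dead;
    }
    delete t;
}
```

Advancing such a cursor means first following item->next, and only when that list is exhausted moving to the next tree node in inorder sequence.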

You may use any binary search tree-based data structure you like to implement your tree.

If you sketch out the algorithms for both of these approaches, you'll find that the multiple-
value-per-node approach is simpler to implement and can be considerably faster than the
single-value-per-node approach. You may choose either approach, though, or another of
your own design.

Whichever approach you take, your Iterator's next method needs to know how to advance
a pointer to a tree node to point to the tree node that would be next in an inorder traversal
of the tree. The performance requirements imply that you must not implement next so
that each time it's called, it starts going through a full inorder traversal of the tree until it
finds the proper node. Here are some hints about one way of implementing it instead:

1. Consider having your tree nodes contain a parent pointer.
2. Recognize the two cases you need to deal with:
a. The current tree node has a right child; in this case, the next node in an
inorder traversal is somewhere in the current node's right subtree.
b. The current tree node does not have a right child; in this case, the next node
in an inorder traversal, if there is one, is an ancestor of the current node.

Reasoning about the prev method is symmetrical. We'll let you figure the rest out on
your own. Try drawing some sample trees and see if you can figure out the pattern for
locating the next node in the tree for an inorder traversal. Don't make your trees too
simple or you may fool yourself into thinking the problem is easier than it is.
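
Once you have worked through your own drawings, you can compare them against the following sketch of the standard parent-pointer technique for the two cases in the hints. It assumes a bare Node with an int key and a parent pointer; both the layout and the name successor are hypothetical, and your MultiMap nodes will look different.

```cpp
#include <cassert>

// Hypothetical node: just enough to demonstrate the traversal step.
struct Node
{
    int   key;
    Node* left;
    Node* right;
    Node* parent;
};

// Return the node that follows n in an inorder traversal, or nullptr if n
// is the last node.
Node* successor(Node* n)
{
    assert(n != nullptr);
    if (n->right != nullptr)
    {
        // Case a: the next node is the leftmost node of the right subtree.
        n = n->right;
        while (n->left != nullptr)
            n = n->left;
        return n;
    }
    // Case b: climb until we reach a parent from its left child; that
    // parent is the next node. Running off the root means n was last.
    Node* p = n->parent;
    while (p != nullptr && n == p->right)
    {
        n = p;
        p = p->parent;
    }
    return p;
}
```

A single call can walk a whole root-to-leaf path, but across a full traversal each tree edge is followed at most twice, which is where the O(N) total comes from.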

There are other ways of enabling iteration with next and prev. As long as the
implementation you come up with satisfies the spec, you're free to use it.

Database

The Database class is responsible for implementing a simple database, and must leverage
your MultiMap and MultiMap::Iterator classes to do so. Here's what you can do with a
Database:

1. You can create a new database.
2. You can specify a schema for a database. This specifies what fields (e.g., first
name, last name, phone number, GPA) are present in each record. You may also
specify which fields in the schema should be indexed, and therefore may be
searched by the user in a query. For example:

FirstName, indexed
LastName, indexed
PhoneNumber, not indexed
Occupation, not indexed
Age, indexed

Note: Specifying a new schema for a database will remove all currently existing
records and indexes from the database.
3. You can add one or more records to the database (e.g. a record with fields joe,
smith, 818-555-1212, Engineer, 024).
4. You can import a web page containing a bunch of records from a specified URL
into the database, e.g., from http://some.website.com/data.txt. The web page must
have the data stored in a comma-separated format. Here we show data for our
schema of first name, last name, phone number, occupation, age:

Joe,Smith,818-555-1212,forklift operator,019
Bill,Nachenberg,310-456-7890,college professor,056
Sally,Smallberg,800-123-4567,organ harvester,025
Yen,Chen,310-877-3353,advertising intern,028
Barry,Smith,442-324-2342,unemployed,060
Sally,Feng,543-234-2342,accountant,024
Daniel,Chen,310-345-3234,power-programmer,029
Dan,Nieh,510-656-4643,philosopher,023

5. You can select records in the database based on various criteria and have the
results ordered in a number of different ways (or not at all). For example,
"Find all people whose last name is between Nac and Smart, inclusive, who are
between 20 and 23 years old. Order the resulting records by last name in
ascending order, and if there are two or more records with the same last name,
order those records by the first name in ascending order." or "Find all people
whose last name is greater than or equal to Feng and whose first name is between
Carl and Eunice. Order the resulting records by first name in ascending order,
and if there are two or more records with the same first name, order those records
by the age in descending order."
6. You can remove all records from the database and remove the current schema.
After doing this, you may specify a new schema and add one or more new records
that adhere to that schema.

The Data Structures Used in a Database

Ultimately, a simple database has three primary data structures:

1. A schema description
2. A bunch of rows of data (also known as data records)
3. One or more field indexes



The Schema Description

The schema (shown in the diagram above as m_schema) describes what each record in
the database must look like. In the diagram above, we can see that the schema indicates
that each record has three fields: a user name, a phone number, and the person's age. The
user can specify any number of fields they like in a given schema.

Notice that in addition to specifying the name of each field, the schema also indicates
whether each field should be indexed. What does it mean that a field should
be indexed? If a field is indexed, this means that the database must use a data structure
(such as a multimap) that lets the user efficiently search through the values associated
with that field in the database. For example, since the phonenum field is designated as
it_indexed in the example above, the user must be able to efficiently (e.g., in log N
time) search through the phone number values (e.g., 818-555-1212, 310-234-2342, 310-
234-2342, 310-234-2342, 424-676-0202, etc.) of the records in the database to find all
rows in the database with a specific phone number, or with a specified range of phone
numbers.

The Bunch of Rows of Data

Each database holds zero or more rows of data (shown in the diagram above as m_rows).
A single row of data holds a collection of values (e.g., a username, phone number and
age) that matches the schema. (In our simple database, all values are C++ strings; in a
real database, the values could be integers, doubles, strings, timestamps, etc.) As you can
see in the diagram above, our database has five rows (numbered 0 through 4). In a real
database, there might be billions of rows and they'd be stored on dozens of hard drives.
For your database for this project, these rows may be stored in an STL vector.

One or more field indexes

Every field in your schema that has been designated as an indexed field must have a
dedicated index inside your database. For the purposes of this project, an index is
basically a binary search tree-based MultiMap that maps each field value (e.g., 310-234-
2342) to the row or rows {2,3,4} where that field value may be found. Your database
must have at least one index, and might have many indexes (shown as the m_fieldIndex
vector/array in the diagram above), depending on how many fields the schema specifies
must be indexed.

For example, since the username field (the first, or 0th, field in our schema) was designated
as indexed in our schema, m_fieldIndex[0] contains a mapping between every
username value (e.g., climberkip, davidsmall, ednatodd, missessmall, smallkid) and a row
number in the database of a record whose field equals that value. For example,
m_fieldIndex[0] contains a mapping of "ednatodd" → 4, because a record with a username
field of "ednatodd" may be found in row 4 of m_rows.

Similarly, in the example above, our phonenum field (the second field in the schema) was
also designated as indexed. As such, notice that m_fieldIndex[1] contains a mapping
between every phone number value (e.g., 818-555-1212, 310-234-2342, 310-234-2342,
310-234-2342, 424-676-0202) and a row number where that particular value may be
found in m_rows. Notice that a given value like 310-234-2342 may be found in multiple
rows in your database (that makes sense: multiple people could have the same phone
number), so your index needs to allow for this. This is why we use a multimap and not a
regular map to implement each index.

Why have an index? Well, say you want to quickly find all of the people who have a
particular phone number, or find all people between 20 and 22 years old. If you have N
records and no index data structure, you'd have to use an O(N) linear search algorithm,
going through every row of data looking for fields that matched what you were looking
for. But with an index, you can quickly (in O(log N) time) locate all matching rows,
speeding up your search dramatically. This is exactly what real databases do!
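
The difference can be sketched in a few lines. The example below uses a sorted vector of (value, row number) pairs as a stand-in for an index, purely so that it is self-contained. Your Database must use your own binary search tree-based MultiMap instead, and the names Row, IndexEntry, linearFind, buildIndex and indexedFind are all invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string>    Row;
typedef std::pair<std::string, int> IndexEntry;   // (field value, row number)

// No index: examine the given field of every row, O(N).
std::vector<int> linearFind(const std::vector<Row>& rows, std::size_t field,
                            const std::string& wanted)
{
    std::vector<int> hits;
    for (std::size_t r = 0; r < rows.size(); r++)
        if (rows[r][field] == wanted)
            hits.push_back(static_cast<int>(r));
    return hits;
}

// Build a stand-in index: (value, row) pairs sorted by value, then by row.
std::vector<IndexEntry> buildIndex(const std::vector<Row>& rows, std::size_t field)
{
    std::vector<IndexEntry> index;
    for (std::size_t r = 0; r < rows.size(); r++)
        index.push_back(IndexEntry(rows[r][field], static_cast<int>(r)));
    std::sort(index.begin(), index.end());
    return index;
}

// With an index: binary search for the first entry holding the wanted
// value, then walk forward over the duplicates. O(log N + matches).
std::vector<int> indexedFind(const std::vector<IndexEntry>& index,
                             const std::string& wanted)
{
    std::vector<int> hits;
    std::vector<IndexEntry>::const_iterator it =
        std::lower_bound(index.begin(), index.end(), IndexEntry(wanted, 0));
    for ( ; it != index.end() && it->first == wanted; ++it)
        hits.push_back(it->second);
    return hits;
}
```

Both functions return the same row numbers; only the amount of work differs.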

Ok, so now we know what data structures make up a Database. Let's discuss its public
interface.

Your Database class must have the following public interface. You must NOT change
or add to the public interface, except that if the compiler-generated default constructor
and/or destructor behave correctly, you do not have to declare or implement them.

class Database
{
public:
    enum IndexType { it_none, it_indexed };
    enum OrderingType { ot_ascending, ot_descending };

    struct FieldDescriptor
    {
        std::string name;
        IndexType index;
    };

    struct SearchCriterion
    {
        std::string fieldName;
        std::string minValue;
        std::string maxValue;
    };

    struct SortCriterion
    {
        std::string fieldName;
        OrderingType ordering;
    };

    static const int ERROR_RESULT = -1;

    Database();
    ~Database();
    bool specifySchema(const std::vector<FieldDescriptor>& schema);
    bool addRow(const std::vector<std::string>& rowOfData);
    bool loadFromURL(std::string url);
    bool loadFromFile(std::string filename);
    int getNumRows() const;
    bool getRow(int rowNum, std::vector<std::string>& row) const;
    int search(const std::vector<SearchCriterion>& searchCriteria,
               const std::vector<SortCriterion>& sortCriteria,
               std::vector<int>& results);

private:
    // To prevent Databases from being copied or assigned, declare these members
    // private and do not implement them.
    Database(const Database& other);
    Database& operator=(const Database& rhs);
};

Here are the general requirements for your Database class:

1. Your Database class must use the public interface documented above. You may
add only private members to the Database class; you must not add other public
members to Database. Doing so will result in a score of ZERO for this part of the
project.
2. Strings in the database are case-sensitive, so "D'Oyly Carte" and "d'oyly carte"
are different strings. This makes your implementation task easier.
3. You must not use the STL map, multimap, or unordered_map containers, or the
nonstandard hash_map or hash_multimap containers, to implement Database.
You may use any other STL containers (e.g., vector, list, set, etc.).
4. For the purpose of indexing data that a schema requires to be indexed, your
Database implementation must use a MultiMap.
5. Your Database class must, at a minimum, contain the following data structures:
a. m_rows: A vector of data records (e.g., a vector of vector of strings)
b. m_fieldIndex: A vector of MultiMaps or pointers to MultiMaps

Requirements for Database()

The default constructor must create a Database containing no rows and no field
descriptions in its schema. This constructor must run in O(1) time.

Requirements for ~Database()

The destructor must release all resources held by the Database. For a Database
containing F fields in its schema and N rows, the destructor must run in O(FN) time.

Requirements for bool specifySchema(const std::vector<FieldDescriptor>& schema)

The specifySchema method is used to specify a new schema for the database. The
schema describes what fields will be in every data record and which fields must be
indexed by your database. Every time the user calls the specifySchema method, it
must first completely reset your database, discarding any existing field descriptions in
its schema, any existing rows, and any indexes. It then must install a new schema.

The details of the new schema are in the vector of Database::FieldDescriptors passed
to the specifySchema method. Each FieldDescriptor structure holds two values: the
name of a field (e.g., username or phonenum), and a value that specifies whether
this field should be indexed or not (Database::it_indexed or Database::it_none).

Here's how the specifySchema method might be called:

bool setSchema(Database& db)
{
    Database::FieldDescriptor fd1, fd2, fd3;

    fd1.name = "username";
    fd1.index = Database::it_indexed;  // username is an indexed field

    fd2.name = "phonenum";
    fd2.index = Database::it_indexed;  // phone # is an indexed field

    fd3.name = "age";
    fd3.index = Database::it_none;     // age is NOT an indexed field

    std::vector<Database::FieldDescriptor> schema;
    schema.push_back(fd1);
    schema.push_back(fd2);
    schema.push_back(fd3);

    return db.specifySchema(schema);
}

The specifySchema method must ensure that the Database object maintains a separate
index (implemented using a MultiMap) for every indexed field in the schema. In the
above example, two of the fields, username and phonenum, were designated as
indexed, so specifySchema must ensure that m_fieldIndex has two initialized
MultiMaps to index all values stored in these two fields across all the rows.

The specifySchema method must return false if there is not at least one indexed field
in the schema; if it returns false, it must leave the schema with no field descriptions in
it. If there is at least one indexed field, specifySchema returns true.
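
The reset-then-validate behavior can be sketched as follows. SchemaHolder is a made-up stand-in that keeps only the schema part of a Database; a real implementation would also clear m_rows and rebuild m_fieldIndex at the points noted in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum IndexType { it_none, it_indexed };

struct FieldDescriptor
{
    std::string name;
    IndexType   index;
};

// Hypothetical stand-in holding just the schema portion of a Database.
struct SchemaHolder
{
    std::vector<FieldDescriptor> m_schema;

    bool specifySchema(const std::vector<FieldDescriptor>& schema)
    {
        // Always reset first; rows and indexes would be discarded here too.
        m_schema.clear();

        bool anyIndexed = false;
        for (std::size_t i = 0; i < schema.size(); i++)
            if (schema[i].index == it_indexed)
                anyIndexed = true;

        if (!anyIndexed)
            return false;      // the schema stays empty, as the spec requires

        m_schema = schema;     // one MultiMap per indexed field would be created here
        return true;
    }
};
```

Because the reset happens before validation, a failed call leaves no field descriptions behind even when a previous schema was installed.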

Your specifySchema method does not have to worry about being passed an invalid
schema (for example, one with an empty string as a field name, or one with two fields
with the same name). Given the complexity of this project, you've got better things
to worry about. We will not try to trick your code in this manner. If you do want to
check for situations like these, you may; if you detect them, have specifySchema
return false, leaving the schema with no field descriptions in it.

Requirements for bool addRow(const std::vector<std::string>& rowOfData)

The addRow method is used to add a new data record (also known as a row) into your
database. A row of data is represented by a vector of string values that correspond to
the current schema. (You may not add a row until you've specified a schema.) The
row must have the same number of fields as your current schema, and the jth item in
the vector corresponds to the jth field in the schema. The addRow method must return
true if it is successful, and false otherwise.

Given the schema shown in the previous section's example (username, phonenum,
age), here is how the addRow method might be called to add a new row of data values
that matches this schema to a database:

void addARow(Database& db)  // assumes schema has already been specified
{
    std::vector<std::string> row;

    row.push_back("ednatodd");      // field 0: username
    row.push_back("424-676-0202");  // field 1: phone number
    row.push_back("0035");          // field 2: age

    db.addRow(row);  // add the new row to the Database
}

An added row of data must have the same number of fields as the schema, and the jth
value of the added row corresponds to the jth field of the schema. So "ednatodd", the
first field in the row vector above, is a username value, since that was the first field
name specified in our schema; "424-676-0202", the second value in the row vector, is
a phonenum value, since that was the second field name specified in our schema
above; and so on. If the row the user passes in contains a number of values that is not
the number of fields in the Database's current schema (the one installed by the most
recent successful call to the specifySchema method), then the new row will not be
added, and the addRow method must return false without changing the database.

Otherwise, the addRow method must perform the following actions when provided
with a valid new row of data:
1. It must add the new row to the end of the m_rows vector that the Database
object maintains. If the m_rows vector already holds N rows in positions 0
through N-1, then a new row added must be placed in position N.
2. If the new row is being added to position N of m_rows, then for each value at
position j in the row of data that is being added, if field j was designated as an
indexed field in the schema, then addRow must insert an entry into
m_fieldIndex[j] that associates rowOfData[j] with N. (In the example above,
m_fieldIndex[0] would associate "ednatodd" with N, since field 0 of the
schema is indexed.)
3. The method returns true.
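
The three steps above can be sketched as follows. FakeMultiMap and MiniDatabase are hypothetical stand-ins so the example compiles on its own; FakeMultiMap merely records what was inserted, whereas your real MultiMap is a binary search tree. The sketch also assumes that specifySchema has already sized m_fieldIndex to one slot per field.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's MultiMap; it only records insertions.
struct FakeMultiMap
{
    std::vector<std::pair<std::string, unsigned int> > entries;
    void insert(const std::string& key, unsigned int value)
    {
        entries.push_back(std::make_pair(key, value));
    }
};

// Hypothetical stand-in for the parts of Database that addRow touches.
struct MiniDatabase
{
    std::vector<bool>                      m_indexed;     // is field j indexed?
    std::vector<std::vector<std::string> > m_rows;
    std::vector<FakeMultiMap>              m_fieldIndex;  // slot j unused if field j is not indexed

    bool addRow(const std::vector<std::string>& rowOfData)
    {
        // Reject the row if no schema is installed or the field count is wrong.
        if (m_indexed.empty() || rowOfData.size() != m_indexed.size())
            return false;

        // Step 1: the new row goes at position N, the end of m_rows.
        unsigned int n = static_cast<unsigned int>(m_rows.size());
        m_rows.push_back(rowOfData);

        // Step 2: for every indexed field j, associate rowOfData[j] with N.
        for (std::size_t j = 0; j < rowOfData.size(); j++)
            if (m_indexed[j])
                m_fieldIndex[j].insert(rowOfData[j], n);

        // Step 3: report success.
        return true;
    }
};
```

Note that the field-count check happens before anything is modified, so a rejected row leaves both m_rows and the indexes unchanged.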

So, suppose the database looked like this at some time:



After calling the addARow function above, the database would look like this:



As you can see, ednatodd's record was added to the end of the m_rows vector into
position 4. In addition, since the schema specified that the username and phonenum
fields must be indexed, an association has been added to m_fieldIndex[0] mapping
"ednatodd" → 4, the position of the new record in m_rows. Also, a new association has
been added to m_fieldIndex[1] mapping "424-676-0202" → 4.

When the addRow() method has completed, our new row will have been added and
all indexes updated. The method will then return true.

Requirements for bool loadFromURL(std::string url)
Requirements for bool loadFromFile(std::string filename)

The loadFromURL method loads a schema and potentially many data records from a
web page. (See the HTTP class section on page 7 to learn how to connect to the
Internet.)

The loadFromFile method loads a schema and potentially many data records from a
data file.

Both functions work in the same way, except that they get their input from
different sources.

If N is the number of data records loaded from the input and the schema has F
indexed fields, then these methods must run in average case O(FN log N) time.

Here's how the methods might be used:

void addFromInternet(Database& db)
{
    bool ok = db.loadFromURL("http://www.somesite.com/patient-data/january");
    if (ok)
        cout << "Schema and data loaded from the Internet successfully!" << endl;
    else
        cout << "Error loading schema and data from the Internet" << endl;
}

void addFromFile(Database& db)
{
    bool ok = db.loadFromFile("C:/myfiles/data.txt");

    if (ok)
        cout << "Data loaded from file data.txt successfully!\n";
    else
        cout << "Error loading data from file data.txt\n";
}


The methods may assume that their input has the following format:

FieldName1[*],FieldName2[*],FieldName3[*],FieldName4[*],...,FieldNameN[*]
Field1,Field2,Field3,Field4,...,FieldN
Field1,Field2,Field3,Field4,...,FieldN
...
Field1,Field2,Field3,Field4,...,FieldN

The first line of the page/file must be a schema line, and must list all of the field
names (e.g., firstName or age), separated by commas, with no extraneous
whitespace (no spaces, tabs, etc.). Each field name may optionally have a *
immediately following it, indicating that this field is an indexed field. Fields that lack
a * immediately after their name are non-indexed fields. (The [*] above indicates an
optional * value.)

The remaining lines list the actual rows of data with the fields separated by commas.
(This means no field may contain a comma.)
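
As a sketch of how the schema line might be interpreted, the code below splits a line on commas and treats a trailing * as the "indexed" marker. It uses only std::string operations so that it stands alone; you may well prefer the provided Tokenizer class, whose interface is not assumed here. The function names splitOnCommas and parseSchemaLine are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum IndexType { it_none, it_indexed };

struct FieldDescriptor
{
    std::string name;
    IndexType   index;
};

// Split a line on commas. Fields cannot contain commas in this format, so
// no quoting or escaping is handled.
std::vector<std::string> splitOnCommas(const std::string& line)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos)
        {
            parts.push_back(line.substr(start));
            return parts;
        }
        parts.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
}

// Turn a line such as "First*,Last*,Phone" into field descriptors; a
// trailing '*' marks an indexed field and is not part of the field name.
std::vector<FieldDescriptor> parseSchemaLine(const std::string& line)
{
    std::vector<FieldDescriptor> schema;
    std::vector<std::string> names = splitOnCommas(line);
    for (std::size_t i = 0; i < names.size(); i++)
    {
        FieldDescriptor fd;
        fd.name = names[i];
        fd.index = it_none;
        if (!fd.name.empty() && fd.name[fd.name.size() - 1] == '*')
        {
            fd.name.erase(fd.name.size() - 1);
            fd.index = it_indexed;
        }
        schema.push_back(fd);
    }
    return schema;
}
```

The same splitOnCommas helper could be reused for the data lines, where the number of parts must equal the number of fields in the schema.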

Here's an example:

First*,Last*,Phone,Occupation*,Age
Joe,Smith,818-555-1212,forklift operator,019
Bill,Nachenberg,310-456-7890,college professor,056
Sally,Smallberg,800-123-4567,organ harvester,025
Yen,Chen,310-877-3353,advertising intern,028
Barry,Smith,442-324-2342,unemployed,060
Sally,Feng,543-234-2342,accountant,024
Daniel,Chen,310-345-3234,power-programmer,029
Dan,Nieh,510-656-4643,philosopher,023

The data being imported must have the same number of fields on each line (e.g., 5 in
the example above) as the schema specified on the top line.

If you cannot successfully import the data from the specified URL or file for any
reason (the page indicated by the URL can't be fetched, the indicated file can't be
opened, the data in the web page or file is not valid, the Internet is not available, the
schema is invalid, a data row has the wrong number of fields, etc.), then these
methods must return false. Otherwise these methods must import all rows (adding
each to the end of the m_rows vector and indexing all relevant fields in m_fieldIndex),
and return true.

If these methods return false, then either the database must be put in a state in which
the schema has no field descriptions in it and the database has no rows, or a state in
which the schema is valid and the database correctly holds zero or more valid rows
from the input (but not all the rows, otherwise you should return true); it's your choice
which of these states you leave the database in.

Requirements for int getNumRows() const

The getNumRows method returns the number of rows currently in the database. This
method must run in O(1) time.

Requirements for bool getRow(int rowNum, std::vector<std::string>& row) const

The getRow method puts data from the row at position rowNum in the database into
the provided row vector parameter. Any data that row contained prior to the call
to getRow must be replaced by the desired row of data from m_rows. This method
must run in O(F) time, where F is the number of fields in each row.

If rowNum is invalid, then getRow must return false and not change the row
parameter. Otherwise, the method returns true.
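
A minimal sketch of that contract, written as a free function over a vector of rows rather than as a Database member (the typedef Row and the extra rows parameter are assumptions made so the example stands alone):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;

// Copy row number rowNum of rows into row. If rowNum is out of range,
// return false and leave row exactly as it was.
bool getRow(const std::vector<Row>& rows, int rowNum, Row& row)
{
    if (rowNum < 0 || static_cast<std::size_t>(rowNum) >= rows.size())
        return false;
    row = rows[rowNum];   // assignment replaces any old contents; O(F) copies
    return true;
}
```

Checking the range before touching row is what guarantees the "not change the row parameter" part of the requirement.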

void getAndPrintRowNum(Database& db, unsigned int rowNum)
{
    std::vector<std::string> targetRow;
    bool ok = db.getRow(rowNum, targetRow);
    if (ok)
    {
        // print each field value followed by a space
        for (size_t i = 0; i < targetRow.size(); i++)
            cout << targetRow[i] << " ";
        cout << endl;
    }
    else
        cout << "Error getting row number: " << rowNum << endl;
}

Requirements for int search(const std::vector<SearchCriterion>& searchCriteria,
                            const std::vector<SortCriterion>& sortCriteria,
                            std::vector<int>& results)

All right, here's where things get interesting. The search method is used to search for
all rows that satisfy specific criteria (passed in via the searchCriteria parameter). All
matching rows, if any, then must be ordered based on any specified ordering criteria
(passed in via the sortCriteria parameter) and returned to the user as a vector of row
numbers (via the results parameter). The big-O requirements for this method are
stated below.

Search Criteria

The user may specify one or more search criteria (requirements). Each search
criterion must include:
1. The name of the field to be searched (which must exactly match an indexed
field name in the schema)
2. A minimum value for this field
3. A maximum value for this field

For each criterion, the user may omit one or the other of the minimum value or the
maximum value, but must not omit both. Omitting the minimum value indicates that
the criterion includes all rows with a field value that is less than or equal to the
specified maximum value. Omitting the maximum value indicates that the criterion
includes all rows with a field value that is greater than or equal to the specified
minimum value. Omitting the minimum value or the maximum value is indicated by
the corresponding member of the SearchCriterion structure being the empty string.

If the user provides multiple search criteria, then the search method must return all
and only those rows that satisfy all of the specified criteria.
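
To pin down the meaning of the bounds, here is a small sketch of the inclusive range test for a single criterion, with an empty string standing for an omitted bound. The helper names isWellFormed and inRange are hypothetical. A real search would walk the field's MultiMap index rather than test every row this way, so treat this as a statement of the matching rule, not of the algorithm.

```cpp
#include <cassert>
#include <string>

struct SearchCriterion
{
    std::string fieldName;
    std::string minValue;   // empty means "no minimum"
    std::string maxValue;   // empty means "no maximum"
};

// A criterion is usable only if at least one of the two bounds is present.
bool isWellFormed(const SearchCriterion& c)
{
    return !c.minValue.empty() || !c.maxValue.empty();
}

// Does value fall within the criterion's inclusive range? Strings compare
// with the ordinary case-sensitive std::string ordering.
bool inRange(const std::string& value, const SearchCriterion& c)
{
    if (!c.minValue.empty() && value < c.minValue)
        return false;
    if (!c.maxValue.empty() && value > c.maxValue)
        return false;
    return true;
}
```

A row satisfies a query only if inRange holds for every criterion, each applied to the field that criterion names.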

Here's an example that searches our earlier database for all users with a username
between "albert" and "molly", inclusive, and who have a phone number less than or
equal to "310-234-2342":



void doAQuery(Database& db)
{
    std::vector<Database::SearchCriterion> searchCrit;

    Database::SearchCriterion s1;
    s1.fieldName = "username";
    s1.minValue = "albert";
    s1.maxValue = "molly";

    Database::SearchCriterion s2;
    s2.fieldName = "phonenum";
    s2.minValue = "";  // no minimum specified
    s2.maxValue = "310-234-2342";

    searchCrit.push_back(s1);
    searchCrit.push_back(s2);

    // We'll leave our sort criteria empty for now, which means
    // the results may be returned to us in any order

    std::vector<Database::SortCriterion> sortCrit;
    std::vector<int> results;
    int numFound = db.search(searchCrit, sortCrit, results);
    if (numFound == Database::ERROR_RESULT)
        cout << "Error querying the database!" << endl;
    else
    {
        cout << numFound << " rows matched the criteria; here they are:" << endl;
        for (size_t i = 0; i < results.size(); i++)
        {
            // print the row number out where we had a match
            cout << "Row #" << results[i] << ": ";

            // get and print the field values out from that row
            std::vector<std::string> rowData;
            if (db.getRow(results[i], rowData))
            {
                for (size_t j = 0; j < rowData.size(); j++)
                    cout << rowData[j] << " ";
                cout << endl;
            }
            else
                cout << "Error retrieving row's data!" << endl;
        }
    }
}

The first search criterion above specifies that the username must be between albert
and molly, inclusive. This matches rows 0, 1, 2, and 4, since all of them have
usernames that are greater than or equal to albert and less than or equal to molly:
climberkip, davidsmall, missessmall, ednatodd.

The second search criterion above specifies that the phone number must be less than
or equal to 310-234-2342, with no minimum value. This matches rows 1, 2 and 3,
but not 4.

The result of our query is the intersection of the two sets of rows, since each set has
the result of matching just one of our criteria and the final result has to match all of
them: {0, 1, 2, 4} ∩ {1, 2, 3}, which is {1, 2}. Therefore, our results vector should
contain the two values 1 and 2, and our example function above should print either:

2 rows matched the criteria; here they are:
Row #1: davidsmall 310-234-2342 0052
Row #2: missessmall 310-234-2342 0042

or

2 rows matched the criteria; here they are:
Row #2: missessmall 310-234-2342 0042
Row #1: davidsmall 310-234-2342 0052

Since no ordering criteria were specified, the search method was not required to
deliver the results in any particular order.
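
The intersection step in this example can be sketched with the standard library as shown below. The function name intersectRows is made up, and this is only one way to combine per-criterion results; it has not been checked against the big-O requirements for search, so you may need a different strategy to meet them.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Intersect two collections of row numbers. The copies are sorted first
// because std::set_intersection requires sorted input ranges.
std::vector<int> intersectRows(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(both));
    return both;
}
```

For the example above, intersecting {0, 1, 2, 4} with {1, 2, 3} leaves rows 1 and 2. With more than two criteria, the running result is intersected with each further set in turn.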

Note: The example above only showed two search criteria, but the user may specify
as many search criteria as they like, within reason (e.g., up to 100), so long as they
specify at least one. If no search criteria are specified, then the search method must
return ERROR_RESULT. If any of the search criteria either (a) refers to a field name
that is not an indexed field in the current schema, or (b) lacks both minimum and
maximum values, then the method must return ERROR_RESULT. If the method
returns ERROR_RESULT, the results parameter must be set to empty.

Sorting Criteria

The user may optionally specify one or more sorting criteria for the results of a query.
The sorting criteria may be on any field or fields (including those that are not
designated as indexed fields in the database's schema), and may be ascending or
descending order for each field.

As an example, one possible sequence of sorting criteria might be:

username, ascending
age, descending
phonenum, ascending

What does it mean to have three (or more) different items in our ordering criteria? It
means that all three ordering rules must be applied to the data.

But how can we sort things by username and age and phonenum?

Well, all rows must first be ordered by their username field value (the top criterion) in
ascending order. That means that any row containing albert would come before a
row containing cindy, and that a row containing albert would come after a row
with a username of Albert (upper-case ASCII characters have lower values than
lower-case ASCII characters). If each row had a different value for its username
field, all rows would be ordered simply by their username field.

However, if two or more rows have the same username, then how should those rows
be ordered relative to each other? Well, the next sorting criterion (age, descending, in
the example) must be applied to order those rows. In the example above, we'd order
all rows with the same username based on each person's age, in descending order
(i.e., with older people above younger people).

And what if two or more rows have identical usernames and ages? Well, then the last
sorting criterion (phonenum, ascending) would need to be applied to these rows. In
the example above, these rows would be ordered based on their phone number, in
ascending order.

Here's an example to help you understand how to properly order your results.
Consider the following schema, used to represent students:

lastname, indexed
firstname, indexed
studentID, not indexed
GPA, not indexed

Suppose that searching through the database using some search criteria indicated that
the following data rows satisfied the search criteria:

Smith,James,100300001,3.50
Nachenberg,Carey,400217123,3.99
Smallberg,David,000000001,3.99
Wang,Billy,398764354,2.73
Feng,Cameron,424567897,3.24
Wang,Lily,240943234,3.87
Nachenberg,Simon,001423625,2.15
Wang,Jeff,592325224,3.76
Smith,Alice,200300421,3.50
Wang,Eric,909222524,3.17
Smith,James,777493762,3.52

Now, further suppose that the user specified the following sorting criteria:

SortCriteria c1;
c1.fieldName = "lastname";
c1.ordering = Database::ot_ascending;

SortCriteria c2;
c2.fieldName = "firstname";
c2.ordering = Database::ot_ascending;

SortCriteria c3;
c3.fieldName = "GPA";
c3.ordering = Database::ot_descending;

Then the rows must be returned in the following order:

Feng,Cameron,424567897,3.24
Nachenberg,Carey,400217123,3.99
Nachenberg,Simon,001423625,2.15
Smallberg,David,000000001,3.99
Smith,Alice,200300421,3.50
Smith,James,777493762,3.52
Smith,James,100300001,3.50
Wang,Billy,398764354,2.73
Wang,Eric,909222524,3.17
Wang,Jeff,592325224,3.76
Wang,Lily,240943234,3.87

As you can see, all of the results were first ordered by last name in ascending order.
Second, where there was a tie with the last name (as with Nachenberg, Smith and
Wang), items were further ordered by their first name in ascending order. Finally,
where there was a tie in both last and first name, as with James Smith (a common
name), the items were further ordered in descending order by the students' GPAs.

Now you're probably wondering: How do I sort a bunch of data items based on
multiple criteria? Here's a hint that you can adapt (with many changes) to solve this
problem.

The example below sorts a bunch of student records by lastname (ascending),
firstname (ascending) and GPA (descending). The sorting criteria are hard-coded into
the program, which isn't what you need for this project.


#include <algorithm>
#include <string>

struct Student
{
    std::string lastName;
    std::string firstName;
    std::string studentID;
    std::string GPA;
};

// Returns true if a belongs before b, false otherwise.
bool doesABelongBeforeB(const Student& a, const Student& b)
{
    // First criterion: ascending order by lastName.
    if (a.lastName < b.lastName)
        return true;
    if (a.lastName > b.lastName)
        return false;

    // The last names are the same.
    // Second criterion: ascending order by firstName.
    if (a.firstName < b.firstName)
        return true;
    if (a.firstName > b.firstName)
        return false;

    // Both lastName AND firstName match.
    // Third criterion: descending order by GPA (compared as strings).
    return a.GPA > b.GPA;
}

void sortStudents(Student array[], int numStudents)
{
    std::sort(array, array + numStudents, doesABelongBeforeB);
}

Hopefully this will give you an idea of how to solve the problem.
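
One way the hard-coded comparison might be generalized to criteria chosen at run time is sketched below. It is an illustration only and rests on several assumptions: each row is held as a vector of strings, the SortKey struct and the sortRows function are hypothetical names, and a column index stands in for a field name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;   // one field value per column (assumption)

// Hypothetical description of one sorting rule.
struct SortKey
{
    std::size_t column;   // which field to compare
    bool descending;      // false = ascending, true = descending
};

// Sorts rows by the first key, breaking ties with each later key in turn.
void sortRows(std::vector<Row>& rows, const std::vector<SortKey>& keys)
{
    std::sort(rows.begin(), rows.end(),
              [&keys](const Row& a, const Row& b)
              {
                  for (const SortKey& k : keys)
                  {
                      const std::string& x = a[k.column];
                      const std::string& y = b[k.column];
                      if (x == y)
                          continue;   // tie on this key: consult the next one
                      return k.descending ? (x > y) : (x < y);
                  }
                  return false;       // equal on every key
              });
}
```

Each comparison examines at most S keys and std::sort performs O(R log R) comparisons, so this approach fits within an O(SR log R) bound.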

Big-O requirements

Assume that the database holds N rows.
Assume there are C search criteria that were provided to identify matching rows.
Assume that the search criterion for a particular field identifies M matching items.
Assume that your query results in R matching rows of data.
Assume that the results are to be sorted using S sorting criteria.

Here are the time complexity requirements for the search method:

1. To determine all rows that meet a single search criterion (e.g., find all last
names between Nachenberg and Smallberg), search must run in average case
O(M log N) time.
2. To determine which rows meet all criteria and should be returned to the user
(e.g., last name is between Nachenberg and Smallberg, AND GPA is between
3.0 and 4.0), search must run in average case O(CM log N) time. (Hint:
There's a hash-based version of the set class called unordered_set.)
3. To order your R matching rows based on the S sort criteria, search must run in
average case O(SR log R) time.

Our Testing Framework

We have built a simple data-driven test framework that you can use to test your Project 4
implementation and also test how our solution works. If you simply use our provided
main.cpp and test.h files, you'll have a nice system for testing your Database code.

So how does our test framework work?

Well, when you use our main.cpp and test.h files and compile them with your MultiMap
and Database classes, your compiled program will act as a test-bed. You can run your
compiled program from the command line with various test scripts, like this:



This will cause our test-bed to follow the test instructions in the test-script.dat file (which
we'll describe in a second), and use these instructions to test your Database class.

Alternatively, you can run your compiled program from within Visual Studio or Xcode,
and it will start by asking you for the name of the script file you want to use.

So what does a test script look like? Well, it's basically a bunch of commands, with one
command per line. Our test system loads the script and executes it from top
to bottom, one command at a time. It prints out all of the results to cout as it goes.

Here are the commands you may use:

The file Command and the url Command

These commands may be used to import a schema and one or more rows of data from a
data file or from a website via a URL. The command syntax is (case sensitive):

file:c:\proj4-14\data-file.txt

or

file:/Users/fred/cs32p4/mydata

or

url:http://cs.ucla.edu/classes/winter14/cs32/Projects/4/Data/census.csv

Note that there are no superfluous spaces allowed between the command (e.g. file or
url), the colon (:) or the argument (the filename or URL).

This command will cause the specified file or web page to be loaded from the
disk/internet into your database by calling your Database's loadFromFile() or
loadFromURL() method. Such a file or web page must contain the schema first, then one
or more rows of data. See the loadFromFile() and loadFromURL() sections of this
document for more details on the appropriate format for this imported data.

The schema Command

This command may be used to initialize the current database and set its schema. The
syntax is (case sensitive):

schema:field1[*],field2[*],...,fieldN[*]

Each field name may optionally have a * immediately following it, indicating that this
field is an indexed field. Fields that lack a * immediately after their name are
non-indexed fields. (The [*] above indicates an optional * value.) Here's an example:

schema:lastName*,firstName*,age*,occupation*,numKids,SSN

This designates that your schema has six fields named lastName, firstName, age,
occupation, numKids and SSN (social security number). The lastName, firstName, age,
and occupation fields have been designated as indexed (this would be passed into your
Database's specifySchema() method).

Notice that no extraneous whitespace is allowed between fieldnames, asterisks, or the
colon.
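
To make the schema format concrete, here is a sketch of how the text after "schema:" might be split into fields. The provided test framework already handles this for you; the Field struct and the parseSchemaFields function are hypothetical names used only to illustrate the format.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one schema entry.
struct Field
{
    std::string name;
    bool indexed;
};

// Parses the text that follows "schema:". A trailing '*' marks an indexed
// field and is stripped from the stored name.
std::vector<Field> parseSchemaFields(const std::string& text)
{
    std::vector<Field> fields;
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, ','))
    {
        Field f;
        f.indexed = !token.empty() && token.back() == '*';
        f.name = f.indexed ? token.substr(0, token.size() - 1) : token;
        fields.push_back(f);
    }
    return fields;
}
```

Given lastName*,firstName*,age*,occupation*,numKids,SSN, this produces six entries, the first four of which are marked as indexed.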

The add Command

The add command may be used to add a new row to a database whose schema has
previously been set (e.g., either by loading the schema from a URL or file, or by
specifying it with the schema command). The syntax is:

add:field1Value,field2Value,...,fieldNValue

Notice that you must not have extraneous spaces before or after the colon or separating
commas, although you may have spaces in your field values (e.g., software engineer) if
you like.

Here's how it might be used to add a new row of data consistent with the schema shown
in the section above:

add:Nachenberg,Carey,0042,software engineer,0,765-33-2242

This command will call your Database's addRow() method with the specified
parameters.

Issuing a Query: the qparam, sparam and execute commands

If you'd like to issue a query to your database, you can do it by specifying one or more
sets of query parameters using the qparam command, then specifying zero or more sets
of sorting parameters using the sparam command, and finally using the execute
command.

Here's the syntax of the qparam command:

qparam:fieldName,minVal,maxVal
qparam:fieldName,,maxVal
qparam:fieldName,minVal,

You may leave out either minVal or maxVal, but not both. You must always have two
commas, even if you leave out a minimum or maximum value! Make sure not to have
any extraneous whitespace.

Here are some examples:

qparam:lastName,Aaronson,Albertson
    This command would find all people with a last name in the range
    [Aaronson, Albertson].
qparam:lastName,Aaronson,Aaronson
    This command would find all people with a last name of Aaronson.
qparam:age,,30
    This command would find all people whose age is less than or equal to 30,
    e.g., [00, 30].
qparam:firstName,Gennady,
    This command would find all people whose first name is greater than or equal
    to Gennady.

Note that you may have multiple qparam commands, one after the other, to specify
multiple search criteria for a query (see example below).
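
The qparam rules above (two commas always present, at most one bound omitted) can be illustrated with a small parser. This is a sketch under assumptions, since the provided framework does the real parsing: the QueryParam struct and the parseQparam function are hypothetical names, and an omitted bound is represented by an empty string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for the three parts of a qparam line.
struct QueryParam
{
    std::string fieldName;
    std::string minVal;   // empty if omitted
    std::string maxVal;   // empty if omitted
};

// Parses "qparam:field,min,max". Returns false if the prefix is wrong, a
// comma is missing, the field name is empty, or both bounds are omitted.
bool parseQparam(const std::string& line, QueryParam& out)
{
    const std::string prefix = "qparam:";
    if (line.compare(0, prefix.size(), prefix) != 0)
        return false;
    std::size_t firstComma = line.find(',', prefix.size());
    if (firstComma == std::string::npos)
        return false;
    std::size_t secondComma = line.find(',', firstComma + 1);
    if (secondComma == std::string::npos)
        return false;
    out.fieldName = line.substr(prefix.size(), firstComma - prefix.size());
    out.minVal = line.substr(firstComma + 1, secondComma - firstComma - 1);
    out.maxVal = line.substr(secondComma + 1);
    return !out.fieldName.empty()
        && !(out.minVal.empty() && out.maxVal.empty());
}
```

For example, qparam:age,,30 parses to the field age with no minimum and a maximum of 30, while qparam:age,, is rejected because both bounds are missing.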

Here's the syntax of the sparam command:

sparam:fieldName,{ascending,descending}

Make sure not to have any extraneous whitespace. The { } means that you must pick
either ascending or descending, but not both.

Here are some examples:

sparam:lastName,ascending
    This command specifies that data should be sorted by the values of the lastName
    fields in ascending order.
sparam:age,descending
    This command specifies that data should be sorted by the values of the age fields
    in descending order.

Note that you may have multiple sparam commands, one after the other, to specify
multiple ordering criteria for a query (see example below). The first sparam command to
appear will be the primary ordering method, the second sparam command will then
specify the secondary ordering method, and so on.

Here's the syntax of the execute command:

execute

This command has no parameters, colons, etc. It must be placed after any desired qparam
or sparam commands. It will take all of the earlier query parameters and sorting
parameters found in the test script (since the last execute command) and pass them into
your Database's search() method. It will then get the results from the search() method
and print out all field values from each matching row to cout. Once it is done, all former
qparam and sparam commands will be discarded, and your next query must specify new
query and sorting parameters again.

Here's a complete example of a query:

qparam:lastName,Wang,Zeng
qparam:age,015,025
qparam:occupation,student,student
sparam:lastName,ascending
sparam:firstName,ascending
sparam:age,descending
execute

And here's what a result might look like:

Wang,Scott,019,student,0,545-28-2161
Wang,Taylor,017,student,0,938-11-9273
Wang,Taylor,016,student,0,735-30-2341
Yen,Kylie,020,student,0,478-82-9702
-------------------------------------------------------------

Notice that all returned records met the requirements of the query:

1. The names were between Wang and Zeng, inclusive
2. The ages were between 15 and 25 years old
3. The occupation was student

The results were then ordered first by their lastName in ascending order, then by their
firstName in ascending order, and finally by their age in descending order.
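
The zero-padded ages in this example (015, 025) suggest that field values are compared as strings rather than as numbers. That reading is an assumption here, although it is consistent with the username comparison described in the Sorting Criteria section. The sketch below uses a hypothetical inRange helper, with an empty bound treated as "no limit", to show why padding matters under string ordering.

```cpp
#include <string>

// Returns true if value lies in [minVal, maxVal] under std::string ordering.
// An empty bound is treated as "no limit" (an assumption of this sketch).
bool inRange(const std::string& value,
             const std::string& minVal,
             const std::string& maxVal)
{
    if (!minVal.empty() && value < minVal)
        return false;
    if (!maxVal.empty() && value > maxVal)
        return false;
    return true;
}
```

With these bounds, the padded value 019 falls inside [015, 025], but the unpadded value 19 does not, because its first character '1' compares greater than the '0' that begins 025.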

Requirements and Other Thoughts

Make sure to read this entire section before beginning your project!

1. In Visual C++, make sure to change your project from UNICODE to Multi Byte
Character set, by going to Project -> Properties -> Configuration Properties ->
General -> Character Set.
2. In Visual C++, make sure to add wininet.lib to the set of input libraries, by going
to Project -> Properties -> Linker -> Input -> Additional Dependencies;
otherwise, you'll get a linker error!
3. The entire project can be completed in under 500 lines of C++ code beyond what
we've already written for you, so if your program is getting much larger than this,
talk to a TA; you're probably doing something wrong.
4. Before you write a line of code for a class, think through what data structures and
algorithms you'll need to solve the problem. How will you use these data
structures? Plan before you program!
5. Don't make your program overly complex; use the simplest data structures
possible that meet the requirements.
6. You must not modify any of the code in the files we provide you that you will not
turn in; since you're not turning them in, we will not see those changes. We will
incorporate the required files that you turn in into a project with special test
versions of the other files.
7. Make sure to implement and test each class independently of the others that
depend on it. Once you get the simplest class coded, get it to compile and test it
with a number of different unit tests. Only once you have your first class working
should you advance to the next class.
8. We're providing you with working versions of the MultiMap and
MultiMap::Iterator classes that use the C++ STL libraries. You can use these
classes to build and test your Database class even if you can't figure out how to
implement your MultiMap or MultiMap::Iterator classes!
9. You may use only those STL containers (e.g., vector, list) that are not forbidden
by this spec. For MultiMap, this means you must use none at all. For Database,
this means you must not use map, multimap, unordered_map, or the nonstandard
hash_map; use your MultiMap class if you need a map, for example.
10. Try your best to meet our big-O requirements for each method in this spec. If you
can't figure out how, then solve the problem in a simpler, less efficient way, and
move on. Then come back and improve the efficiency of your implementation
later if you have time.

If you don't think you'll be able to finish this project, then take some shortcuts. For
example, use the substitute MultiMap class we provide instead of creating your own
MultiMap class if necessary to save time.

You can still get a good amount of partial credit if you implement most of the project.
Why? Because if you fail to complete a class (e.g., MultiMap), we will provide a correct
version of that class and test it with the rest of your program. If you implemented the rest
of the program properly, it should work perfectly with our version of the MultiMap class
and we can give you credit for those parts of the project you completed.

But whatever you do, make sure that ALL CODE THAT YOU TURN IN BUILDS
without errors with both Visual Studio and either clang++ or g++!

What to Turn In

You should turn in five files:

MultiMap.h       Contains your MultiMap and MultiMap::Iterator declarations
MultiMap.cpp     Contains your MultiMap and MultiMap::Iterator implementations
Database.h       Contains your Database class declaration
Database.cpp     Contains your Database class implementation
report.doc, report.docx, or report.txt
                 Contains your report

You are to define your classes' declarations and all member function implementations
directly within the specified .h and .cpp files. You may add any #includes or constants
you like to these files. You may also add support functions for these classes if you like
(e.g., operator<, loadItem). Make sure to properly comment your code.

You must submit a brief (You're welcome!) report that describes, in high-level language,
how the following various methods work and whether or not they satisfy our big-O
requirements or have any other flaws/bugs that you know of (or are not yet complete).
Be sure to make clear the meaning of the variables in your big-O expressions, e.g.,
"Assuming the MultiMap holds M items, then next() is O(log M)."

Here's the list you're responsible for documenting:

MultiMap::findEqual()
MultiMap::Iterator::next()
Database::search()

Grading

95% of your grade will be assigned based on the correctness of your solution
5% of your grade will be based on your report

Good luck!
