Big Data
Spring
2017
Databases and Relational Databases
Juliana Freire
Computer Science & Engineering
Visualization and Data Analysis (ViDA)
Center for Data Science (CDS)
Center for Urban Science and Progress (CUSP)
VISUALIZATION
IMAGING AND
DATA ANALYSIS
CENTER
Today
• Why study databases?
• Introduction to Relational Databases
• Representing structured data with the Relational Model
• Accessing and querying data using SQL
2/12/17
What’s in a DBMS?
• Data model
• Query language
• Transactions and crash recovery
[Figure: ER diagram. Labels: Keys, Cardinality, date constraints]
ER to Relational
Patient(patientId:int, patientName:str, age: int)
Takes(patientId:int,prescId:int,prescDate:date)
Prescription(prescId:int, presName:str)
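The three relations above could be declared in SQL roughly as follows. This is a minimal sketch run through Python's built-in sqlite3 module. The primary and foreign keys are assumptions (the schema above lists only attribute names and types), the sample row values are made up, and SQLite type names stand in for int, str and date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Patient (
    patientId   INTEGER PRIMARY KEY,   -- assumed key
    patientName TEXT,
    age         INTEGER
);
CREATE TABLE Prescription (
    prescId  INTEGER PRIMARY KEY,      -- assumed key
    presName TEXT
);
CREATE TABLE Takes (
    patientId INTEGER REFERENCES Patient(patientId),
    prescId   INTEGER REFERENCES Prescription(prescId),
    prescDate TEXT,                    -- stored as an ISO date string in this sketch
    PRIMARY KEY (patientId, prescId, prescDate)   -- assumed key
);
""")

# Made-up sample data, only to show that the relationship can be followed.
conn.execute("INSERT INTO Patient VALUES (1, 'Johnson', 23)")
conn.execute("INSERT INTO Prescription VALUES (10, 'aspirin')")
conn.execute("INSERT INTO Takes VALUES (1, 10, '2017-02-12')")

rows = conn.execute("""
    SELECT patientName, presName, prescDate
    FROM Patient JOIN Takes USING (patientId)
                 JOIN Prescription USING (prescId)
""").fetchall()
print(rows)
```

The relationship Takes becomes its own table whose columns reference the keys of the two entity tables.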
age >= 18 and age <= 45

Patient-id    Patient-name  age
192-83-7465   Johnson       23
019-28-3746   Smith         78
• Design Goals:
• Avoid redundant data
• Ensure that relationships among attributes are represented
• Ensure constraints are properly modeled: updates check for violation of
database integrity constraints
Query Languages
• Query languages: Allow manipulation and retrieval of data
from a database
• Queries are posed with respect to the data model
• Operations over objects defined in the data model
• Relational model supports simple, powerful QLs:
• Strong formal foundation based on logic
• Allows for optimization
• Query Languages != programming languages
• QLs support easy, efficient access to large data sets
• QLs not expected to be “Turing complete”
• QLs not intended to be used for complex calculations
Levels of Abstraction
• Many views, single conceptual (logical) schema and physical
schema
• Views describe how users see the data
[Figure labels: View 1, View 2, View 3]
Data Independence
• Applications insulated from how data is structured and stored
• Logical data independence: Protection from changes in
logical structure of data
• Changes in the logical schema do not affect users as long as their views are
still available
• Physical data independence: Protection from changes in
physical structure of data
• Changes in the physical layout of the data or in the indexes used do not
affect the logical relations
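Logical data independence can be illustrated with a view. The sketch below uses Python's sqlite3 module. The table, view and column names (account, account_view, account_owner, account_balance) are hypothetical. The application query is written against the view, the logical schema is then split into two tables, and the same query still returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (number INTEGER PRIMARY KEY, owner TEXT, balance REAL);
INSERT INTO account VALUES (101, 'J. Smith', 1000.0), (102, 'W. Wei', 2000.0);
CREATE VIEW account_view AS SELECT number, owner, balance FROM account;
""")

# The application only ever talks to the view.
APP_QUERY = "SELECT owner, balance FROM account_view ORDER BY number"
before = conn.execute(APP_QUERY).fetchall()

# Change the logical schema: split account into two tables, then redefine the view.
conn.executescript("""
DROP VIEW account_view;
CREATE TABLE account_owner   (number INTEGER PRIMARY KEY, owner TEXT);
CREATE TABLE account_balance (number INTEGER PRIMARY KEY, balance REAL);
INSERT INTO account_owner   SELECT number, owner   FROM account;
INSERT INTO account_balance SELECT number, balance FROM account;
DROP TABLE account;
CREATE VIEW account_view AS
    SELECT o.number, o.owner, b.balance
    FROM account_owner o JOIN account_balance b ON o.number = b.number;
""")

after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```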
• List all accounts with balance = 350
• Requires all tuples to be examined
How many tuples need to be accessed?
• An index on balance makes
it efficient to find tuples with a
specific balance
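The effect of an index on the balance = 350 query can be observed in SQLite, used here as one convenient DBMS. This is a sketch under assumptions: the accounts table is filled with made-up rows, the index name idx_accounts_balance is hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (number INTEGER, owner TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(i, f"owner{i}", float(i % 1000)) for i in range(10_000)],  # made-up data
)

QUERY = "SELECT * FROM accounts WHERE balance = 350"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_without_index = plan(QUERY)   # expected to describe a full table scan
conn.execute("CREATE INDEX idx_accounts_balance ON accounts(balance)")
plan_with_index = plan(QUERY)      # expected to mention the index

matches = conn.execute(QUERY).fetchall()
print(plan_without_index)
print(plan_with_index)
print(len(matches))
```

Without the index every tuple is examined; with it, the DBMS can go straight to the tuples whose balance is 350.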
Index Selection
• Selecting the best indexes for a database is a hard problem
• Tradeoffs
• Index on an attribute speeds up queries that mention that attribute (including
joins)
• Indexes make insertions, deletions and updates more complex and time
consuming
• Indexes use up space – an extra table
• Rule of thumb
• If R is queried more often than updated, create indexes on attributes most
frequently specified in queries
• If R is updated often, be careful!
[Figure: two plans for select σ balance=350 over accounts: scan accounts, or use index(balance)]
Concurrency Control
• Concurrent execution of user programs is essential for good
DBMS performance
• But interleaving actions of different user programs can lead to
inconsistency
• e.g., Nurse and doctor can simultaneously edit a patient record
• DBMS ensures such problems don’t arise: users can pretend
they are using a single-user system
Ensuring Atomicity
• DBMS ensures atomicity (all-or-nothing property) even if
system crashes in the middle of a transaction
• If there is a power outage, will the patient database become inconsistent?
• Idea: Keep a log (history) of all actions carried out by the
DBMS while executing a set of transactions
• Before a change is made to the database, the corresponding log entry is
forced to a safe location
• After a crash, the effects of partially executed transactions are undone using
the log; if a log entry was not saved before the crash, the corresponding
change was not applied to the database!
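The logging idea can be sketched with a toy in-memory store. This is an illustration only, not how any particular DBMS implements recovery: the class TinyDB and its methods are hypothetical, Python objects stand in for the disk and for the safe log location, and stored values are assumed never to be None.

```python
class TinyDB:
    """Toy key-value store with an undo log (illustrative only)."""

    def __init__(self):
        self.data = {}  # stands in for the database
        self.log = []   # stands in for the log kept in a safe location

    def write(self, txn, key, value):
        # Log first: record the old value before the data is changed.
        self.log.append(("update", txn, key, self.data.get(key)))
        self.data[key] = value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Undo, newest first, every update of a transaction that never committed.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old = rec
                if old is None:
                    self.data.pop(key, None)  # the key did not exist before
                else:
                    self.data[key] = old


db = TinyDB()
db.write("T1", "A", 100)
db.write("T1", "B", 50)
db.commit("T1")

db.write("T2", "A", 70)  # T2 changes A, then the system "crashes" before T2 commits
db.recover()
print(db.data)  # {'A': 100, 'B': 50}
```

Because the log entry is written before the change, recovery always knows which old value to restore.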
Summary
• DBMS used to maintain and query (large) structured datasets
• Benefits include recovery from system crashes, concurrent
access, quick application development, data integrity and
security
• Levels of abstraction give data independence
Database Model
• Provides the means for
• specifying particular data structures
• constraining the data sets associated with these structures, and
• manipulating the data
Example: A Relation
Relation name: Account

Number  Owner      Balance    Type
101     J. Smith   1000.00    checking
102     W. Wei     2000.00    checking
103     J. Smith   5000.00    savings
104     M. Jones   1000.00    checking
105     H. Martin  10,000.00  checking
Attribute domains:
Number must be a 3-digit number
Owner must be a 30-character string
Type must be checking or savings
Basic Structure
• Given sets D1, D2, …, Dn, a relation r is a subset of D1 x D2 x … x Dn
Thus a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di
• Each attribute of a relation has a name
• Set of allowed values for attribute is called the domain of the
attribute
• Attribute values are required to be atomic, that is, indivisible
• multi-valued attribute values are not atomic
• composite attribute values are not atomic
• The special value null is a member of every domain
• E.g., phone number = null
• Semantics: number is unknown; number does not exist
Relation Schema
• A1, A2, …, An are attributes
• R = (A1, A2, …, An ) is a relation schema
E.g., Account-schema = (number, owner, balance, type)
• r(R) is a relation on the relation schema R
E.g., account(Account-schema)
• Says “account is a relation conforming to Account-schema”
• Attributes of a relation form a set
Relation Instance
Account

Number  Owner      Balance    Type
101     J. Smith   1,000.00   checking
102     W. Wei     2,000.00   checking
103
104     M. Jones   1,000.00   checking
[deleted]
105     H. Martin  10,000.00  checking
[new]
107     W. Yu      7,500.00   savings
109     R. Jones   432.55     checking
Rows/Tuples
Account

Number  Owner     Balance  Type
101     J. Smith  1000.00  checking
102     W. Wei    2000.00  checking
103     J. Smith  5000.00  savings
...
Relational Database
• A database consists of multiple relations
• Information is broken up: each relation stores one part of the information
E.g., account: information about accounts
deposit: information about deposits into accounts
check: information about checks
• What would happen if we stored all information in a single table?
• E.g., bank(account-number, balance, customer-name, deposit-date, deposit-amount…)
• Repetition of information (e.g., two customers own an account, two deposits
on the same account)
• The need for null values (e.g., represent a customer without an account)
Keys
• Let K ⊆ R
• K is a superkey of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
• Example: {account-number, account-owner} and {account-number}
are both superkeys of Account, if no two accounts can possibly have the
same number.
• K is a candidate key if K is minimal
Example: {account-number} is a candidate key for Account:
it is a superkey and no proper subset of it is a superkey
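The two definitions can be checked mechanically on a single relation instance. The helpers is_superkey and is_candidate_key below are hypothetical names, and the check is only a sketch: a key is a property of every possible relation r(R), so passing on one instance is necessary but not sufficient. The rows are the Account rows shown earlier.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """A superkey none of whose proper subsets is itself a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )

account = [
    {"number": 101, "owner": "J. Smith", "balance": 1000.00, "type": "checking"},
    {"number": 102, "owner": "W. Wei", "balance": 2000.00, "type": "checking"},
    {"number": 103, "owner": "J. Smith", "balance": 5000.00, "type": "savings"},
    {"number": 104, "owner": "M. Jones", "balance": 1000.00, "type": "checking"},
    {"number": 105, "owner": "H. Martin", "balance": 10000.00, "type": "checking"},
]

print(is_superkey(account, ["number", "owner"]))       # True
print(is_superkey(account, ["owner"]))                 # False: J. Smith appears twice
print(is_candidate_key(account, ["number"]))           # True
print(is_candidate_key(account, ["number", "owner"]))  # False: {number} alone suffices
```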
Is this legal?
If not, how do we prevent it
from happening?
• Select attributes for each relation and give the domain for each
attribute
SQL Values
• Integers and reals are represented as you would expect
• Strings are too, except they require single quotes
• Two consecutive single quotes inside a string stand for one literal quote, e.g., 'Joe''s Bookstore'.
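The quoting rule can be tried out directly. A small sketch using Python's sqlite3 module (SQLite is just one convenient SQL engine here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer and real literals are written as usual.
numbers = conn.execute("SELECT 42, 3.5").fetchone()

# Inside a SQL string literal, two single quotes stand for one quote character.
escaped = conn.execute("SELECT 'Joe''s Bookstore'").fetchone()[0]

# When the value comes from a program, a parameter avoids manual quoting.
param = conn.execute("SELECT ?", ("Joe's Bookstore",)).fetchone()[0]

print(numbers)   # (42, 3.5)
print(escaped)   # Joe's Bookstore
print(escaped == param)
```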
Times as Values
• The form of a time value is:
TIME 'hh:mm:ss'
with an optional decimal point and fractions of a second following.
• Example: TIME '17:00:02.5' = two and a half seconds after 5:00 PM.
Declaring Keys
• An attribute or list of attributes may be declared PRIMARY
KEY or UNIQUE
• Either says that no two tuples of the relation may agree in all
the attribute(s) on the list
• There are a few distinctions to be mentioned later
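Both declarations can be seen at work in a small sketch with Python's sqlite3 module. The account table and its card_no column are hypothetical; the point is only that the DBMS rejects a second tuple that agrees on a PRIMARY KEY or UNIQUE attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        number  INTEGER PRIMARY KEY,
        owner   TEXT,
        card_no TEXT UNIQUE      -- hypothetical attribute, for illustration
    )
""")
conn.execute("INSERT INTO account VALUES (101, 'J. Smith', 'C-1')")

def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

dup_primary = try_insert((101, "W. Wei", "C-2"))  # same number as an existing tuple
dup_unique = try_insert((102, "W. Wei", "C-1"))   # same card_no as an existing tuple
fresh = try_insert((102, "W. Wei", "C-2"))        # agrees with no existing tuple

print(dup_primary)
print(dup_unique)
print(fresh)
```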
• Data in Databases
• Structure: relational schemas
• Sharing data → query languages, mechanisms for concurrency control and
recovery. Need to preserve data integrity!
• Physical independence: logical view to query the data; physical for efficiency
Information in HTML
XML In Action
SwissProt data

ID GRAA_HUMAN STANDARD; PRT; 262 AA.
AC P12544;
DT 01-OCT-1989 (REL. 12, CREATED)
DT 01-OCT-1989 (REL. 12, LAST SEQUENCE UPDATE)
DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)
DE GRANZYME A PRECURSOR (EC 3.4.21.78) (CYTOTOXIC T-LYMPHOCYTE PROTEINASE
DE 1) (HANUKKAH FACTOR) (H FACTOR) (HF) (GRANZYME 1) (CTL TRYPTASE)
DE (FRAGMENTIN 1).
GN GZMA OR CTLA3 OR HFSP.
OS HOMO SAPIENS (HUMAN).
OC Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
OC Primates; Catarrhini; Hominidae; Homo.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=T-CELL;
RX MEDLINE; 88125000.
RA GERSHENFELD H.K., HERSHBERGER R.J., SHOWS T.B., WEISSMAN I.L.;
RT "Cloning and chromosomal assignment of a human cDNA encoding a T
RT cell- and natural killer cell-specific trypsin-like serine
RT protease.";
RL PROC. NATL. ACAD. SCI. U.S.A. 85:1184-1188(1988).
RN [2]
RP SEQUENCE OF 29-53.
RX MEDLINE; 88330824.
RA POE M., BENNETT C.D., BIDDISON W.E., BLAKE J.T., NORTON G.P.,
RA RODKEY J.A., SIGAL N.H., TURNER R.V., WU J.K., ZWEERINK H.J.;
RT "Human cytotoxic lymphocyte tryptase. Its purification from granules
RT and the characterization of inhibitor and substrate specificity.";
RL J. BIOL. CHEM. 263:13215-13222(1988).
RN [3]
RP SEQUENCE OF 29-40, AND CHARACTERIZATION.
RX MEDLINE; 89009866.
RA HAMEED A., LOWREY D.M., LICHTENHELD M., PODACK E.R.;
RT "Characterization of three serine esterases isolated from human IL-2
RT activated killer cells.";
RL J. IMMUNOL. 141:3142-3147(1988).
RN [4]
RP SEQUENCE OF 29-39, AND CHARACTERIZATION.
RX MEDLINE; 89035468.
RA KRAEHENBUHL O., REY C., JENNE D.E., LANZAVECCHIA A., GROSCURTH P.,
RA CARREL S., TSCHOPP J.;
[…]

XML encoding of SwissProt (long lines are cut off on the slide)

<?xml version ="1.0"?>
<cml title="SwissProtTree">
<molecule title="SwissProt file" scheme="SWISSPROT">
  <string title="ID" scheme="SWISSPROT">GRAA_HUMAN</string>
  <integer title="Length">262</integer>
  <string dictname="AC" title="Accession Number(s)" scheme="SWISSPROT">
  <list title="History">
    <list dictname="DT" title="Revision" scheme="SWISSPROT">
      <string title="Comments">REL. 12, CREATED</string>
      <date>1989-10-01</date>
    </list>
    <list dictname="DT" title="Revision" scheme="SWISSPROT">
      <string title="Comments">REL. 12, LAST SEQUENCE UPDATE</string>
      <date>1989-10-01</date>
    </list>
    <list dictname="DT" title="Revision" scheme="SWISSPROT">
      <string title="Comments">REL. 37, LAST ANNOTATION UPDATE</string>
      <date>1998-12-15</date>
    </list>
  </list>
  <string dictname="DE" title="Description" scheme="SWISSPROT"> GRANZYME
  <string dictname="GN" title="Gene Name(s)" scheme="SWISSPROT">GZMA OR
  <string dictname="OS" title="Organism Species" scheme="SWISSPROT"> HOMO
  <string dictname="OC" title="Organism Classification" scheme="SWISSPROT">
  <citation number="1" title="SEQUENCE FROM N.A.">
    <list title="AUTHORS">
      <person>
        <initials>H.K.</initials>
        <surname>GERSHENFELD</surname>
      </person>
      <person>
        <initials>R.J.</initials>
        <surname>HERSHBERGER</surname>
      </person>
      <person>
        <initials>T.B.</initials>
        <surname>SHOWS</surname>
      </person>
      <person>
        <initials>I.L.</initials>
        <surname>WEISSMAN</surname>
      </person>
    </list>
[…]
XML In Action
XML
<Book>
<Title>The Advanced Html Companion</Title>
<Author> Keith Schengili-Roberts </Author>
<Author> Kim Silk-Copeland</Author>
<Price> 35.96</Price>
…
</Book>
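The Book element above can be read as a tree with a standard XML parser. A minimal sketch using Python's xml.etree.ElementTree; the "…" from the slide is left out of the string because it is not XML, so the document parsed here is an assumed complete version of the fragment.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<Book>
  <Title>The Advanced Html Companion</Title>
  <Author> Keith Schengili-Roberts </Author>
  <Author> Kim Silk-Copeland</Author>
  <Price> 35.96</Price>
</Book>
""".strip()

book = ET.fromstring(BOOK_XML)  # root node of the tree

title = book.findtext("Title")
authors = [a.text.strip() for a in book.findall("Author")]  # repeated child elements
price = float(book.findtext("Price"))

print(title)
print(authors)
print(price)
```

Note that Author appears twice under the same parent, something a flat relational tuple cannot hold in a single atomic attribute.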
• Tree structure
• Nodes with parent/child relationships
• Easier to understand
• Easier to enforce
• Easier to navigate

RA GERSHENFELD H.K., HERSHBERGER R.J., SHOWS T.B., WEISSMAN I.L.;

<list title="AUTHORS">
  <person>
    <initials>H.K.</initials>
    <surname>GERSHENFELD</surname>
  </person>
  <person>
    <initials>R.J.</initials>
    <surname>HERSHBERGER</surname>
  </person>
  <person>
    <initials>T.B.</initials>
    <surname>SHOWS</surname>
  </person>
  …
</list>
FpML (finance)
• Complex nesting
• Transaction
• Example queries
• Who are the parties involved in a given contract?
• When does the contract expire?
• What are the various components of a contract?
• What is the total amount of a contract?
HL7 (healthcare)
• Stream data (when coming from medical devices)
• Legacy data
• Transaction
• Temporal queries
• Example queries
• Who is the patient?
• When did the measurement take place?
• For the duration of the measurement, what were the max and min value for
the following vital signals (…)?
• Instance: actual data conforming to a schema
• Relational: Set of tables (instances of relations)
• XML: Ordered tree
Relational Algebra
• We can describe tables in a relational database as sets of
tuples
• We can describe query operators using set theory
• The query language is called relational algebra
• Normally, not used directly -- foundation for SQL and query
processing
• SQL adds syntactic sugar
What is an “Algebra”
• Mathematical system consisting of:
• Operands --- variables or values from which new values can be constructed
• Operators --- symbols denoting procedures that construct new values from
given values
• Expressions can be constructed by applying operators to atomic
operands and/or other expressions
• Operations can be composed -- algebra is closed
• Parentheses are needed to group operators
Projection: Example
π (pi) = project
π_{AttributeList}(R) deletes attributes that are not in the projection list.
Sample query: π_{Number, Owner, Type}(Account)
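The sample query can be mimicked in a few lines of Python. This is a sketch, with relations modeled as lists of dictionaries and a hypothetical helper named project; the rows are the Account rows shown earlier.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs. The result is a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

account = [
    {"Number": 101, "Owner": "J. Smith", "Balance": 1000.00, "Type": "checking"},
    {"Number": 102, "Owner": "W. Wei", "Balance": 2000.00, "Type": "checking"},
    {"Number": 103, "Owner": "J. Smith", "Balance": 5000.00, "Type": "savings"},
    {"Number": 104, "Owner": "M. Jones", "Balance": 1000.00, "Type": "checking"},
    {"Number": 105, "Owner": "H. Martin", "Balance": 10000.00, "Type": "checking"},
]

result = project(account, ["Number", "Owner", "Type"])
for row in sorted(result):
    print(row)  # Balance is gone; one output tuple per account
```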
Owner
J. Smith
W. Wei
M. Jones
H. Martin

Note: The projection operator eliminates duplicates. Why?
In DBMS products, do you think duplicates should be eliminated
for every query? Are they?
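One way to explore the question is to compare a plain SELECT with SELECT DISTINCT. The sketch below uses Python's sqlite3 module with the Account rows shown earlier; SQLite is only one example of a DBMS product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number INTEGER, owner TEXT, balance REAL, type TEXT)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?)",
    [
        (101, "J. Smith", 1000.00, "checking"),
        (102, "W. Wei", 2000.00, "checking"),
        (103, "J. Smith", 5000.00, "savings"),
        (104, "M. Jones", 1000.00, "checking"),
        (105, "H. Martin", 10000.00, "checking"),
    ],
)

# A plain SELECT keeps duplicates: the result is a bag (multiset) of rows.
bag = conn.execute("SELECT owner FROM account").fetchall()

# DISTINCT asks for set semantics, as in relational algebra.
as_set = conn.execute("SELECT DISTINCT owner FROM account").fetchall()

print(len(bag))     # 5
print(len(as_set))  # 4
```

SQL keeps duplicates unless DISTINCT is requested, which spares the DBMS the extra work of detecting them on every query.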
R =
A  B
1  2
3  4

π_{A+B->C, A, A}(R) =
C  A1  A2
3  1   1
7  3   3
Set Operations
Union: Example
∪ = union
Checking-account ∪ Savings-account
Union Compatible
• Two relations are union-compatible if they have the same degree
(i.e., the same number of attributes) and the corresponding
attributes are defined on the same domains.
• Suppose we have these tables:
Checking-Account(c-num:str, c-owner:str, c-balance:real)
Savings-Account(s-num:str, s-owner:str, s-balance:real)
These are union-compatible tables.
Intersection
∩ = intersection

Checking-account
c-num  c-owner    c-balance
101    J. Smith   1000.00
102    W. Wei     2000.00
104    M. Jones   1000.00
105    H. Martin  10,000.00

Savings-account
s-num  s-owner   s-balance
103    J. Smith  5000.00

Checking-account ∩ Savings-account
What’s the answer to this query?
Intersection (cont.)
Checking-account ∩ Savings-account
What’s the answer to this query?
It’s empty. There are no tuples that are in both tables.

(π_{c-owner}(Checking-account)) ∩ (π_{s-owner}(Savings-account))
What’s the answer to this new query?
c-owner
J. Smith
Difference
− = difference

Checking-account
c-num  c-owner    c-balance
101    J. Smith   1000.00
102    W. Wei     2000.00
104    M. Jones   1000.00
105    H. Martin  10,000.00

Savings-account
s-num  s-owner   s-balance
103    J. Smith  5000.00

Find all the customers that own a Checking-account and do not
own a Savings-account.
(π_{c-owner}(Checking-account)) − (π_{s-owner}(Savings-account))
• What is the schema of result?
c-owner
W. Wei
M. Jones
H. Martin
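Union, intersection and difference on the two example tables map directly onto Python set operators. A sketch, with each relation modeled as a set of tuples whose positions follow the (num, owner, balance) schema:

```python
checking = {
    ("101", "J. Smith", 1000.00),
    ("102", "W. Wei", 2000.00),
    ("104", "M. Jones", 1000.00),
    ("105", "H. Martin", 10000.00),
}
savings = {("103", "J. Smith", 5000.00)}

union = checking | savings      # all five accounts
both_tables = checking & savings  # whole tuples must match, so this is empty

# Project onto the owner column first, then combine.
c_owners = {owner for (_, owner, _) in checking}
s_owners = {owner for (_, owner, _) in savings}

owns_both = c_owners & s_owners      # {'J. Smith'}
checking_only = c_owners - s_owners  # W. Wei, M. Jones, H. Martin

print(len(union), both_tables, owns_both, sorted(checking_only))
```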
Challenge Question
• How could you express the intersection operation if you didn’t
have an Intersection operator in relational algebra? [Hint: Can
you express Intersection using only the Difference operator?]
A ∩ B = ???
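One possible answer, following the hint: A − B holds the tuples of A that are not in B, so removing those from A leaves exactly the tuples of A that are in B, i.e. A ∩ B = A − (A − B). The sketch below checks this identity on small Python sets; the function name is hypothetical.

```python
def intersection_via_difference(a, b):
    """Compute a ∩ b using only set difference."""
    return a - (a - b)

c_owners = {"J. Smith", "W. Wei", "M. Jones", "H. Martin"}
s_owners = {"J. Smith"}

via_difference = intersection_via_difference(c_owners, s_owners)
print(via_difference)                         # {'J. Smith'}
print(via_difference == c_owners & s_owners)  # True
```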
Cross Product
• R1 X R2
• Each row of R1 is paired with each row of R2.
• Result schema has one field per field of R1 and R2, with field
names 'inherited' if possible.
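The pairing of rows can be sketched with itertools.product. The relation contents here are made up, and cross_product is a hypothetical helper; rows are tuples, so pairing two rows is tuple concatenation.

```python
from itertools import product

def cross_product(r1, r2):
    """Pair every row of r1 with every row of r2."""
    return [t1 + t2 for t1, t2 in product(r1, r2)]

r1 = [(1, 2), (3, 4)]          # schema (A, B)
r2 = [("x",), ("y",), ("z",)]  # schema (C,), made up for this example

result = cross_product(r1, r2)  # schema (A, B, C)
print(len(result))  # 6, i.e. |R1| * |R2|
print(result[0])    # (1, 2, 'x')
```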
Join: Example
⋈ = join
Account ⋈_{Number=Account and Amount>700} Deposit
Joins
• Condition Join: R ⋈_c S = σ_c(R × S)
• Sometimes called a theta-join
• Result schema same as that of cross-product
• Fewer tuples than cross-product, might be able to compute
more efficiently
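The definition translates almost literally into code: form the cross product, then keep the pairs that satisfy the condition. The sketch below is a naive illustration, not how a DBMS would evaluate a join. The helper theta_join and the Deposit rows are assumptions, and merging dictionaries only works because the two relations share no attribute names.

```python
from itertools import product

def theta_join(r, s, cond):
    """R join_c S computed as a selection over the cross product of R and S."""
    return [{**t, **u} for t, u in product(r, s) if cond(t, u)]

account = [
    {"Number": 101, "Owner": "J. Smith"},
    {"Number": 102, "Owner": "W. Wei"},
    {"Number": 104, "Owner": "M. Jones"},
]
deposit = [  # made-up rows for illustration
    {"Account": 101, "Amount": 900.0},
    {"Account": 101, "Amount": 200.0},
    {"Account": 102, "Amount": 750.0},
    {"Account": 104, "Amount": 50.0},
]

joined = theta_join(
    account,
    deposit,
    lambda a, d: a["Number"] == d["Account"] and d["Amount"] > 700,
)

print(len(account) * len(deposit))  # 12 pairs in the cross product
print(len(joined))                  # 2 tuples survive the condition
```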
Merci
Thank you
Obrigada
благодаря
Kiitos
धन्यवाद
Tack
Danke
Ευχαριστώ
Bedankt