Data modeling
Motivating questions
Why should we store our data in a relational database?
Single table example
Consider the spreadsheet, Departments.xlsx:
DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive
Relational model definitions
Entity: object, concept or event
Entity types and entity occurrences
Entity type
Departments
DeptNbr
DeptName
DeptType
DeptStatus
Entity occurrence
Departments
DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive
Example entity type descriptions
Poor description (seen lots of these)
- Vendors: Someone we buy products from.
Exemplary description (never seen one like this in real life)
- Vendors: US corporations we have reviewed with respect to their
  qualifications for providing products to our company. Vendors are
  rated based on price, quality, delivery performance and financial
  stability. Each vendor is classified by one vendor status: approval
  pending, approved, rejected or inactive. This approval decision is
  made in a weekly meeting among purchasing, manufacturing and
  finance. Purchasing requests that rejected vendors be kept in the
  database for future reference. Purchasing expects 400 vendors will be
  maintained at any one time. Of these, 200 will be active, 25 pending,
  75 inactive and 100 rejected. Contact Joan Smith in Purchasing for
  more information.
Data models
When designing a database to store and analyze data, you
first need to develop a data model
Key points from lesson
Data in relational databases are organized into tables, which
represent entities
Data modeling
Solutions: Entity and attribute
Identify which are entities and which are attributes:
Instructor (E)                  Instructor office hours (A)
Teaching assistant (TA) (E)     Textbook title (A)
Course section number (A)       Classroom number (A)
Building name (A)               TA student ID (A)
Course number (A)               Instructor name (A)
Textbook price (A)              Textbook publisher (A)
Teaching asst (TA) name (A)     Section capacity (A)
Instructor ID (A)               Course objective (A)
Textbook author (A)             Copyright date (A)
Course title (A)                Building number (A)
Textbook (E)                    Course section (E)
Classroom (E)                   Course (E)
Textbook ISBN (A)               Building (E)
Section days (A)                Section time (A)
Classroom capacity (A)
Designing a data model
Data models help specify each entity in a table in a
standardized way
Rules of the relational data model
Each attribute (column) has a unique name within a table
All entries or values in the attribute are examples of that attribute
Each record (row) is unique in a good database
Departments
DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive
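The first and third rules can be checked mechanically. The following is a minimal sketch using Python's built-in sqlite3 module and the Departments rows from the slide; treating DeptNbr as the primary key is an assumption made here to enforce row uniqueness, and the table name Bad is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: DeptNbr is the primary key, which is what keeps each row unique.
conn.execute(
    """
    CREATE TABLE Departments (
        DeptNbr    INTEGER PRIMARY KEY,
        DeptName   TEXT NOT NULL,
        DeptType   TEXT NOT NULL,
        DeptStatus TEXT NOT NULL
    )
    """
)
rows = [
    (930, "Receiving", "Mfg", "Active"),
    (378, "Assembly", "Mfg", "Active"),
    (372, "Finance", "Adm", "Active"),
    (923, "Planning", "Adm", "Active"),
    (483, "Construction", "Plant", "Inactive"),
]
conn.executemany("INSERT INTO Departments VALUES (?, ?, ?, ?)", rows)

# Rule: each attribute has a unique name within a table.
# "Bad" is a hypothetical table that repeats a column name.
try:
    conn.execute("CREATE TABLE Bad (DeptNbr INTEGER, DeptNbr TEXT)")
    duplicate_column_allowed = True
except sqlite3.OperationalError:
    duplicate_column_allowed = False

# Rule: each record is unique. Re-inserting department 930 should fail.
try:
    conn.execute("INSERT INTO Departments VALUES (930, 'Receiving', 'Mfg', 'Active')")
    duplicate_row_allowed = True
except sqlite3.IntegrityError:
    duplicate_row_allowed = False

row_count = conn.execute("SELECT COUNT(*) FROM Departments").fetchone()[0]
print(duplicate_column_allowed, duplicate_row_allowed, row_count)
```

The second rule (every value in a column is an example of that attribute) is a matter of meaning, so a database engine can only approximate it with types and constraints.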
Relationships and cardinality
Solutions: Entity type and attribute
How to draw an entity-relationship
diagram (ERD)
An ERD, or entity-relationship diagram, is a schematic of the
database
Transportation broker example
On the next slide there is a small data model for a freight
shipping broker
Transportation broker data model
Is there always only one solution for a
data model?
Several solutions may exist
Domain validation entities
Also called pick lists or validation lists
Used to standardize data in a database
Department
DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive
Domain validation entity
ValidDeptTypes
Mfg
Adm
Plant
Sales
Operations
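One way to make the database itself enforce a pick list is to reference the validation entity from the column it standardizes. This is a sketch in Python's sqlite3 under some assumptions: the column name DeptType inside ValidDeptTypes is chosen here for illustration, and department 999 "Shipping" with type "Manufacturing" is a hypothetical row showing a non-standard spelling being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# The domain validation entity: one row per allowed department type.
conn.execute("CREATE TABLE ValidDeptTypes (DeptType TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO ValidDeptTypes VALUES (?)",
    [("Mfg",), ("Adm",), ("Plant",), ("Sales",), ("Operations",)],
)

# Department.DeptType must match one of the values in the pick list.
conn.execute(
    """
    CREATE TABLE Department (
        DeptNbr    INTEGER PRIMARY KEY,
        DeptName   TEXT NOT NULL,
        DeptType   TEXT NOT NULL REFERENCES ValidDeptTypes (DeptType),
        DeptStatus TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO Department VALUES (930, 'Receiving', 'Mfg', 'Active')")

# Hypothetical row: 'Manufacturing' is not in the pick list, so it is refused.
try:
    conn.execute(
        "INSERT INTO Department VALUES (999, 'Shipping', 'Manufacturing', 'Active')"
    )
    nonstandard_type_accepted = True
except sqlite3.IntegrityError:
    nonstandard_type_accepted = False

dept_count = conn.execute("SELECT COUNT(*) FROM Department").fetchone()[0]
print(nonstandard_type_accepted, dept_count)
```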
Keys
Solutions: Relationships and cardinality
Primary and foreign keys
Primary key: one or more attributes that uniquely identify a
record
What would you use in a customer database of 100,000
people and no unique customer id?
Name not unique
Can use social security number, but not everyone has one
Privacy is an issue
Primary and foreign keys
Primary key of the independent or parent entity type is
maintained as a non-key attribute in the related, dependent
or child entity type; this is known as the foreign key
Foreign keys
Employee
EmpID  DeptID  EmpLastName  EmpFirstName
4436   483     Brown        John
4574   483     Jones        Helen
5678   372     Smith        Jane
5674   372     Crane        Sally
9987   923     Black        Joe
5123   923     Green        Bill
5325   483     Clinton      Bob
Department
DeptID  DeptName
930     Receiving
378     Assembly
372     Finance
923     Planning
483     Construction
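The foreign key is what lets the two tables be read together. A minimal sketch with Python's sqlite3, loading the Employee and Department rows from the slide and joining them on DeptID; the column types and the variable name employees_by_dept are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Department (
        DeptID   INTEGER PRIMARY KEY,
        DeptName TEXT NOT NULL
    );
    CREATE TABLE Employee (
        EmpID        INTEGER PRIMARY KEY,
        DeptID       INTEGER REFERENCES Department (DeptID),  -- foreign key
        EmpLastName  TEXT NOT NULL,
        EmpFirstName TEXT NOT NULL
    );
    """
)
# Parent rows go in first so that every Employee.DeptID has a match.
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(930, "Receiving"), (378, "Assembly"), (372, "Finance"),
     (923, "Planning"), (483, "Construction")],
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(4436, 483, "Brown", "John"), (4574, 483, "Jones", "Helen"),
     (5678, 372, "Smith", "Jane"), (5674, 372, "Crane", "Sally"),
     (9987, 923, "Black", "Joe"), (5123, 923, "Green", "Bill"),
     (5325, 483, "Clinton", "Bob")],
)

# Follow each employee's DeptID to the department name it refers to.
employees_by_dept = conn.execute(
    """
    SELECT d.DeptName, e.EmpLastName
    FROM Employee AS e
    JOIN Department AS d ON d.DeptID = e.DeptID
    ORDER BY d.DeptName, e.EmpLastName
    """
).fetchall()

for dept_name, last_name in employees_by_dept:
    print(dept_name, last_name)
```

Receiving and Assembly do not appear in the result because no employee row points at them; the department name is stored once, in Department, rather than repeated on every employee.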
Many to many relationships
Vehicle
VehicleID  VehicleMake  VehicleModel
35         Volvo        Wagon
33         Ford         Sedan
89         GMC          Truck
Driver
DriverID  DriverName  DriverLicenseNbr
253       Ken         A23423
900       Jen         B89987
VehicleDriver
VehicleID  DriverID
35         900
35         253
89         900
Never create an entity with vehicle1, vehicle2, etc. as attributes!
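The VehicleDriver table can be queried in either direction, which is what the vehicle1, vehicle2 approach cannot do cleanly. A sketch with Python's sqlite3 using the slide's rows; the composite primary key on VehicleDriver is an assumption added here so the same pairing cannot be stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Vehicle (
        VehicleID    INTEGER PRIMARY KEY,
        VehicleMake  TEXT NOT NULL,
        VehicleModel TEXT NOT NULL
    );
    CREATE TABLE Driver (
        DriverID         INTEGER PRIMARY KEY,
        DriverName       TEXT NOT NULL,
        DriverLicenseNbr TEXT NOT NULL
    );
    -- One row per (vehicle, driver) pairing; the composite key is an assumption.
    CREATE TABLE VehicleDriver (
        VehicleID INTEGER NOT NULL REFERENCES Vehicle (VehicleID),
        DriverID  INTEGER NOT NULL REFERENCES Driver (DriverID),
        PRIMARY KEY (VehicleID, DriverID)
    );
    """
)
conn.executemany(
    "INSERT INTO Vehicle VALUES (?, ?, ?)",
    [(35, "Volvo", "Wagon"), (33, "Ford", "Sedan"), (89, "GMC", "Truck")],
)
conn.executemany(
    "INSERT INTO Driver VALUES (?, ?, ?)",
    [(253, "Ken", "A23423"), (900, "Jen", "B89987")],
)
conn.executemany(
    "INSERT INTO VehicleDriver VALUES (?, ?)",
    [(35, 900), (35, 253), (89, 900)],
)

# One vehicle, many drivers: who drives vehicle 35?
drivers_of_35 = [
    name
    for (name,) in conn.execute(
        """
        SELECT d.DriverName
        FROM VehicleDriver AS vd
        JOIN Driver AS d ON d.DriverID = vd.DriverID
        WHERE vd.VehicleID = ?
        ORDER BY d.DriverName
        """,
        (35,),
    )
]

# One driver, many vehicles: what does driver 900 drive?
vehicles_of_900 = [
    make
    for (make,) in conn.execute(
        """
        SELECT v.VehicleMake
        FROM VehicleDriver AS vd
        JOIN Vehicle AS v ON v.VehicleID = vd.VehicleID
        WHERE vd.DriverID = ?
        ORDER BY v.VehicleMake
        """,
        (900,),
    )
]
print(drivers_of_35, vehicles_of_900)
```

Adding a third or fourth driver to a vehicle is just another row in VehicleDriver, with no change to the table definitions.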
Referential integrity
Referential integrity maintains the validity of foreign keys
when the primary key in the parent table changes
- Every foreign key either matches a primary key (or is null)
- For example: cannot add an employee to an invalid
  department
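These rules can be tried out directly. A sketch with Python's sqlite3, assuming the Employee and Department layout from the foreign keys slide; employees 7001 and 7002, department 111, and the helper violates_integrity are hypothetical and exist only for this illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Department (
        DeptID   INTEGER PRIMARY KEY,
        DeptName TEXT NOT NULL
    );
    CREATE TABLE Employee (
        EmpID        INTEGER PRIMARY KEY,
        DeptID       INTEGER REFERENCES Department (DeptID),  -- may be NULL
        EmpLastName  TEXT NOT NULL,
        EmpFirstName TEXT NOT NULL
    );
    INSERT INTO Department VALUES (483, 'Construction'), (372, 'Finance');
    INSERT INTO Employee VALUES (4436, 483, 'Brown', 'John');
    """
)


def violates_integrity(sql, params=()):
    """Run one statement; report whether the database refused it."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# Hypothetical employee assigned to department 111, which does not exist.
orphan_rejected = violates_integrity(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)", (7001, 111, "Doe", "Pat")
)

# Hypothetical employee with no department yet: a NULL foreign key is allowed.
null_fk_rejected = violates_integrity(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)", (7002, None, "Roe", "Sam")
)

# Removing a parent row that employees still point at is refused as well.
parent_delete_rejected = violates_integrity(
    "DELETE FROM Department WHERE DeptID = ?", (483,)
)
print(orphan_rejected, null_fk_rejected, parent_delete_rejected)
```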
Key points from lesson
Primary keys are attributes used to uniquely identify a record
Solutions: Primary and foreign keys
We're getting there with this ERD: we've defined entities, attributes,
relationships and keys
Introduction to edX data modeling
exercise
We have all of the user data from the first year of running edX
Flat file: a table with all attributes and records from which we
will design the database using a relational data model
Exercise motivation and approach
Why is it useful to do this exercise?
- Scenario: Customer or client hands you data for analysis in
  one or more giant Excel sheets
- Must understand business rules underpinning the dataset
edX Dataset File (from website)
Data dictionary
Column Name         Description
course_id           Three-part identifier for a course
Course_Short_Title  Short title for the course
Course_Long_Title   Long title for the course
userid_DI           Individual user ID
registered          Whether the user is registered (1/0)
viewed              Whether the user has viewed the contents (1/0)
explored            Whether the user has explored the course (1/0)
certified           Whether the user is certified (1/0)
Country             User's country of origin
LoE_DI              User's level of education
YoB                 User's year of birth
Age                 User's age
gender              User's gender
grade               User's grade in the course
nevents             Number of events the user has done on the site
ndays_act           Number of actions taken by the user
nplay_video         Number of video plays done by the user
nchapters           Number of chapters read by the user
nforum_posts        Number of forum posts made by the user
roles               Any roles the user has
incomplete_flag     Whether the user has an incomplete for the course
Most obvious entities and keys
Entity: Users
- Primary key: userid_DI
Entity: Courses
- Primary key: course_id
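As a rough illustration of pulling these two entities out of a flat file, the sketch below keeps one record per primary key value. Everything in flat_rows is made up for the example (user IDs, titles, countries, and the second course_id are placeholders, not edX data), and the assignment of columns to Users or Courses is an assumption; columns such as grade, which describe a user in a particular course, are deliberately left out.

```python
# Placeholder flat-file rows: one row per (user, course) combination.
flat_rows = [
    {"course_id": "MITx/6.00x/2013_Spring", "Course_Short_Title": "Title A",
     "userid_DI": "user_001", "Country": "Country X", "YoB": 1990},
    {"course_id": "ExampleX/101x/2013_Fall", "Course_Short_Title": "Title B",
     "userid_DI": "user_001", "Country": "Country X", "YoB": 1990},
    {"course_id": "MITx/6.00x/2013_Spring", "Course_Short_Title": "Title A",
     "userid_DI": "user_002", "Country": "Country Y", "YoB": 1985},
]

# Assumed split of columns between the two entities.
USER_COLUMNS = ("Country", "YoB")
COURSE_COLUMNS = ("Course_Short_Title",)

users = {}    # keyed by userid_DI, the Users primary key
courses = {}  # keyed by course_id, the Courses primary key

for row in flat_rows:
    # setdefault keeps the first record seen for each key, so repeated
    # users and courses in the flat file collapse to a single entry.
    users.setdefault(row["userid_DI"], {c: row[c] for c in USER_COLUMNS})
    courses.setdefault(row["course_id"], {c: row[c] for c in COURSE_COLUMNS})

print(len(flat_rows), "flat rows ->", len(users), "users,", len(courses), "courses")
```

In a real file the repeated values for a given key may disagree with each other, which is one place where the business rules behind the dataset need to be understood before choosing which value to keep.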
Primary key for courses
course_id: MITx/6.00x/2013_Spring
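Since the data dictionary describes course_id as a three-part identifier, the parts can be separated by splitting on the slash. A small sketch follows; the field names institution, course_number and term are guesses at what the three parts mean, and CourseKey and parse_course_id are hypothetical names used only here.

```python
from typing import NamedTuple


class CourseKey(NamedTuple):
    # Field names are assumptions about the meaning of each part.
    institution: str
    course_number: str
    term: str


def parse_course_id(course_id: str) -> CourseKey:
    """Split a three-part course_id on '/' and name the parts."""
    parts = course_id.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated parts, got {course_id!r}")
    return CourseKey(*parts)


key = parse_course_id("MITx/6.00x/2013_Spring")
print(key.institution, key.course_number, key.term)
```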
Key points from lesson
Selection of entities and associated attributes from a flat file is
not always obvious