Data modeling

MIT Center for Transportation & Logistics
ctl.mit.edu

Introduction

Motivating questions
Why should we store our data in a relational database?

How should we organize our data?

Why do we need data models to design a database?

What makes a good data model?

Single table example
Consider the spreadsheet, Departments.xlsx:

DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive

This sheet stores information about the concept of a department within a company

This would be a table in a relational database

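As a rough illustration of how the spreadsheet could become a relational table, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite and the column types are assumptions made for the example; the slides do not prescribe a database product or data types.

```python
import sqlite3

# Rows copied from the Departments.xlsx example.
DEPARTMENT_ROWS = [
    (930, "Receiving", "Mfg", "Active"),
    (378, "Assembly", "Mfg", "Active"),
    (372, "Finance", "Adm", "Active"),
    (923, "Planning", "Adm", "Active"),
    (483, "Construction", "Plant", "Inactive"),
]


def build_departments(conn):
    """Create the Departments table and load the spreadsheet rows."""
    # Column types are assumed; the spreadsheet itself carries none.
    conn.execute(
        """
        CREATE TABLE Departments (
            DeptNbr    INTEGER,
            DeptName   TEXT,
            DeptType   TEXT,
            DeptStatus TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO Departments VALUES (?, ?, ?, ?)", DEPARTMENT_ROWS
    )


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
build_departments(conn)

# Each spreadsheet row is now a record that can be queried by attribute value.
active = conn.execute(
    "SELECT DeptName FROM Departments "
    "WHERE DeptStatus = 'Active' ORDER BY DeptName"
).fetchall()
print(active)
```

The query returns one tuple per matching record, so the four active departments come back and Construction is filtered out.
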
Relational model definitions
Entity: object, concept or event

Attribute (column): a characteristic of an entity

Record or tuple (row): the specific characteristics or attribute values for one example of an entity

Entry: the value of an attribute for a specific record

Table: a collection of records

Database: a collection of tables

Single table example
Entity: Departments

Table: A collection of records about the entity Departments (departments)

DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive

Attribute: DeptName, the names of the departments
Record: Information about department 372 (372, Finance, Adm, Active)
Entry: Value of DeptNbr for the construction department (483)
Database: CompanyDatabase, includes tables such as Departments, Employees, Sales

Deeper dive on tables and attributes
Tables
- Tables represent entities, which are usually plural nouns
- Tables are often named as exactly what they represent (typically plural nouns, without spaces):
  - e.g. Companies, Customers, Vehicles, Orders, etc.
Attributes
- Characteristics of an entity (table), typically nouns
- Examples in the form of: Table (Attr1, Attr2, ... AttrN)
  - Vehicles (VIN, Color, Make, Model, Mileage)
  - Drivers (SSN, Fname, Lname, Address)
  - DriverLicenses (Type, Start_date, Expiration_date)

Entity types and entity occurrences
Entity type: Departments (DeptNbr, DeptName, DeptType, DeptStatus)

Entity occurrence: Departments

DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive

When developing a data model, entity type descriptions should be as extensive as possible

Example entity type descriptions
Poor description (seen lots of these)
- Vendors: Someone we buy products from.
Exemplary description (never seen one like this in real life)
- Vendors: US corporations we have reviewed with respect to their
  qualifications for providing products to our company. Vendors are
  rated based on price, quality, delivery performance and financial
  stability. Each vendor is classified by one vendor status: approval
  pending, approved, rejected or inactive. This approval decision is
  made in a weekly meeting among purchasing, manufacturing and
  finance. Purchasing requests that rejected vendors be kept in the
  database for future reference. Purchasing expects 400 vendors will be
  maintained at any one time. Of these, 200 will be active, 25 pending,
  75 inactive and 100 rejected. Contact Joan Smith in Purchasing for
  more information.

Data models
When designing a database to store and analyze data, you first need to develop a data model

The data model describes the data that is stored in the database and how to access it

The data model defines the tables and attributes in the database
- Each important concept/noun in the data is defined as a table in the database

Key points from lesson
Data in relational databases are organized into tables, which represent entities

Single tables within a database are like spreadsheets, but we use different vocabulary to talk about the rows and columns

Entity types should be described as part of the data modeling process; this helps with the documentation and determination of business rules

Data modeling

Solutions: Entity and attribute
Identify which are entities (E) and which are attributes (A):

Instructor (E)                  Instructor office hours (A)
Teaching assistant (TA) (E)     Textbook title (A)
Course section number (A)       Classroom number (A)
Building name (A)               TA student ID (A)
Course number (A)               Instructor name (A)
Textbook price (A)              Textbook publisher (A)
Teaching asst (TA) name (A)     Section capacity (A)
Instructor ID (A)               Course objective (A)
Textbook author (A)             Copyright date (A)
Course title (A)                Building number (A)
Textbook (E)                    Course section (E)
Classroom (E)                   Course (E)
Textbook ISBN (A)               Building (E)
Section days (A)                Section time (A)
Classroom capacity (A)

Designing a data model
Data models help specify each entity in a table in a standardized way

Data models allow administrators to impose rules, constraints, and relationships on the data that are stored
- Enables users to understand business rules and effectively process and analyze data

Acts as a schematic for building the database

Rules of the relational data model
Each attribute (column) has a unique name within a table

All entries or values in the attribute are examples of that attribute

Each record (row) is unique in a good database

Ordering of records and attributes is unimportant

Departments
DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive

Characteristics of a good data model
Complete: Is all necessary data represented?

No redundancy: Is the same fact recorded more than once?

Enforcement of rules: How accurately does it enforce business rules?

Reusability: Can the database be used for different applications (e.g. web application, enterprise analytics, etc.)?

Flexibility: Can the model cope with possible changes to the business rules or data requirements?

Key points from lesson
The data model describes the data that is stored in the database and how to access it

Each record is unique in a good database

Data models enable users to understand business rules and effectively process and analyze data

Relationships and cardinality

Solutions: Entity type and attribute

How to draw an entity-relationship diagram (ERD)
An ERD, or entity-relationship diagram, is a schematic of the database

Entities are drawn as boxes

Relationships between entities are indicated by lines between these entities

Cardinality describes the expected number of related occurrences between the two entities in a relationship and is shown using crow's foot notation

Relationships + cardinality = business rules

ERD for Instructors and CourseSections
Relationship: There is a relationship between Instructors and CourseSections

Cardinality at the Instructors end: Exactly one
Cardinality at the CourseSections end: Zero or many

Business rules defined through relationships and cardinality:
- There is exactly one instructor for each course section
- Each instructor may teach zero, one or many course sections (shortened to zero or many)

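One way these two business rules could be expressed in a database is sketched below with Python's sqlite3 module. The table layout, the column names (InstructorID, SectionID) and the sample rows are hypothetical; the slide gives only the entity names and the cardinalities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript(
    """
    CREATE TABLE Instructors (
        InstructorID   INTEGER PRIMARY KEY,
        InstructorName TEXT NOT NULL
    );
    CREATE TABLE CourseSections (
        SectionID    INTEGER PRIMARY KEY,
        -- NOT NULL plus REFERENCES: every section has exactly one instructor
        InstructorID INTEGER NOT NULL REFERENCES Instructors (InstructorID)
    );
    -- Hypothetical sample data
    INSERT INTO Instructors VALUES (1, 'Instructor A'), (2, 'Instructor B');
    INSERT INTO CourseSections VALUES (10, 1), (11, 1);
    """
)


def section_without_instructor_rejected(conn):
    """Return True if the database refuses a section with no instructor."""
    try:
        conn.execute("INSERT INTO CourseSections VALUES (12, NULL)")
    except sqlite3.IntegrityError:
        return True
    return False


# Zero or many: an instructor may appear in no section rows at all.
sections_per_instructor = conn.execute(
    """
    SELECT i.InstructorName, COUNT(s.SectionID)
    FROM Instructors AS i
    LEFT JOIN CourseSections AS s ON s.InstructorID = i.InstructorID
    GROUP BY i.InstructorID
    ORDER BY i.InstructorID
    """
).fetchall()
print(sections_per_instructor)
```

The "exactly one" side is enforced by the schema itself, while the "zero or many" side needs no constraint: it is simply the absence of a restriction on how often an instructor is referenced.
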
Cardinality crow's foot notation
General meanings:

Mandatory vs. optional:

Transportation broker example
On the next slide there is a small data model for a freight shipping broker

Captures underlying rules or logic of broker's business

Provides information about how the database should be structured

Transportation broker data model

A carrier can be associated with many offices
An office can be associated with many carriers

A carrier can issue many contracts
A contract is issued by one carrier

An office can employ many agents
An agent is employed by one office

An agent can sell many contracts
A contract is serviced by only one agent

A contract can serve to carry only one commodity type
A commodity type can be carried under many contracts

A contract can be associated with many equipment types
An equipment type can be associated with many contracts

A customer can be served by many contracts
A contract covers one customer

Is there always only one solution for a data model?
Several solutions may exist

Often, these will describe different underlying business processes or rules

These often depend on the application requirements or business needs

Domain validation entities
Also called pick lists or validation lists
Used to standardize data in a database

Department
DeptNbr  DeptName      DeptType  DeptStatus
930      Receiving     Mfg       Active
378      Assembly      Mfg       Active
372      Finance       Adm       Active
923      Planning      Adm       Active
483      Construction  Plant     Inactive

Domain validation entity
ValidDeptTypes
Mfg
Adm
Plant
Sales
Operations

Domain validation entity: a table with a single attribute that enforces the values of an attribute in a related table
Requires that the department type of any new department be on the list of existing department types in the table "ValidDeptTypes"

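A domain validation entity can be approximated with a single-column table that the main table references. The sketch below uses Python's sqlite3 module; the column name DeptType inside ValidDeptTypes and the rejected "Research" department are hypothetical, chosen only to illustrate the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce references
conn.executescript(
    """
    -- The pick list: one attribute, one row per allowed value.
    CREATE TABLE ValidDeptTypes (DeptType TEXT PRIMARY KEY);
    INSERT INTO ValidDeptTypes VALUES
        ('Mfg'), ('Adm'), ('Plant'), ('Sales'), ('Operations');

    CREATE TABLE Departments (
        DeptNbr    INTEGER PRIMARY KEY,
        DeptName   TEXT,
        DeptType   TEXT REFERENCES ValidDeptTypes (DeptType),
        DeptStatus TEXT
    );
    """
)


def add_department(conn, nbr, name, dept_type, status):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Departments VALUES (?, ?, ?, ?)",
            (nbr, name, dept_type, status),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = add_department(conn, 930, "Receiving", "Mfg", "Active")
# 'R&D' is not on the pick list, so this hypothetical department is refused.
rejected = not add_department(conn, 999, "Research", "R&D", "Active")
print(accepted, rejected)
```

Adding a new allowed type then becomes a data change (one new row in ValidDeptTypes) instead of a schema change.
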
Key points from lesson
Business rules are imposed on the database through relationships and cardinality

Business rules are also understood based on relationship and cardinality

Domain validation entities restrict entries to a set of specified values

Data models may vary for a given dataset as business logic evolves

Keys

Solutions: Relationships and cardinality

Course may be offered in many (0, 1 or more) sections
Course section must be associated with a course

Course section may be taught by many (0, 1 or more) TAs
TA may teach many (0, 1 or more) course sections

Course section must be taught by 1 instructor (??)
Instructor may teach many sections

Course may use many textbooks (all sections use same)
Textbook may be used in many courses

Building may contain many rooms
A room is in only one building

A course section may use a room
A room may be used by many course sections (not at same time)

Primary and foreign keys
Primary key: one or more attributes that uniquely identify a record
What would you use in a customer database of 100,000 people and no unique customer id?
- Name is not unique
- Adding birthdate helps, but the combination is not guaranteed to be unique
- Address can change
- Can use social security number, but not everyone has one
- Privacy is an issue

Foreign key: the primary key of the independent or parent entity type is maintained as a non-key attribute in the related, dependent or child entity type; this attribute is known as the foreign key

Foreign keys

Employee
EmpID  DeptID  EmpLastName  EmpFirstName
4436   483     Brown        John
4574   483     Jones        Helen
5678   372     Smith        Jane
5674   372     Crane        Sally
9987   923     Black        Joe
5123   923     Green        Bill
5325   483     Clinton      Bob

Department
DeptID  DeptName
930     Receiving
378     Assembly
372     Finance
923     Planning
483     Construction

The database requires a valid department number (or null) when an employee is added
Employee ID is the unique identifier of employees; department number is not needed as part of the employee primary key

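The Employee and Department tables can be sketched in Python's sqlite3 module to show what "a valid department number (or null)" means in practice. Only two of the employees from the slide are loaded to keep the example short; employees 7001 and 7002 and department 111 are hypothetical values added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript(
    """
    CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee (
        EmpID        INTEGER PRIMARY KEY,
        -- Foreign key: nullable, but any non-null value must exist in Department
        DeptID       INTEGER REFERENCES Department (DeptID),
        EmpLastName  TEXT,
        EmpFirstName TEXT
    );
    INSERT INTO Department VALUES
        (930, 'Receiving'), (378, 'Assembly'), (372, 'Finance'),
        (923, 'Planning'), (483, 'Construction');
    INSERT INTO Employee VALUES
        (4436, 483, 'Brown', 'John'),
        (5678, 372, 'Smith', 'Jane');
    """
)

# A null department is allowed (hypothetical employee).
conn.execute("INSERT INTO Employee VALUES (7001, NULL, 'Doe', 'Pat')")

# Department 111 does not exist, so this hypothetical insert is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (7002, 111, 'Roe', 'Sam')")
    invalid_dept_accepted = True
except sqlite3.IntegrityError:
    invalid_dept_accepted = False

# The foreign key is what lets us look up each employee's department name.
employee_departments = conn.execute(
    """
    SELECT e.EmpLastName, d.DeptName
    FROM Employee AS e
    LEFT JOIN Department AS d ON d.DeptID = e.DeptID
    ORDER BY e.EmpID
    """
).fetchall()
print(employee_departments)
```
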
Composite keys
A composite key is a primary key that consists of more than one attribute

Consider a charter airline: every flight has a different number

Flight
FlightNbr  FlightDate  DepartTime  ArrivalTime
243        9/24        9:00am      11:00am
253        9/24        10:00am     12:30pm
52         9/24        11:00am     2:00pm

FlightSeat
FlightNbr  SeatNbr  SeatStatus  SeatDescription
243        8A       Confirmed   Window
243        7D       Reserved    Aisle
243        14E      Open        Center
253        1F       Open        Window
253        43A      Confirmed   Window

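A composite key for FlightSeat might be declared as in the sketch below, written with Python's sqlite3 module. The column types are assumptions, and the row for seat 8A on flight 253 is a hypothetical addition used to show that a seat number may repeat across flights but not within one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Flight (
        FlightNbr   INTEGER PRIMARY KEY,
        FlightDate  TEXT,
        DepartTime  TEXT,
        ArrivalTime TEXT
    );
    CREATE TABLE FlightSeat (
        FlightNbr       INTEGER NOT NULL REFERENCES Flight (FlightNbr),
        SeatNbr         TEXT NOT NULL,
        SeatStatus      TEXT,
        SeatDescription TEXT,
        -- Composite key: neither column identifies a seat on its own
        PRIMARY KEY (FlightNbr, SeatNbr)
    );
    INSERT INTO Flight VALUES
        (243, '9/24', '9:00am', '11:00am'),
        (253, '9/24', '10:00am', '12:30pm');
    INSERT INTO FlightSeat VALUES
        (243, '8A', 'Confirmed', 'Window'),
        (253, '1F', 'Open', 'Window');
    """
)

# Same seat number on a different flight: accepted (hypothetical row).
conn.execute("INSERT INTO FlightSeat VALUES (253, '8A', 'Open', 'Window')")

# Same flight and same seat number again: rejected by the composite key.
try:
    conn.execute("INSERT INTO FlightSeat VALUES (243, '8A', 'Reserved', 'Window')")
    duplicate_seat_accepted = True
except sqlite3.IntegrityError:
    duplicate_seat_accepted = False

seat_count = conn.execute("SELECT COUNT(*) FROM FlightSeat").fetchone()[0]
print(duplicate_seat_accepted, seat_count)
```
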
Many to many relationships
A vehicle can be driven by many drivers; a driver can drive many vehicles
How can we get vehicle information for a driver from the database?

Diagram labels: Independent, Dependent, Independent

Associative table (entity), aka junction table
Primary key of parent is used in primary key of child

Vehicle
VehicleID  VehicleMake  VehicleModel
35         Volvo        Wagon
33         Ford         Sedan
89         GMC          Truck

Driver
DriverID  DriverName  DriverLicenseNbr
253       Ken         A23423
900       Jen         B89987

VehicleDriver
VehicleID  DriverID
35         900
35         253
89         900

Never create an entity with vehicle1, vehicle2, etc. as attributes!

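To answer the question of how to get vehicle information for a driver, one possible approach is to join through the junction table. The sketch below loads the three tables from the slide into SQLite via Python's sqlite3 module; the helper name vehicles_for_driver and the column types are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Vehicle (
        VehicleID INTEGER PRIMARY KEY, VehicleMake TEXT, VehicleModel TEXT
    );
    CREATE TABLE Driver (
        DriverID INTEGER PRIMARY KEY, DriverName TEXT, DriverLicenseNbr TEXT
    );
    -- Junction table: the parents' primary keys form the child's primary key.
    CREATE TABLE VehicleDriver (
        VehicleID INTEGER NOT NULL REFERENCES Vehicle (VehicleID),
        DriverID  INTEGER NOT NULL REFERENCES Driver (DriverID),
        PRIMARY KEY (VehicleID, DriverID)
    );
    INSERT INTO Vehicle VALUES
        (35, 'Volvo', 'Wagon'), (33, 'Ford', 'Sedan'), (89, 'GMC', 'Truck');
    INSERT INTO Driver VALUES (253, 'Ken', 'A23423'), (900, 'Jen', 'B89987');
    INSERT INTO VehicleDriver VALUES (35, 900), (35, 253), (89, 900);
    """
)


def vehicles_for_driver(conn, driver_id):
    """Return (VehicleID, VehicleMake, VehicleModel) rows for one driver."""
    return conn.execute(
        """
        SELECT v.VehicleID, v.VehicleMake, v.VehicleModel
        FROM VehicleDriver AS vd
        JOIN Vehicle AS v ON v.VehicleID = vd.VehicleID
        WHERE vd.DriverID = ?
        ORDER BY v.VehicleID
        """,
        (driver_id,),
    ).fetchall()


print(vehicles_for_driver(conn, 900))
```

Because each pairing is a row, a driver can gain a third or a tenth vehicle without any change to the table definitions, which is the problem with vehicle1, vehicle2 style attributes.
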
Referential integrity
Referential integrity maintains the validity of foreign keys when the primary key in the parent table changes
- Every foreign key either matches a primary key or is null
- For example: cannot add an employee to an invalid department

Cascade rules: choose among delete options
- Cascade restrict: Rows in the primary key table can't be deleted unless all corresponding rows in the foreign key tables have been deleted
- Cascade delete: When rows in the primary key table are deleted, associated rows in foreign key tables are also deleted

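The two delete options map closely onto SQL's ON DELETE RESTRICT and ON DELETE CASCADE clauses. The sketch below compares them in SQLite through Python's sqlite3 module; the helper names build and delete_department are hypothetical, and the tables are cut down to the columns needed for the demonstration.

```python
import sqlite3


def build(on_delete):
    """Build Department/Employee with the given ON DELETE rule."""
    if on_delete not in ("RESTRICT", "CASCADE"):
        raise ValueError(f"unsupported rule: {on_delete}")
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        f"""
        CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
        CREATE TABLE Employee (
            EmpID  INTEGER PRIMARY KEY,
            DeptID INTEGER REFERENCES Department (DeptID) ON DELETE {on_delete}
        );
        INSERT INTO Department VALUES (483, 'Construction');
        INSERT INTO Employee VALUES (4436, 483), (4574, 483);
        """
    )
    return conn


def delete_department(conn, dept_id):
    """Try to delete a department; return (departments left, employees left)."""
    try:
        conn.execute("DELETE FROM Department WHERE DeptID = ?", (dept_id,))
    except sqlite3.IntegrityError:
        pass  # RESTRICT blocked the delete because employees still reference it
    departments = conn.execute("SELECT COUNT(*) FROM Department").fetchone()[0]
    employees = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
    return departments, employees


restrict_result = delete_department(build("RESTRICT"), 483)
cascade_result = delete_department(build("CASCADE"), 483)
print(restrict_result, cascade_result)
```

Under RESTRICT the department and both employees survive; under CASCADE the department and its employees are removed together.
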
Key points from lesson
Primary keys are attributes used to uniquely identify a record

Foreign keys are attributes stored in a dependent entity which show how records in the dependent entity are related to an independent entity

Data model consists of:
- Entities and attributes
- Primary keys
- Foreign keys
- Relationships and cardinality
- Referential integrity and cascade rules

edX example

Solutions: Primary and foreign keys
We're getting there with this ERD: we've defined entities, attributes, relationships and keys

Introduction to edX data modeling exercise
We have all of the user data from the first year of running edX

Download the dataset resource and inspect it
- 10 percent of the data has been randomly selected for use

Flat file: a table with all attributes and records from which we will design the database using a relational data model

What are the most appropriate entities? What concepts should be represented in the database table?

Exercise motivation and approach
Why is it useful to do this exercise?
- Scenario: Customer or client hands you data for analysis in one or more giant Excel sheets
- Must understand business rules underpinning the dataset

Look at the single-table dataset

How could data structure be improved with a relational database?
- Identify major entities
- Identify attributes of those entities
- Identify relationships between entities

edX Dataset File (from website)

Data dictionary
Column Name         Description
course_id           Three-part identifier for a course
Course_Short_Title  Short title for the course
Course_Long_Title   Long title for the course
userid_DI           Individual user ID
registered          Whether the user is registered (1/0)
viewed              Whether the user has viewed the contents (1/0)
explored            Whether the user has explored the course (1/0)
certified           Whether the user is certified (1/0)
Country             User's country of origin
LoE_DI              User's level of education
YoB                 User's year of birth
Age                 User's age
gender              User's gender
grade               User's grade in the course
nevents             Number of events the user has done on the site
ndays_act           Number of days on which the user was active
nplay_video         Number of video plays done by the user
nchapters           Number of chapters read by the user
nforum_posts        Number of forum posts made by the user
roles               Any roles the user has
incomplete_flag     Whether the user has an incomplete for the course

Most obvious entities and keys
Entity: Users
- Primary key: userid_DI

Entity: Courses
- Primary key: course_id

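Pulling these two entities out of the flat file could look roughly like the sketch below, which uses plain Python dictionaries. The sample rows, the second course ID and the helper split_entities are hypothetical; only the column names and the first course ID come from the data dictionary and slides.

```python
# Hypothetical flat-file rows; real rows carry every column in the data dictionary.
FLAT_ROWS = [
    {"course_id": "MITx/6.00x/2013_Spring", "userid_DI": "user_A", "YoB": 1990, "grade": 0.91},
    {"course_id": "ExampleX/101x/2013_Fall", "userid_DI": "user_A", "YoB": 1990, "grade": 0.40},
    {"course_id": "MITx/6.00x/2013_Spring", "userid_DI": "user_B", "YoB": 1985, "grade": 0.00},
]

# Attributes that describe the user alone, not the user's activity in a course.
USER_ATTRIBUTES = ("YoB",)


def split_entities(rows):
    """Return (users, courses): users keyed by userid_DI, courses as a set of course_id."""
    users, courses = {}, set()
    for row in rows:
        user_id = row["userid_DI"]
        attributes = {name: row[name] for name in USER_ATTRIBUTES}
        # A user is repeated once per course; the repeated copies should agree.
        if users.setdefault(user_id, attributes) != attributes:
            raise ValueError(f"conflicting attributes for user {user_id}")
        courses.add(row["course_id"])
    return users, courses


users, courses = split_entities(FLAT_ROWS)
print(users)
print(sorted(courses))
```

The consistency check is a simple example of how modeling can surface errors in the source data: if the same user appeared with two different birth years, the flat file would need correcting before import.
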
Primary key for courses
course_id: MITx/6.00x/2013_Spring

Split one attribute into three attributes for ease of querying:
- Institution: MITx
- Course_number: 6.00x
- Course_term: 2013_Spring

Updated data dictionary:
- Institution: organization responsible for the course
- Course_number: numbers and letters identifying the course
- Course_term: season and year of course session

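The split itself is a small string operation. The sketch below shows one way to do it in Python; the CourseKey type and the split_course_id function are hypothetical names, and the code assumes every course_id has exactly three slash-separated parts, as in the example.

```python
from typing import NamedTuple


class CourseKey(NamedTuple):
    """The three attributes that replace the single course_id attribute."""
    institution: str
    course_number: str
    course_term: str


def split_course_id(course_id: str) -> CourseKey:
    """Split 'Institution/Course_number/Course_term' into its three parts."""
    parts = course_id.split("/")
    if len(parts) != 3:
        # Flag malformed identifiers instead of guessing at their meaning.
        raise ValueError(f"expected three parts in course_id, got {course_id!r}")
    return CourseKey(*parts)


key = split_course_id("MITx/6.00x/2013_Spring")
print(key.institution, key.course_number, key.course_term)
```

With the parts stored separately, a question such as "all courses offered by one institution" becomes an equality test on one attribute instead of a pattern match on a combined string.
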
edX entity-relationship diagram

Key points from lesson
Selection of entities and associated attributes from a flat file is not always obvious

The data modeling process may reveal inconsistencies or errors in the data which will have to be corrected before importing into a database

Foreign keys can be used as primary keys in a dependent entity if the keys uniquely identify records in the dependent entity
