Infosys 222

Week 1 - Database Introduction


Data
What is data?
- Raw, unprocessed facts and figures
Data Vs. Information
- Data are the building blocks of information
- Information is data that is processed, organised, structured or presented in
a given context so as to make it useful
- Information is data with a specific meaning associated with it
- High quality information is key to decision making
- Data -> Processing -> Information
Organizational Resources (Success depends on efficient use of resources)
- Money
- Human capital
- Technical know-how
o Skills, knowledge, experience, etc.
- Data
- Infrastructure
o Buildings, factories, equipment, etc.
Value of Data
- Corporate decisions depend on data
o Day-to-day transactions
o Customer data, surveys
o Historical data
Market prices/trends, effect of promotions, sales data/trends
What are the issues customers complain about the most?
Which promotions generated the most profit?
What is the most effective time period for a promotion?
- Loss of data implies loss of money
Corporate Data Cycle

Example (Amazon.com)
- Tries to predict other items a customer may want to purchase, based on
what's in their shopping cart and the purchasing behaviour of other
customers (historical data), in order to influence buyer behaviour

Why Data Matters?
- https://www.youtube.com/watch?v=f2Kji24833Y
Essential features of Information
- Timely
- Accurate
- Complete
Exercise
- Example of using Data/information for decision making in a specific Industry

Database and Database Management System (DBMS)


What is database?
- A structured collection of data/information
- Example: Searching Database
Database characteristics
- A database (digital/software) is typically a shared, integrated computer
structure housing
o End-user data
o Metadata

What is a Database Management System (DBMS)?
- Special software designed to help manage the database
o Examples: SQL Server, Oracle, Teradata, MySQL, Access, SQLite, etc.
- A database system can be thought of as the database + a DBMS
- We will use these terms interchangeably, and we will only consider
computer-based relational databases
DBMS features
- Data storage management
o Structure (tables, columns, etc.)
o Data integrity management
- Security management
- Multi-user access control
- Backup and recovery management
- Database language and application programming interfaces
o Query language (SQL)
- Database communication interfaces
Importance of DBMS
- Makes data management more efficient and effective
- Query language supports data retrieval, data manipulation,
structured reporting, and quick answers to ad hoc (non-structured/one-off)
queries
- Provides better access to more and better-managed data
- Reduces the probability of inconsistent data
- Promotes an integrated view of the organisation's operations
What happens when database management is not used?
- Files used to manage financial data
- No database systems in place
Real life examples
- Google data center
- TimesCast: Retailers' Predictions


Week 2 - Relational Data Model


Relational Database
- Most general-purpose DBMSs are based on the relational data model, in
which all data is stored in a number of tables (with named columns)
- For historical, mathematical reasons such tables are referred to as
relations
- The tables show data together with relationships between the data
- Enables users to view data logically as a two-dimensional structure composed
of rows and columns
- This course is solely about relational databases and relational DBMSs

Relational Database Management System (RDBMS)
- Looks like a collection of tables

Relational Data Model
- A precise, conceptual way of describing the data stored in a relational
database
o Structure of the data
o Operations on the data
o Constraints on the data
Relation (Table)
- Stores data on individual things, which are considered important
o People (Employee, Student, Staff Member)
o Objects (Book, Product, Lecture Room)
o Concepts / Actions (Transaction in an ATM, Borrowing a book from the
library)
Structure of the data - Relation (Table)
- A relation consists of rows and columns
- The column header describes the data
- The number of columns is fixed (a definite number)
- The number of rows is not fixed (an indefinite number)
- Each intersection between a row and column (cell) contains a single item
of data
- Each row will describe a single instance of the data
Example Relation
- Book

o Each row should describe a single book using the column headers
o This makes each row a record providing information about a single book
- Student
o (possible column headers for storing a student's identity information)
Exercise

Tuple (Row)
- A relation consists of tuples (Rows)
- A tuple is an ordered list of values
- Tuples are usually written in parentheses, with commas separating the
values (or components)
o Example: Employee relation: (7369, SMITH, M, Technician)
- Order is significant
o Example: the tuple (7369, Technician, SMITH, M) is different from the
tuple above
Attribute (Column/Field)
- In order to be able to refer to the different components in a tuple, we will
assign them names (called attributes)
o Example: For the tuple (7369, SMITH, M, Technician), we might
choose the attributes ID, Name, Gender, and JobDescription
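The tuple and attribute ideas above can be tried out directly in Python. This is only an illustrative sketch: Python tuples stand in for relational tuples, and the `Employee` named tuple is a hypothetical helper, not part of any DBMS.

```python
from collections import namedtuple

# A tuple is an ordered list of values, so position matters.
smith = (7369, "SMITH", "M", "Technician")
reordered = (7369, "Technician", "SMITH", "M")
order_is_significant = smith != reordered  # same values, different order

# Naming the components turns positions into attributes.
# (Employee is a hypothetical helper type for this illustration.)
Employee = namedtuple("Employee", ["ID", "Name", "Gender", "JobDescription"])
smith_named = Employee(*smith)
```

With the attributes in place, `smith_named.Name` reads more clearly than `smith[1]`, which is the reason attributes are introduced.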
Data Type
- The value of an attribute belongs to a domain; also known as a data type
of an attribute
- All attributes must have a data type, but the data types available depend on
the particular DBMS
- Commonly available data types across different implementations
o TEXT for text strings
o INTEGER for integers
o REAL for real numbers
o DATE for dates
Schema
- In the relational data model, a relation is often described using a schema,
which consists of
o The name of the relation
o The set of its attributes (sometimes with data types)
- Example: The relation Employee can be described by the schema
o Employee(ID, Name, Gender, JobDescription)
o Employee(ID INTEGER, Name TEXT, Gender TEXT, JobDescription TEXT)
- The schemas of all relations in a database form a database schema
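As a minimal sketch, the Employee schema can be declared in SQLite through Python's standard `sqlite3` module. SQLite and the in-memory database are assumptions made for illustration; the notes do not prescribe a particular DBMS here.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# The schema: relation name plus attributes with their data types.
conn.execute(
    "CREATE TABLE Employee ("
    "ID INTEGER, Name TEXT, Gender TEXT, JobDescription TEXT)"
)

# Ask the DBMS to describe the relation back to us (name and type per column).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Employee)")]
```

`columns` then holds the attribute names and data types exactly as declared in the schema.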
Relation Instance
- A relation is not static; it changes over time
o Inserting new tuples
o Updating components of existing tuples
o Deleting tuples
- A set of tuples for a relation at a moment is an instance of that relation
- A DBMS maintains the current instance
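The three kinds of change can be sketched with SQLite, taking a snapshot of the relation instance after each one. The second employee (ALLEN) and the new job title are made-up sample data, and `current_instance` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER, Name TEXT, Gender TEXT, JobDescription TEXT)"
)

def current_instance():
    """Return the set of tuples stored in Employee at this moment."""
    return set(conn.execute("SELECT ID, Name, Gender, JobDescription FROM Employee"))

snapshots = [current_instance()]  # the empty relation

# Inserting new tuples
conn.execute("INSERT INTO Employee VALUES (7369, 'SMITH', 'M', 'Technician')")
conn.execute("INSERT INTO Employee VALUES (7499, 'ALLEN', 'F', 'Analyst')")
snapshots.append(current_instance())

# Updating a component of an existing tuple
conn.execute("UPDATE Employee SET JobDescription = 'Manager' WHERE ID = 7369")
snapshots.append(current_instance())

# Deleting a tuple
conn.execute("DELETE FROM Employee WHERE ID = 7499")
snapshots.append(current_instance())
```

Each entry in `snapshots` is a different instance of the same relation; the schema never changed.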
Key
- An attribute or a set of attributes used to uniquely identify a tuple
o Two employees will not have the same ID
- This unique attribute (or set of attributes) is called the primary key
- You can introduce an artificial key, if no suitable attribute/attributes exist
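A short sketch of both points, again assuming SQLite: the DBMS rejects a second tuple with the same primary key, and an artificial key can be generated when no natural attribute is suitable. The Author table and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ID is declared as the primary key, so two employees cannot share it.
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employee VALUES (7369, 'SMITH')")
try:
    conn.execute("INSERT INTO Employee VALUES (7369, 'JONES')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Artificial key: names are not unique, so SQLite assigns authorID itself
# (an INTEGER PRIMARY KEY column is filled in automatically when omitted).
conn.execute("CREATE TABLE Author (authorID INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Author (name) VALUES ('J. Smith')")
conn.execute("INSERT INTO Author (name) VALUES ('J. Smith')")
author_ids = [row[0] for row in conn.execute("SELECT authorID FROM Author ORDER BY authorID")]
```

The two authors share a name but still receive distinct identifiers.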

Week 2 - ERD Part 1


Database Design - Entity Relationship Diagrams
Database Design
- Create a blueprint
- Need to consider
o What tables, attributes and keys are needed?
o What is the database going to be used for?
- Conceptual design
o Build a model independent of the choice of DBMS
- Logical design
o Create the database in a given DBMS
- Physical design
o How the database is stored in hardware
Entity Relationship (ER) model
- The concept was originally defined by Chen (1976) and has since been adopted
and refined by practitioners as the leading method for carrying out database
design
- An ER model is a systematic way of describing and defining a business
process. The process is modelled as entity sets that are linked with each
other by relationships that express the dependencies and requirements
between them
- An ER diagram (ERD) is used as a tool for ER modelling, which also provides
a representation of the ER model
ER Modelling and the Relational Data Model
- Entity set -> relation
- Attributes -> attributes
- Relationships -> The connections between the relations
Entity/Relationship Modelling
- ER Modelling is used for conceptual design
o Entity Set: objects or items of interest
o Attributes: facts about, or properties of, an entity. They describe an entity
o Relationships: links between entities
Entity Set / Entities
- An entity set represents objects or things of interest
- A general type
o Physical things like students, lecturers, employees, products
o More abstract things like transactions, orders, courses, projects
- Instances of that particular type are entities
- An entity set should be named with a singular noun
o Related to business characteristics, meaningful and self-documenting
o Unique and concise, readable

- Entity Set -> Relation / Table
- Entity -> Row / Tuple / Record
Entity - Attributes belonging to it
- Attributes are facts, aspects, properties, or details about an entity
o e.g. students have IDs, names, courses, addresses, etc.
- One or more attributes define the key
Attributes
- Characteristics of entities
- Domain is set of possible values (defined by data type)
- Represented as columns in a database
- Design note
o Name descriptively and meaningfully
o Naming convention: camel case (the first letter of the first word is
lowercase, but the first letter of each subsequent word is uppercase)

Types of Attributes
1. Primary Key
2. Simple (Single-valued)
o Cannot be subdivided
Gender, marital status
3. Composite
o Is composed of several component parts
Address: streetNumber, suburb, city, zipCode
Name: firstName, lastName
o To model: an operational decision (reduce redundancy and
inconsistencies, ease of retrieval, usage)
Create additional attributes for an entity: a viable option
Create an entirely new entity: needs relationships (TBC)
4. Multi-valued
o Multiple values possible
o Customer entity with a phone attribute
homePhone
officePhone
o facultyMember with a qualification attribute
BSc
MSc
PhD

o To model: an operational decision (reduce redundancy and
inconsistencies, ease of retrieval, usage)
Create additional attributes for an entity: not the best option
Create an entirely new entity: needs relationships (TBC)
5. Derived
- Values that are calculated from other attributes
o Age: calculated from dateOfBirth
o orderTotal: calculated from unitPrice x quantity
- To model
o Normally not stored
o Operational decision: resource use, usage of data
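Because derived attributes are normally not stored, they are computed on demand from the stored ones. A minimal sketch of the two examples follows; the function names `age` and `order_total` are hypothetical.

```python
from datetime import date

def age(date_of_birth, today):
    """Derive age in whole years from the stored dateOfBirth attribute."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

def order_total(unit_price, quantity):
    """Derive orderTotal from the stored unitPrice and quantity attributes."""
    return unit_price * quantity
```

Storing `age` instead would make the value go stale every birthday, which is one reason the stored attribute is `dateOfBirth`.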
Decisions - How to model attributes
- How would the data be used
- Take future growth into consideration
- Operational efficiency
o Eliminate inconsistencies
o Reduce redundancy

Primary Key
- The primary key is an attribute or a set of attributes that uniquely identify a
specific instance of an entity
- Every entity in the data model must have a primary key whose values
uniquely identify instances of the entity
- To qualify as a primary key for an entity
o It must have a non-null value for each instance of the entity
o The value must be unique for each instance of an entity
o The values must not change or become null during the life of each entity
instance
Candidate Key
- In some instances, an entity will have more than one attribute that can
serve as a primary key
- Any attribute, or minimal set of attributes, that could serve as a primary
key is called a candidate key
- Once candidate keys are identified, choose one, and only one, primary key
for each entity
- Candidate keys which are not chosen as the primary key are known as
alternate keys
- If none of the candidate keys is suitable, introduce an artificial key
- Example
o Publisher Entity - From the case description
Publisher Name
Publisher Phone number
May change over time
o Author Entity - From the case description
Author Name
May not be unique / May change over time
o Book Entity - From the case description
ISBN number: unique, no change over time, not null
Title
The Entities and Attributes

Relationships
- A relationship is an association between two or more entities
o Case description
o Books can be written by one or more authors
o Authors can also write more than one book
o Publishers publish many books
o One book is published by one publisher
- Relationships have
o A name (a verb)
o A set of entities that participate in them
o Operate in both directions
o A cardinality ratio
o A degree: the number of entity sets that participate (most have degree 2)

Cardinality Ratios
- Each entity in a relationship can participate in zero, one, or more than one
instances of that relationship
- This leads to 3 types of relationship
- Multiplicity
o One to many (1:M)
o Many to many (M:M)
o One to one (1:1)
- Optionality
o Optional or mandatory
One To Many Relationship
- Indicates that a single occurrence of one entity is associated with one or
more occurrences of the related entity
o B1 -> P1
o P1 -> B1, B2
o A book must have a publisher
o The store may not currently have any books published by a given publisher
- To create
o Place the primary key of the parent as a foreign key in the child
Cardinality - Multiplicity and Optionality

Foreign Key
- A foreign key is a way of navigating between instances of related entities
- Appears on the many side of a 1:M relationship
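The Publisher/Book relationship can be sketched in SQLite as follows. The column names, the sample titles and the second publisher P2 are assumptions for illustration. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Parent (the "one" side)
conn.execute("CREATE TABLE Publisher (publisherID TEXT PRIMARY KEY, name TEXT)")
# Child (the "many" side) carries the parent's primary key as a foreign key.
# NOT NULL makes the publisher mandatory: a book must have a publisher.
conn.execute(
    "CREATE TABLE Book ("
    "bookID TEXT PRIMARY KEY, title TEXT, "
    "publisherID TEXT NOT NULL REFERENCES Publisher(publisherID))"
)

conn.execute("INSERT INTO Publisher VALUES ('P1', 'Publisher One')")
conn.execute("INSERT INTO Publisher VALUES ('P2', 'Publisher Two')")
conn.execute("INSERT INTO Book VALUES ('B1', 'First Book', 'P1')")
conn.execute("INSERT INTO Book VALUES ('B2', 'Second Book', 'P1')")

def books_of(publisher_id):
    """Navigate from a publisher to its books via the foreign key."""
    query = "SELECT bookID FROM Book WHERE publisherID = ? ORDER BY bookID"
    return [row[0] for row in conn.execute(query, (publisher_id,))]

# A book pointing at a publisher that does not exist is rejected.
try:
    conn.execute("INSERT INTO Book VALUES ('B3', 'Orphan Book', 'P9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

P1 has two books while P2 has none, which shows the optional side of the relationship: a publisher may exist without any books in the store.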

Many To Many Relationship
- An entity of either set can be connected to many entities of the other set
o B1 -> A1, A3
o A1 -> B1, B2, B3
In the initial model

In the data - Book

In the data - Author

Removing M:M Relationships


- Many to many relationships are difficult to represent
- We can split a many to many relationship into two one to many
relationships
- The new entity (an associative entity) represents the M:M relationship
- An associative entity set is used to represent a relationship which often has
its own attributes that do not belong to other entity sets
Resolve - In the model

In the data - BookAuthor

Primary key?
- Sometimes more than one attribute is required to uniquely identify an entity
- A primary key made up of more than one attribute is known as a
composite key
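A sketch of the BookAuthor associative entity in SQLite, using the pairs from the many to many example (B1 -> A1, A3 and A1 -> B1, B2, B3). The column names bookID and authorID are assumptions; the composite primary key is the pair of foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Book (bookID TEXT PRIMARY KEY);
CREATE TABLE Author (authorID TEXT PRIMARY KEY);
-- Associative entity: one row per (book, author) pairing.
CREATE TABLE BookAuthor (
    bookID TEXT NOT NULL REFERENCES Book(bookID),
    authorID TEXT NOT NULL REFERENCES Author(authorID),
    PRIMARY KEY (bookID, authorID)  -- composite key
);
""")
conn.executemany("INSERT INTO Book VALUES (?)", [("B1",), ("B2",), ("B3",)])
conn.executemany("INSERT INTO Author VALUES (?)", [("A1",), ("A3",)])
conn.executemany(
    "INSERT INTO BookAuthor VALUES (?, ?)",
    [("B1", "A1"), ("B1", "A3"), ("B2", "A1"), ("B3", "A1")],
)

authors_of_b1 = [r[0] for r in conn.execute(
    "SELECT authorID FROM BookAuthor WHERE bookID = 'B1' ORDER BY authorID")]
books_of_a1 = [r[0] for r in conn.execute(
    "SELECT bookID FROM BookAuthor WHERE authorID = 'A1' ORDER BY bookID")]

# The composite key stops the same pairing from being recorded twice.
try:
    conn.execute("INSERT INTO BookAuthor VALUES ('B1', 'A1')")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True
```

Neither bookID nor authorID is unique on its own in BookAuthor; only the combination is, which is why the key has to be composite.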

One to One Relationship
- Each entity of either entity set is related to at most one entity of the other
set
o E1 -> A1
o E2 -> A3
o E3 -> A4
- An author has one and only one address
- Address -> attribute of employee
Attribute vs Entity
- If we have several addresses per Author (Home Address and Studio
Address)
o Address must be an entity: attributes cannot be multi-valued

Entities and Attributes


- Sometimes it is hard to tell if something should be an entity or an attribute
o They both represent objects or facts about the world
o They are both often represented by nouns in descriptions
- General guidelines
o Entities can have attributes but attributes have no smaller parts
o Entities can have relationships between them, but an attribute belongs to
a single entity
Making E/R Models
- To make an E/R model you need to identify from the description
o Entities
o Attributes
o Relationships
o Cardinality ratios
- General guidelines
o Since entities are things or objects they are often nouns in the description
o Attributes are facts or properties, and so are often nouns also
o Verbs often describe relationships between entities

Reading
- ER modelling with crow's foot notation
Summary - you should
- Be able to arrive at a logical ER model based on a case description
- Know about some key ERD concepts: entities, attributes, keys and
relationships

Degree of a relationship
- Is the number of entity sets that participate in a relationship
- The three common relationship degrees
1. Unary (degree 1)
2. Binary (degree 2)
3. Ternary (degree 3)
- Higher degree relationships are possible but rarely encountered in practice
Binary Relationships
- Between the instances of two entity sets
- The most common type of relationship encountered in data modelling

Unary Relationship
- Between the instances of a single entity set (recursive relationships)
- Cardinality could be 1:1, 1:M or M:N

o E.g. each person is married to just one person -> 1:1
o Model as an attribute within the same entity
o Don't draw
- Cardinality could be 1:M
o E.g. each employee can manage many other employees or no one, but
each employee is managed by only one other employee or not managed
by anyone at all
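The 1:M unary case can be sketched as a table whose foreign key points back at its own primary key. The column name managerID and the three sample employees are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# managerID refers to another row of the same table (recursive relationship).
# It is nullable because an employee may not be managed by anyone.
conn.execute(
    "CREATE TABLE Employee ("
    "employeeID INTEGER PRIMARY KEY, name TEXT, "
    "managerID INTEGER REFERENCES Employee(employeeID))"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cat", 1)],
)

managed_by_ann = [r[0] for r in conn.execute(
    "SELECT name FROM Employee WHERE managerID = 1 ORDER BY name")]
unmanaged = [r[0] for r in conn.execute(
    "SELECT name FROM Employee WHERE managerID IS NULL")]
```

Ann manages many employees, while each employee row holds at most one managerID, which matches the 1:M cardinality.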

- Cardinality could be M:N
o E.g. Course - INFOSYS222
Prerequisite: INFOSYS110 or 120, or COMPSCI 105 or 107
Courses - INFOSYS330, INFOMGMT393
Prerequisite: INFOSYS222
o Each course needs many prerequisite courses / each course is a
prerequisite for many other courses
o To model this accurately,

Ternary Relationships
- Simultaneous relationship among the instances of 3 entity sets
- E.g. Employees with many required skills can be assigned to many projects
o One employee has many skills and is assigned to many projects
o One project includes many employees with many required skills
o One skill can be possessed by many employees working on many projects
o Three M:N relationships
- It is recommended that all ternary (or higher) relationships are converted
into associative entities
- Represent the ternary relationship with an associative entity and three
binary relationships
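One possible shape for the conversion, sketched in SQLite: an associative entity (here called Assignment, a hypothetical name) holds one foreign key to each of the three entity sets. The sample employees, skills and projects are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (employeeID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Skill (skillID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Project (projectID INTEGER PRIMARY KEY, name TEXT);
-- Associative entity with three binary (1:M) relationships.
CREATE TABLE Assignment (
    employeeID INTEGER NOT NULL REFERENCES Employee(employeeID),
    skillID INTEGER NOT NULL REFERENCES Skill(skillID),
    projectID INTEGER NOT NULL REFERENCES Project(projectID),
    PRIMARY KEY (employeeID, skillID, projectID)
);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO Skill VALUES (?, ?)", [(1, "SQL"), (2, "Python")])
conn.executemany("INSERT INTO Project VALUES (?, ?)", [(1, "Alpha"), (2, "Beta")])
conn.executemany(
    "INSERT INTO Assignment VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 1, 2), (2, 2, 1)],
)

# Which skills does Ann (employee 1) use on project Alpha (project 1)?
ann_skills_on_alpha = [r[0] for r in conn.execute(
    "SELECT s.name FROM Assignment a JOIN Skill s ON s.skillID = a.skillID "
    "WHERE a.employeeID = 1 AND a.projectID = 1 ORDER BY s.name")]
```

A question that involves all three entity sets at once is answered from a single Assignment row set, which three separate M:N relationships could not guarantee.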

Surrogate Keys
- Single-value surrogate keys can be substituted for large composite keys

Note: the relationship changes to non-identifying

Weak Entity
- An entity is considered weak if the existence (of an instance) of that entity
depends on the existence (of an instance) of another entity
- A weak entity can be identified uniquely only by considering the primary key
of another (owner) entity

- Owner entity set and weak entity set must participate in a one-to-many
relationship set (one owner, many weak entities)
- The PK of the parent entity must be part of the PK of the weak entity
- Weak entities are otherwise just like regular entities (name, PKs, FKs,
attributes, related to other entities, etc.)
- The relationship between a strong entity set and a weak entity set is called
an identifying / supporting relationship
Identifying and Non-Identifying Relationships
- Identifying
o A child object (weak entity set) cannot exist without the parent object
and child object cannot be uniquely identified without the parent
o If the parent entity is deleted, then the child entity must be deleted
o Identifying relationships exist when the primary key of the parent entity
is included in the primary key of the child entity

- Non-identifying
o A non-identifying relationship means that a child entity is related to a
parent entity but can be identified independently of the parent entity
o The child item should be kept even though the parent is deleted
o In a non-identifying relationship, the primary key attributes of the
parent entity must not become primary key attributes of the child entity
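The difference can be sketched in SQLite with two hypothetical parent/child pairs. Invoice/InvoiceLine is identifying (the parent key is part of the child key, and children are deleted with the parent); Department/Employee is non-identifying (the child has its own key and survives the parent). The table names and the ON DELETE choices are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Identifying: invoiceID is part of the weak entity's primary key.
CREATE TABLE Invoice (invoiceID INTEGER PRIMARY KEY);
CREATE TABLE InvoiceLine (
    invoiceID INTEGER NOT NULL REFERENCES Invoice(invoiceID) ON DELETE CASCADE,
    lineNumber INTEGER NOT NULL,
    PRIMARY KEY (invoiceID, lineNumber)
);
-- Non-identifying: departmentID is only a foreign key, not part of the PK.
CREATE TABLE Department (departmentID INTEGER PRIMARY KEY);
CREATE TABLE Employee (
    employeeID INTEGER PRIMARY KEY,
    departmentID INTEGER REFERENCES Department(departmentID) ON DELETE SET NULL
);
INSERT INTO Invoice VALUES (1);
INSERT INTO InvoiceLine VALUES (1, 1);
INSERT INTO InvoiceLine VALUES (1, 2);
INSERT INTO Department VALUES (10);
INSERT INTO Employee VALUES (7369, 10);
""")

# Delete both parents and see what happens to the children.
conn.execute("DELETE FROM Invoice WHERE invoiceID = 1")
conn.execute("DELETE FROM Department WHERE departmentID = 10")

remaining_lines = conn.execute("SELECT COUNT(*) FROM InvoiceLine").fetchone()[0]
remaining_employees = list(conn.execute("SELECT employeeID, departmentID FROM Employee"))
```

The invoice lines disappear with their invoice, while the employee row is kept with an empty departmentID.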

Generalization
- Process of defining general entity types from a set of specialised entity
types by identifying their common characteristics

Superset and Subset


- The entity HourlyEmployee is included in the entity set Employee
o All entities of one are also entities of another
- HourlyEmployee is the subset (or subtype / subclass)
- Employee is the superset (or supertype / superclass)
- Sub-types (instances) may be mutually inclusive or exclusive

- Superset: a generic entity set that has a 1:1 relationship with one or more
subsets
- Subset: a subgrouping of the entities in an entity set that has distinct
attributes
Inclusive (Overlap)
- Defines whether it is possible for an instance of a superclass to
simultaneously be a member of more than one subclass
- A superclass instance can overlap more than one subclass
o A person can be both a student and staff
Exclusive (Disjoint)
- States that if an instance of a superset is a member of any subset, then it
cannot be a member of more than one subset
o A student is either a Graduate or PostGraduate, not both

Discriminators
- A discriminator is an (optional) attribute that determines which subtype is
appropriate
- Example: The attribute isGradStudent, which appears in STUDENT on the
prior slide is a discriminator
o Will have a domain of Yes and No
Superset and Subset - Identifiers and Inheritance
- The identifier of the super type and all of its subtypes must be identical
- The identifier of the super type becomes the identifier of the related
subtype(s)
- Rename if required
- Inheritance means that the entities in the subtypes inherit the attributes
of the supertype entity class
- Example: Graduate inherits the attributes of Person and Student
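One common way to realise a supertype/subtype pair in tables, sketched in SQLite: the subtype reuses the supertype's identifier as its own primary key (a 1:1 relationship), and inherited attributes are reached by joining. The thesisTitle attribute and the sample students are hypothetical; isGradStudent is the discriminator from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype, with the discriminator restricted to the domain Yes/No.
CREATE TABLE Student (
    studentID INTEGER PRIMARY KEY,
    name TEXT,
    isGradStudent TEXT CHECK (isGradStudent IN ('Yes', 'No'))
);
-- Subtype: same identifier as the supertype, plus its distinct attributes.
CREATE TABLE Graduate (
    studentID INTEGER PRIMARY KEY REFERENCES Student(studentID),
    thesisTitle TEXT
);
INSERT INTO Student VALUES (1, 'Ana', 'Yes');
INSERT INTO Student VALUES (2, 'Ben', 'No');
INSERT INTO Graduate VALUES (1, 'Query Optimisation');
""")

# Inheritance: a Graduate row gains the Student attributes through the join.
graduates = list(conn.execute(
    "SELECT s.studentID, s.name, g.thesisTitle "
    "FROM Graduate g JOIN Student s ON s.studentID = g.studentID"))
```

Only the subtype-specific attribute lives in Graduate; the name is stored once, in the supertype.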


Design Principles for Data Modelling


- There is no single right or wrong data model, and good data modelling is difficult
- Useful design principles
o Be faithful to the specification of the requirements
o Use common sense, and make assumptions only where the specification fails to
explain
o Avoid duplication and other redundant information


Reading: http://www.inf.unibz.it/~franconi/teaching/2000/ct481/er-modelling/
You should be able to
- Arrive at a logical ER model based on a case description
- Apply the ERD concepts to a database design task
Database Journey
- Conceptual Model
o Entity, Attributes, Relationship (1 to M, M to M and 1 to 1)

- Logical Model
o PK, FK, Associative Entity, Weak and Strong Entity,
Generalization/Specialization (Exclusive and Inclusive), Unary
Relationship, etc.
- Physical Model
o Table, Column, Data


Normalisation
- A process of organizing the fields and tables of a relational database to
minimize redundancy and dependency
- It is a theoretical technique to refine and improve (or even to begin) the
logical data modelling
- The idea is that an entity set (table) should be about a specific topic and
that only those attributes (columns) which support that topic are included


Data Duplication
- Increases storage and decreases performance

Data Modification Issues
1. Insert Anomaly
2. Update Anomaly
3. Delete Anomaly

In summary
- Having one entity that serves many purposes introduces many challenges
o Data Duplication
o Data Update Issues
- Need Normalisation
o To minimize duplicate data
o To minimize or avoid data modification issues
Steps of Normalization
- First Normal Form (1NF)
o To remove all multivalued attributes and to define a primary key for a
given data structure
- Second Normal Form (2NF)
o To remove all partial functional dependencies that exist between a non-key
attribute and part of a primary key for a given data structure with a
composite key
- Third Normal Form (3NF)
o To remove all transitive functional dependencies that exist between a
non-key attribute and another non-key attribute for a given data structure

First Normal Form (1NF)


- There should be a single fact in each row
- All the primary key attributes are defined
- All attributes are functionally dependent on the primary key (in part or in
whole)
1NF Steps
1. Unnormalized form

2. Find a good PK
o Find the repeating group
3. Make the obvious identifier of the set and the identifier of the repeating
group a composite primary key
o Obvious identifier: orderID
o Identifier of the repeating group: productID
o Primary key: orderID, productID
1NF
Order(orderID, orderDate, customerID, customerName,
customerAddress, productID, productDesc, productFinish, unitPrice,
orderedQuantity)
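The 1NF steps can be sketched in Python by flattening a repeating group. The nested `items` list stands in for the repeating group of products; the order values are made-up sample data, only a subset of the attributes is shown, and `to_1nf` is a hypothetical helper.

```python
# Unnormalized form: each order carries a repeating group of products.
unnormalized_orders = [
    {"orderID": 1006, "orderDate": "2024-10-24", "customerID": 2,
     "items": [{"productID": 7, "orderedQuantity": 2},
               {"productID": 5, "orderedQuantity": 2}]},
    {"orderID": 1007, "orderDate": "2024-10-25", "customerID": 6,
     "items": [{"productID": 7, "orderedQuantity": 3}]},
]

def to_1nf(orders):
    """Emit one flat row per (order, product), removing the repeating group."""
    rows = []
    for order in orders:
        order_attributes = {k: v for k, v in order.items() if k != "items"}
        for item in order["items"]:
            rows.append({**order_attributes, **item})
    return rows

rows_1nf = to_1nf(unnormalized_orders)

# The composite primary key: orderID plus productID.
primary_keys = [(row["orderID"], row["productID"]) for row in rows_1nf]
```

Every row now holds single values only, and the pair (orderID, productID) is unique across rows. The order attributes are repeated on each row, which is the duplication that 2NF goes on to remove.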
Second Normal Form (2NF)
- Remove all partial functional dependencies
Functional Dependencies
- We say an attribute B has a functional dependency on another attribute A
if, for any two records that have the same value for A, the values for
B in these two records must also be the same
- We illustrate this as: A -> B
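The definition translates almost word for word into a check over a set of records. This is a sketch with made-up sample rows; `holds` is a hypothetical helper name.

```python
def holds(records, a, b):
    """Return True if no two records agree on attribute a but differ on b."""
    seen = {}  # value of a -> the value of b first seen with it
    for record in records:
        if record[a] in seen and seen[record[a]] != record[b]:
            return False
        seen[record[a]] = record[b]
    return True

# Made-up sample rows from the Order relation (subset of attributes).
order_rows = [
    {"orderID": 1006, "productID": 7, "productDesc": "Dining Table", "orderedQuantity": 2},
    {"orderID": 1006, "productID": 5, "productDesc": "Writer's Desk", "orderedQuantity": 2},
    {"orderID": 1007, "productID": 7, "productDesc": "Dining Table", "orderedQuantity": 3},
]

product_determines_desc = holds(order_rows, "productID", "productDesc")
order_determines_product = holds(order_rows, "orderID", "productID")
```

In this sample productID -> productDesc holds, while orderID -> productID does not, because order 1006 contains two products. A sample can only disprove a dependency; whether one truly holds is a statement about the business rules, not about the rows that happen to be stored.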

Partial Dependency
- When an attribute B is functionally dependent on an attribute A, and A is
only one component of a multipart candidate key

Second Normal Form (2NF)


- 1NF
Order(orderID, orderDate, customerID, customerName, customerAddress,
productID, productDesc, productFinish, unitPrice, orderedQuantity)
1. Examine all non key attributes
2. Remove all Partial Functional dependencies to separate relations
3. If any attributes are functionally dependent on the complete composite
key, include them in a separate relation, indicating foreign keys
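The steps above can be sketched as projections of the 1NF relation. The dependencies used here are assumptions, since the notes do not list them: orderID is taken to determine the order and customer attributes, productID the product attributes, and only orderedQuantity is taken to depend on the whole key. The sample rows and the relation names Product and OrderLine are also hypothetical.

```python
COLUMNS = ["orderID", "orderDate", "customerID", "customerName", "customerAddress",
           "productID", "productDesc", "productFinish", "unitPrice", "orderedQuantity"]

# Made-up rows of the 1NF Order relation.
ROWS_1NF = [
    (1006, "2024-10-24", 2, "Value Furniture", "Plano", 7, "Dining Table", "Natural Ash", 800, 2),
    (1006, "2024-10-24", 2, "Value Furniture", "Plano", 5, "Writer's Desk", "Cherry", 325, 2),
    (1007, "2024-10-25", 6, "Furniture Gallery", "Boulder", 7, "Dining Table", "Natural Ash", 800, 3),
]

def project(rows, attributes):
    """Keep only the given attributes and drop duplicate tuples."""
    positions = [COLUMNS.index(a) for a in attributes]
    return sorted({tuple(row[p] for p in positions) for row in rows})

# Attributes that depend on orderID alone (assumed partial dependency).
order_rel = project(ROWS_1NF, ["orderID", "orderDate", "customerID",
                               "customerName", "customerAddress"])
# Attributes that depend on productID alone (assumed partial dependency).
product_rel = project(ROWS_1NF, ["productID", "productDesc", "productFinish", "unitPrice"])
# Attributes that depend on the complete composite key.
order_line_rel = project(ROWS_1NF, ["orderID", "productID", "orderedQuantity"])
```

Order 1006 and product 7 each appeared twice in the 1NF rows but appear once in their own relations; OrderLine keeps orderID and productID as foreign keys back to them.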


Third Normal Form (3NF)


- Remove all Transitive Functional Dependencies
Transitive Dependency
- Consider attributes A, B, and C, and where A->B and B->C
- Functional dependencies are transitive, which means that we also have the
functional dependency A->C

- We say that C is transitively dependent on A through B
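A sketch of removing a transitive dependency, assuming (the notes do not state it) that in the Order relation orderID -> customerID and customerID -> customerName, customerAddress. The sample rows are made up. Splitting on customerID and joining back shows that no information is lost.

```python
# Made-up rows: (orderID, orderDate, customerID, customerName, customerAddress)
orders_2nf = [
    (1006, "2024-10-24", 2, "Value Furniture", "Plano"),
    (1007, "2024-10-25", 6, "Furniture Gallery", "Boulder"),
    (1008, "2024-10-26", 2, "Value Furniture", "Plano"),
]

# A = orderID, B = customerID, C = customer details.
# Keep A -> B in Order and move B -> C into its own Customer relation.
order_3nf = sorted({(oid, odate, cid) for oid, odate, cid, _, _ in orders_2nf})
customer_3nf = sorted({(cid, name, addr) for _, _, cid, name, addr in orders_2nf})

# Joining on customerID (now a foreign key in Order) rebuilds the original rows.
customer_by_id = {cid: (name, addr) for cid, name, addr in customer_3nf}
rejoined = sorted((oid, odate, cid) + customer_by_id[cid]
                  for oid, odate, cid in order_3nf)
```

Customer 2's name and address were stored twice before the split and once after it, so a change of address now touches a single row.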


Notes on Normalization
- If the given data structure in 1NF has a single-attribute primary key, then
there are no partial functional dependencies and hence it is also in 2NF
- If the given data structure in 2NF has no transitive functional dependencies
then it is in 3NF
- Derived attributes are not included
The Database Oath
- Every non-key attribute must provide a fact about the key, the whole key,
and nothing but the key, so help me Codd
o The key refers to 1NF
o The whole key refers to 2NF
o Nothing but the key refers to 3NF
Common Considerations
- Derived attributes
o To improve the performance of certain queries, it can be argued that
selected derived attributes should be stored, to avoid ad hoc
computation over large volumes of data
- 1:1 relationship to decompose entity set
o If there are reasons beyond data modelling to physically separate some
attributes from the same entity set into multiple ones (e.g. security), the
physical data model should reflect that
- Denormalisation
o It is not uncommon to reverse the process of normalisation to introduce
redundancy and reduce the number of entity sets for the purpose of
performance and maintenance of the database
Note: none of these decisions should be taken lightly; each needs reasoning
and a weighing of benefits against costs

Summary:
You should be able to apply the normalisation concepts to a database design
task
