
CSC271 Database Systems

Introduction

Course Overview
Topics: Relational Model, Relational Algebra, SQL, Joins,
Aggregation, E-R Model, Normalization, Optimization
Books

Database Systems 4th/ed. by Connolly (2014)

Database System Concepts 6th/ed. by Silberschatz (2010)

Evaluation (Theory) 75%

Quizzes: 15% Assignments: 10%

Sessional 1 & 2: 25% Final Exam: 50%

Evaluation (Lab) 25%

Lab Tasks: 25%

Sessional 1 & 2: 25% Final Exam (Project): 50%


Unstructured Data
In its latest avatar, the average ODI has witnessed a run rate of 5.33,
which is almost exactly the same as the rate when the earlier rules were
in force between October 30, 2012 and July 4, 2015. The scoring rate
increased dramatically in the first six months of 2015, rising to 5.67 from
5.29 in 2014. Those six months included the 2015 World Cup, where the
scoring rate was 5.65, and the New Zealand tour to England, where the
rate across five matches was an astronomical 7.15. Between October 30,
2012 and December 2014, the overall run rate was only 5.25.
Since July 2015, the run rate for teams batting first in the last 15 overs
has dropped to 6.95 from 7.81 in the earlier period. The balls per
boundary has gone up from 6.7 to 8.7, a change of nearly 30%.
Source: http://www.espncricinfo.com/magazine/content/story/1052981.html

What is the current rate of balls per boundary?

Unstructured data is difficult to comprehend


From Data to Information
Facts/numbers processed to increase knowledge
in the person using it

Player        Country  Mat  Inns  NO  Runs    HS    Ave    100  50
SR Tendulkar  India    159  261   27  12,773  248*  54.58  42   53
BC Lara       WI       131  232   6   11,953  400*  52.88  34   48
RT Ponting    Aus      136  229   26  11,345  257   55.88  38   48
AR Border     Aus      156  265   44  11,174  205   50.56  27   63
SR Waugh      Aus      168  260   46  10,927  200   51.06  32   50
R Dravid      India    134  233   27  10,823  270   52.53  26   57
JH Kallis     SA       131  221   33  10,277  189*  54.66  31   51
SM Gavaskar   India    125  214   16  10,122  236*  51.12  34   45
GA Gooch      Eng      118  215   6   8,900   333   42.58  20   46
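
Once the data is structured, questions that were hard to answer from the prose passage become short computations. The sketch below is only an illustration: it copies the Player, Country, Runs and Ave columns from the table above into Python tuples (the variable names are assumptions, not part of the slides) and answers two sample questions.

```python
# Rows copied from the table above: (player, country, runs, batting average).
players = [
    ("SR Tendulkar", "India", 12773, 54.58),
    ("BC Lara", "WI", 11953, 52.88),
    ("RT Ponting", "Aus", 11345, 55.88),
    ("AR Border", "Aus", 11174, 50.56),
    ("SR Waugh", "Aus", 10927, 51.06),
    ("R Dravid", "India", 10823, 52.53),
    ("JH Kallis", "SA", 10277, 54.66),
    ("SM Gavaskar", "India", 10122, 51.12),
    ("GA Gooch", "Eng", 8900, 42.58),
]

# Question 1: which player has the highest batting average?
best = max(players, key=lambda row: row[3])

# Question 2: how many runs did the listed players score per country?
runs_by_country = {}
for _player, country, runs, _average in players:
    runs_by_country[country] = runs_by_country.get(country, 0) + runs

print(best[0], best[3])
print(runs_by_country)
```

The same questions asked of the unstructured paragraph would require reading and interpreting every sentence by hand.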

Summarized Data

Graphical displays turn data into useful information that
managers can use for decision making and interpretation
Database versus DBMS

Database Management System (DBMS)


A program for efficient, reliable, convenient and safe multi-
user storage and retrieval of massive amounts of
structured data.
  - SQL Server, Oracle Database, MySQL Community Server

Database
Repository of the information/data managed by a DBMS
A usually large collection of data organized especially for
rapid search and retrieval.
Let's design a database for courses, students and enrollments

Database Schema

Schema: describes the layout (properties) of the data,
including what kinds of fields are present and how they
are organized, data types, field sizes, and allowable
values.
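
As a minimal sketch of what such a schema could look like for the courses, students and enrollments example, the following uses Python's built-in sqlite3 module. All table names, column names and sample rows are assumptions made for illustration; the slides do not prescribe a particular design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# The schema states fields, data types and allowable values explicitly.
conn.executescript("""
CREATE TABLE student (
    student_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE course (
    course_code  TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    credit_hours INTEGER NOT NULL CHECK (credit_hours BETWEEN 1 AND 4)
);
CREATE TABLE enrollment (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    course_code TEXT NOT NULL REFERENCES course(course_code),
    semester    TEXT NOT NULL,
    PRIMARY KEY (student_id, course_code, semester)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO student VALUES (1, 'Ayesha')")
conn.execute("INSERT INTO course VALUES ('CSC271', 'Database Systems', 3)")
conn.execute("INSERT INTO enrollment VALUES (1, 'CSC271', 'Fall')")

# The schema itself is data that the DBMS can be queried about.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Which courses is each student enrolled in?
enrolled = conn.execute("""
    SELECT s.name, c.title
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_code = e.course_code
""").fetchall()

print(tables)
print(enrolled)
```

The enrollment table holds only keys; names and titles live in one place each, which avoids duplicating them per enrollment.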

Data Models and DBMS

A data model is a conceptual framework for
describing data
Relational Data Model
Oracle, MySQL, SQL Server, PostgreSQL, DB2
Key-Value Stores
Redis, Memcached, RiakKV, Ehcache
Document Stores
MongoDB, Couchbase, Amazon DynamoDB,
CouchDB
Graph
Neo4J,
OrientDB, Titan, Virtuoso, AllegroGraph,
GraphDB
http://db-engines.com/en/ranking
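
To make the four data models concrete, the sketch below stores one hypothetical fact (student 1, "Ayesha", is enrolled in CSC271) in the shape each model would typically use, built from plain Python structures rather than any of the products listed above. The key formats, field names and edge label are assumptions for illustration only.

```python
# Relational model: flat tables (lists of tuples) linked by key values.
students = [(1, "Ayesha")]
enrollments = [(1, "CSC271")]

def courses_relational(student_id):
    return [code for sid, code in enrollments if sid == student_id]

# Key-value store: opaque values looked up by a single key.
kv_store = {"student:1:name": "Ayesha", "student:1:courses": "CSC271"}

def courses_key_value(student_id):
    return kv_store[f"student:{student_id}:courses"].split(",")

# Document store: one self-contained nested document per student.
documents = {1: {"name": "Ayesha", "courses": [{"code": "CSC271"}]}}

def courses_document(student_id):
    return [course["code"] for course in documents[student_id]["courses"]]

# Graph model: nodes plus labelled edges between them.
edges = [("student:1", "ENROLLED_IN", "course:CSC271")]

def courses_graph(student_id):
    return [dst.split(":", 1)[1] for src, label, dst in edges
            if src == f"student:{student_id}" and label == "ENROLLED_IN"]

answers = [f(1) for f in (courses_relational, courses_key_value,
                          courses_document, courses_graph)]
print(answers)
```

All four functions answer the same question; what differs is how the data is laid out and therefore which kinds of queries are natural in each model.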
Database Applications
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized
recommendations
Manufacturing: production, inventory, orders, supply
chain
Human resources: employee records, salaries, tax
deductions
Case Study Qatar Petroleum
QP had accumulated a tremendous amount
of well-log data in its 50 years of operation.
Problems in manual indexing
Lost data or difficulty in accessing data
Inaccuracy of the stored logs caused by
lack of validation
Solution: Web user interface on top of latest
database integration (Schlumberger
InfoStream)
Results: Interpretation cycle reduced to 30%
to 35%

Check point

Database vs. DBMS


Data Model vs. Schema
Data vs. Information

Reference Book

History of Database Systems

1950s and early 1960s:


Data processing using magnetic tapes for storage
  - Tapes provide only sequential access
Punched cards for input
Late 1960s and 1970s:
Hard disks allow direct access to data
Network and hierarchical data models in widespread use
Ted Codd defines the relational data model
  - Would win the ACM Turing Award for this work
  - IBM Research begins System R prototype
  - UC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing
History of Database Systems (cont.)
1980s:
Commercial relational database systems
  - SQL becomes industrial standard
Parallel and distributed database systems
Object-oriented database systems
1990s:
Large decision support and data-mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce
2000s:
XML and RDF standards
2010s:
NoSQL database systems (graph, key-value, documents)
Distributed in-memory stores, big-data research, stream processing
Trends in DB Systems

Object-relational databases
Main memory database systems
NoSQL (Not Only SQL)
  - Graph stores (social networks)
  - Triple stores
  - Document stores (JSON)
  - Key-value stores (logs, structured data)
Stream data management

Database Application Architecture

Two or three-tier architecture

View (User Interface)


Client-side

Controller (Logic, Application Programs)

Model (Data)
Server-side

Components of DB Environment

Users: Database Administrator, Expert Users, End Users
Tools: Query Tools, Administration Tools, Application Programs
Query Processor: Compiler and Linker, Query Evaluation Engine
Storage Manager: Transaction Manager, Buffer Manager, File Manager,
Authorization and Integrity Manager
Disk Storage

What if we change the storage
medium?

Data Independence and Abstraction Levels

Refers to the immunity of user applications to changes
made in the definition and organization of data.
View data independence: application programs can hide
details of the data (such as salaries) for certain users.
Logical data independence: The ability to change the
logical (conceptual) schema, e.g. addition or removal of
attributes, without changing the user view (aka external
schema).
Physical data independence: The ability to change the
physical schema, e.g. using different storage device, without
changing the logical schema or external schema.
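
The following sketch illustrates view and logical data independence with Python's built-in sqlite3 module. The table employee, the view employee_public and the sample row are hypothetical names chosen for this example: the view hides the salary column, and adding a new attribute to the base table leaves what the view's users see unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Ali', 90000.0)")

# External schema: a view that exposes everything except the salary.
conn.execute("CREATE VIEW employee_public AS SELECT emp_id, name FROM employee")

def read_view():
    """Return the column names and rows that a user of the view sees."""
    cursor = conn.execute("SELECT * FROM employee_public")
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()

before = read_view()

# Change the logical schema: add a new attribute to the base table.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")

after = read_view()

print(before)
print(after)
```

An application written against employee_public keeps working after the change, which is the practical meaning of logical data independence.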

File Processing vs. DBMS (1)

Data redundancy and inconsistency


Multiple files & formats, duplication of information in different
files
Difficulty in accessing data
Need to write a new program to carry out each new task
Lengthy development time
Data isolation and program-data dependence
Integrity problems
Integrity constraints (e.g. score > 0) become buried in
program code rather than being stated explicitly
Hard to add new constraints or change existing ones
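
By contrast, a DBMS lets an integrity constraint be stated once, in the schema, instead of being repeated in every program. The sketch below declares the score > 0 rule as a CHECK constraint using Python's built-in sqlite3 module; the table name result and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rule "score > 0" is declared explicitly as part of the schema.
conn.execute("""
    CREATE TABLE result (
        student_id  INTEGER NOT NULL,
        course_code TEXT NOT NULL,
        score       INTEGER CHECK (score > 0)
    )
""")

conn.execute("INSERT INTO result VALUES (1, 'CSC271', 85)")  # valid row

try:
    conn.execute("INSERT INTO result VALUES (2, 'CSC271', -5)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the row; no application code checked it

row_count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(rejected, row_count)
```

Every program that writes to the table is subject to the same rule, and changing the rule means changing one declaration rather than hunting through program code.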

File Processing vs. DBMS (2)

Atomicity of updates: Failures may leave database in an
inconsistent state with partial updates carried out.
Example: Transfer of funds from one account to
another should either complete or not happen at all
Concurrent access by multiple users: needed for
performance
Uncontrolled concurrent accesses can lead to
inconsistencies
Example: Two people reading a balance and updating
it at the same time
Security problems
Hard to provide user access to some, but not all, data
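
The funds-transfer example can be sketched with a transaction in Python's built-in sqlite3 module. The account table, the transfer helper and the balances are hypothetical. The credit is deliberately applied before the debit so that a failure in the second step would leave a partial update behind if the two statements were not grouped into one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the debit would overdraw src, so the credit was undone too

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

ok = transfer(conn, "A", "B", 30)       # succeeds
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)  # debit violates balance >= 0
after_failed = balances(conn)

print(ok, after_ok)
print(failed, after_failed)
```

After the failed transfer the balances are the same as before it started: the credit to B that had already been applied was rolled back together with the rejected debit.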
File Processing vs. DBMS (3)
Each application programmer must maintain his/her
own data, leading to non-standard file formats
Each application program must have its own
processing routines for reading, inserting, updating,
and deleting data
Each application program needs to include code for the
schema of each file
Lack of coordination and central control

Checkpoint

Levels of abstraction?
User roles?
Query engine vs. storage manager?
Codd and his work?

Summary

A DBMS is used to maintain and query large datasets.
Benefits include recovery from system crashes,
concurrent access, quick application
development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.

How large are Google's databases?

Question of the Day

http://goo.gl/forms/YFpCkcva8S
