
CS240A: Databases and Knowledge Bases

Introduction
Carlo Zaniolo
Department of Computer Science
University of California, Los Angeles

Database Systems

During the late 60s:
  IMS and other hierarchical DBMSs
  Codasyl-compliant DBMSs using the network model

Relational DBMSs were proposed [by E.F. Codd] in the 70s:
  10+ years of R&D led to relational DBMSs and SQL
  Extraordinary success from both a research and a commercial viewpoint (IBM, Oracle, ...)
  Relational DBMSs were covered in CS143

But starting in the mid 80s, DBMSs have faced major technical and commercial challenges, forcing a major evolution in these systems: this is the topic of CS240A!

DBMS Vendors
IBM: System R, DB2
Oracle
MS: SQL Server
Smaller players: Sybase, Informix, Teradata/NCR

Changes and Challenges

Expert systems, rule-based computing, and knowledge management:
  Deductive databases and recursive queries
  Active databases and rules

New applications and data types (e.g., spatio-temporal and multimedia information):
  Object-oriented databases
  Datablades and extenders

The Web and XML:
  Publishing databases using XML
  XQuery: the new query language for XML data

Decision Support, Knowledge Discovery, Big Data, Machine Learning, ..., Data Science:
  OLAP applications
  Data mining

Evolution of SQL Standards

SQL89 and SQL2 (a.k.a. SQL92): Strictly relational.


SQL3: working documents discussing new specs for object-relational (OR) systems, but also for:
  recursion,
  active rules,
  OLAPs and OLAP functions.

SQL:1999, and with minor changes SQL:2003.


But evolution continues:
  user-defined indexes,
  user-defined aggregates,
  XML, etc.

In this course we investigate how SQL and relational systems are being extended to face the new applications. We will often study languages other than SQL as a framework for research.

The Main Problem of SQL: Inadequate Expressive Power

For instance, SQL cannot support the complex queries and recursion needed in several applications, such as Bill of Materials applications.

Thus database applications are now developed in procedural languages with embedded SQL statements.

An impedance mismatch between SQL and the host language (different data types and programming paradigms) slows down both the development of applications and their execution.

Two approaches to solve the problem:

  Making the query language more powerful: deductive databases
  Extending programming languages with DB capabilities: this is the approach taken by OO DBMSs and OR DBMSs
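The embedded-SQL style described above can be sketched for a Bill of Materials query. This is only an illustration under stated assumptions: the `assembly(part, subpart)` table, its sample rows, and the helper `all_subparts` are hypothetical and not from the course material. Each SQL statement performs a single nonrecursive step, while the host language (Python with its standard `sqlite3` module) supplies the loop and the stopping test.

```python
import sqlite3


def all_subparts(conn, root):
    """Return every part used directly or indirectly in `root`.

    Hypothetical helper: the recursion lives in the host-language loop,
    not in the SQL statement itself.
    """
    found = set()
    frontier = {root}
    while frontier:
        # One '?' placeholder per part in the current frontier.
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT DISTINCT subpart FROM assembly WHERE part IN ({placeholders})",
            tuple(frontier),
        ).fetchall()
        # Rows cross from SQL tuples into Python sets here; this conversion
        # is one small instance of the impedance mismatch.
        next_level = {row[0] for row in rows} - found
        found |= next_level
        frontier = next_level
    return found


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assembly (part TEXT, subpart TEXT)")
conn.executemany(
    "INSERT INTO assembly VALUES (?, ?)",
    [
        ("bike", "wheel"),
        ("bike", "frame"),
        ("wheel", "spoke"),
        ("wheel", "tire"),
        ("tire", "valve"),
    ],
)

result = all_subparts(conn, "bike")
print(sorted(result))  # ['frame', 'spoke', 'tire', 'valve', 'wheel']
```

The query logic is split between two languages: the join step is SQL, but the termination condition and duplicate elimination across iterations are Python.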

Expressive Power: Relational Completeness


All relational languages suffer from the same expressive-power problems:
1. Relational Algebra,
2. Domain Relational Calculus,
3. Tuple Relational Calculus, and
4. Nonrecursive safe Datalog rules.

These languages are equivalent in terms of expressive power, and programs (i.e., queries) written in one language are easily mapped into programs written in another.

The notion of Relational Completeness (RC) defines the class of queries expressible using relational algebra or, equivalently, using safe relational calculus queries.

RC was proposed in the 70s as a minimum requirement for all database query languages (one not met by most query languages at that time).

But nowadays RC is not enough!
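As a rough illustration of how a query maps between these equivalent languages, the sketch below implements three relational algebra operators over lists of dictionaries and uses them to evaluate one query, whose nonrecursive Datalog form is shown in a comment. The relations `student` and `enrolled`, their contents, and the function names are assumptions made for this example only.

```python
def select(rel, pred):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """Projection with duplicate elimination (set semantics)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out


# Hypothetical sample relations.
student = [
    {"sid": 1, "name": "Ann"},
    {"sid": 2, "name": "Bob"},
    {"sid": 3, "name": "Cy"},
]
enrolled = [
    {"sid": 1, "course": "CS240A"},
    {"sid": 2, "course": "CS143"},
    {"sid": 3, "course": "CS240A"},
]

# Nonrecursive Datalog:  q(Name) <- student(Sid, Name), enrolled(Sid, 'CS240A').
# Relational algebra:    project_name(select_{course='CS240A'}(student JOIN enrolled))
answer = project(
    select(natural_join(student, enrolled), lambda t: t["course"] == "CS240A"),
    ["name"],
)
print(answer)  # [{'name': 'Ann'}, {'name': 'Cy'}]
```

Every query built from a fixed composition of such operators performs a bounded number of joins, which is why queries needing an unbounded number of joins, such as transitive closure, fall outside RC.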

Datalog

SQL's Close Relations

1. QBE (Query by Example): two-dimensional rendering of domain calculus.
2. QUEL and SQL: inline, keyword-based versions of tuple relational calculus, with extensions such as updates and aggregates.
3. Datalog: rule-oriented, logic-based refinement of domain calculus.

Datalog is the best candidate for more powerful query languages because:
  Its formal framework is based on first-order logic.
  It supports the rule-based programming paradigm, which is the key to expert systems and knowledge-based systems.
  It is similar to Prolog, which is more procedural.

Big Data has brought a renewed interest in Datalog.
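To give a feel for the rule-based paradigm, here is a minimal sketch of naive bottom-up evaluation for two recursive Datalog rules. The `parent` facts and the function name `ancestors` are hypothetical, and the naive fixpoint loop is a textbook technique chosen for brevity, not a description of how any particular Datalog system is implemented.

```python
def ancestors(parent):
    """Naive bottom-up evaluation of the rules

        anc(X, Y) <- parent(X, Y).
        anc(X, Z) <- parent(X, Y), anc(Y, Z).

    `parent` is a set of (X, Y) pairs; the result is the set of anc facts.
    """
    anc = set(parent)  # first rule: every parent fact is an anc fact
    while True:
        # Second rule: join parent with the anc facts derived so far.
        derived = {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2}
        new = derived - anc
        if not new:  # fixpoint reached: no rule produces a new fact
            return anc
        anc |= new


# Hypothetical facts.
parent = {("ann", "bob"), ("bob", "cal"), ("cal", "dee")}
anc = ancestors(parent)
print(sorted(anc))
```

The programmer writes only the two rules; the iteration to a fixpoint is the evaluator's job, so recursion stays declarative.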

The Bigger Picture


Assemblers, Operating Systems (early 60s)
Languages and Compilers (late 60s)
Information Management Systems and Data Base Management Systems (DBMS) (70s)
GUIs (80s)
Networks (60s) and the Web (90s) and beyond
Year 2000 and beyond: big data analytics (2010), and so Datalog's renaissance.

Workplan and Grade Basis


Grade Basis for CS240A:
  Midterm: 40%
  Homework and Assignments: 10%
  Final Projects and Reports: 50% (XML 15%, DM 35%)

The take-home final consists of two projects:
  The first project will be about supporting temporal queries in XML and JSON.
  The second project will ask you to write decision support queries in SQL and DeALS.
