
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES


COURSE HANDOUT

Part A: Content Design

Course Title Distributed Data Systems


Course No(s) SS ZG554
Credit Units 5
Course Author
Version No
Date

Course Objectives
No   Objective

CO1  This field covers all aspects of computing and information access across multiple
     processing elements connected by any form of communication network, either local area
     or wide area.

CO2  There has been a steady growth in the development of contemporary applications that
     demonstrate their efficacy by connecting millions of users/applications/machines across
     the globe without relying on a traditional client-server approach.

CO3  The general computing trend is to leverage shared resources and massive amounts of data
     over the Internet. This course aims to provide an understanding of theory and systems
     aspects of distributed data.

Text Book(s)
T1 M. Tamer Özsu and Patrick Valduriez, Principles of Distributed Database Systems, Third Edition
T2 Pradeep K. Sinha, Distributed Operating Systems: Concepts and Design

Reference Book(s)

R1 “Storage Networks Explained” by Ulf Troppens, Wolfgang Müller-Friedt, Rainer Wolafka,
IBM Storage Software Development, Germany. Publisher: Wiley

On-Line Resources

HBase
https://hbase.apache.org
http://www.tutorialspoint.com/hbase/

MapReduce
https://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce/
http://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

SAN
http://searchstorage.techtarget.com/definition/storage-area-network-SAN
http://www.snia.org/education/storage_networking_primer/san/what_san

NAS
http://searchstorage.techtarget.com/definition/network-attached-storage
http://www.webopedia.com/TERM/N/network-attached_storage.html

Content Structure

1. Distributed Data Storage Technology


a. Server-centric IT architecture and its limitations
b. Storage-centric IT architecture and its advantages
c. Architecture of intelligent disk subsystems
d. Hard disks, internal I/O channels and JBOD
e. Storage virtualization using RAID
f. Introduction to NAS, SAN and DAS
2. Distributed File Systems & Security
a. File Models & Accessing models
b. File sharing Semantics
c. File Caching
d. File Replication
e. Fault Tolerance
f. File System Security
3. Distributed Databases
a. Distributed DBMS
b. Architectural Models for DDBS
c. Distributed DBMS Architecture
d. Distributed Data Sources
4. Distributed Database Design Issues & Integration
a. Framework of Distribution
b. Distributed Design Issues
c. Top-Down Design Process
d. Fragmentation
e. Allocation
f. Bottom-Up Design Methodology
g. Schema Matching
h. Schema Integration
i. Schema Mapping
j. Data Cleaning
5. Data and Access Control
a. Database Security
b. Discretionary Access Control
c. Multilevel Access Control
d. Distributed Access Control
e. View Management
f. Views in Centralized DBMSs
g. Views in Distributed DBMSs
h. Maintenance of Materialized Views
6. Data Replication
a. Consistency of Replicated Databases
b. Update Management Strategies
c. Replication Protocols
d. Replication and failures
e. Replication Mediator Service
7. Parallel Database Systems
a. Parallel Database System Architectures
b. Parallel Data Placement
c. Load Balancing
d. Database Clusters
8. Web Data Management
a. Web Graph Management
b. Web Search
c. Web Crawling
d. Indexing
e. Ranking and Link Analysis
f. Keyword Search
g. Web Querying
h. Semi-structured Data Approach
i. Web Query Language Approach
j. Question Answering
k. Searching and Querying the Hidden Web
9. Hadoop & Big Data
a. Introduction
b. Hadoop Architecture
c. HDFS Operations
d. HDFS Commands
e. Big Data Overview
f. Multi Node Cluster
g. Map Reduce

Learning Outcomes:

No Learning Outcomes

LO1 Understanding of distributed structures

LO2 Understanding of distributed storage systems and the technologies used to implement them

LO3 Understanding of distributed database architecture

LO4 Understanding of parallel database architecture and systems

LO5 Understanding of the Hadoop environment and Big Data


Part B: Contact Session Plan

Academic Term First Semester 2018-2019

Course Title Distributed Data Systems

Course No SS ZG554

Lead Instructor

Glossary of Terms
1. Contact Hour (CH) stands for an hour-long live session with students conducted either in a physical
classroom or enabled through technology. In this model of instruction, instructor-led sessions will be
for 22 CH.
a. Pre CH = Self Learning done prior to a given contact hour
b. During CH = Content to be discussed during the contact hour by the course instructor
c. Post CH = Self Learning done post the contact hour
2. Contact Session (CS) stands for a two-hour-long live session with students conducted either in a
physical classroom or enabled through technology. In this model of instruction, instructor-led
sessions will be for 11 CS.
a. Pre CS = Self Learning done prior to a given contact session
b. During CS = Content to be discussed during the contact session by the course instructor
c. Post CS = Self Learning done post the contact session
3. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an online
portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
4. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference
books. It could also include study of external resources.
5. LE stands for Lab Exercises.
6. HW stands for Home Work.
7. M stands for module. A module is a standalone quantum of designed content. A typical course is
delivered using a string of modules. M2 means module 2.

Teaching Methodology (Flipped Learning Model)


The pedagogy for this course is centered on the flipped learning model, in which traditional classroom
instruction is replaced with recorded lectures to be watched at home at the student’s convenience, and the
erstwhile homework or tutorials become the focus of classroom contact sessions. Students are expected to
finish the homework on time.

Contact Session Plan


o Each Module (M#) covers an independent topic, and a module may encompass more than one
Recorded Lecture (RL).
o Contact Sessions (2 hrs each week) are scheduled in alternate weeks, after the student has watched all
Recorded Lectures (RLs) of the specified Modules (listed below) during the previous week.
o In the flipped learning model, Contact Sessions are meant for in-classroom discussions on cases,
tutorials/exercises, or responding to students’ questions/clarifications; a session may encompass more
than one Module/RL/CS topic.
o Contact Session topics listed in the course structure (numbered CSx.y) may cover several RLs; and as
per the pace of instructor/students’ learning, the instructor may take up more than one CS topic
during each of the below sessions.

Detailed Structure
Introductory Video/Document: << Introducing the faculty, overview of the course, structure and
organization of topics, guidance for navigating the content, and expectations from students >>

o Each of the sub-modules of Recorded Lectures (RLx.y) shall be delivered via 30 to 60 minute videos,
followed by:
o Contact sessions (CSx.y) of 2 hours each for illustrating the concepts discussed in the videos with
exercises, tutorials and discussion on case-problems (wherever appropriate); a contact session (CS)
may cover more than one recorded-lecture (RL) video.

Course Contents
Contact Hour 1: Distributed Data Storage Technology

Time       Type  Description                                             Reference

Pre CH     RL    RL 1.1, RL 1.2                                          T1 – 1, T1 – 2, R1 – 1, R1 – 2

During CH  CH    • SERVER-CENTRIC IT ARCHITECTURE AND ITS LIMITATIONS   R1 – 1.1
                 • STORAGE-CENTRIC IT ARCHITECTURE AND ITS ADVANTAGES    R1 – 1.2
                 • ARCHITECTURE OF INTELLIGENT DISK SUBSYSTEMS           R1 – 2.1
                 • HARD DISKS AND INTERNAL I/O CHANNELS                  R1 – 2.2

Post CH    SS    Case Study: Replacing a Server with Storage Networks    R1 – 1.3

Lab Reference

Contact Hour 2: Distributed Data Storage Technology


Time       Type  Description                                             Reference

Pre CH     RL    RL 1.3, RL 1.4                                          T1 – 2, R1 – 2

During CH  CH    • JBOD: JUST A BUNCH OF DISKS                           R1 – 2.3
                 • STORAGE VIRTUALISATION USING RAID                     R1 – 2.4
                 • DIFFERENT RAID LEVELS                                 R1 – 2.5
                 • RAID 0: BLOCK-BY-BLOCK STRIPING
                 • RAID 1: BLOCK-BY-BLOCK MIRRORING
                 • RAID 0+1/RAID 10: STRIPING AND MIRRORING COMBINED
                 • RAID 0+1: STRIPING AND MIRRORING COMBINED
                 • RAID 10: STRIPING AND MIRRORING COMBINED

Post CH    SS    R1: Pages 535 & 536                                     R1

Lab Reference

Contact Hour 3: Distributed Data Storage Technology


Time       Type  Description                                             Reference

Pre CH     RL    RL 1.5                                                  R1 – 2, RL – 1

During CH  CH    • RAID 4 AND RAID 5                                     R1 – 2.5.4
                 • RAID 6: DOUBLE PARITY                                 R1 – 2.5.5
                 • RAID 2                                                R1 – 2.5.6
                 • RAID 3                                                R1 – 2.5.7
                 • COMPARISON OF THE RAID LEVELS
                 • BASIC FORMS OF STORAGE
                 • COMPARISON
                 • Introduction to NAS, SAN and DAS

Post CH    SS    Availability of Disk Subsystems                         R1 – 2

Lab Reference

Contact Hour 4: Distributed File Systems & Security


Time       Type  Description                                             Reference

Pre CH     RL    Features of Distributed File system                     T2 – 9.1, T2 – 9.2

During CH  CH    • File Models & Accessing models                        T2 – 9.3
                 • File sharing Semantics                                T2 – 9.4

Post CH    SS    Design Principles                                       T2 – 9.10

Lab Reference

Contact Hour 5: Distributed File Systems & Security


Time       Type  Description                                             Reference

Pre CH     RL    -                                                       -

During CH  CH    • File Caching                                          T2 – 9.5
                 • File Replication                                      T2 – 9.6
                 • Fault Tolerance                                       T2 – 9.7
                 • File System Security                                  T2 – 9.8

Post CH    SS    Case study                                              T2 – 9.11

Lab Reference
Contact Hour 6: Distributed Databases
Time       Type  Description                                             Reference

Pre CH     RL    RL 2.1, RL 2.2                                          T1 – 1, RL – 2

During CH  CH    • Distributed DBMS Systems                              T1 – 1.7.1
                 • Architectural Models for DDBS                         T1 – 1.7.2
                                                                         T1 – 1.7.3
                                                                         T1 – 1.7.4
                                                                         T1 – 1.7.5
                                                                         T1 – 1.7.6

Post CH    SS    -                                                       -

Lab Reference

Contact Hour 7: Distributed Databases


Time       Type  Description                                             Reference

Pre CH     RL    RL 2.3, RL 2.4, RL 2.5                                  T1 – 1, RL – 2

During CH  CH    • Distributed DBMS Architecture                         T1 – 1.7.8
                 • Distributed Data Sources                              T1 – 1.7.9
                                                                         T1 – 1.7.10

Post CH    SS    Distributed DBMS Architecture                           Online References

Lab Reference

Contact Hour 8: Distributed Database Design Issues & Integration


Time       Type  Description                                             Reference

Pre CH     RL    RL 3.1, RL 3.2, RL 3.3, RL 3.4, RL 3.5                  T1 – 3, RL – 3

During CH  CH    • Framework of Distribution                             T1 – 3.1
                 • Distributed Design Issues                             T1 – 3.2
                 • Top-Down Design Process                               T1 – 3.3
                 • Fragmentation                                         T1 – 3.4
                 • Allocation

Post CH    SS    Solve T1: Problems 3.1 & 3.2                            T1 – Page 126

Lab Reference
Contact Hour 9: Distributed Database Design Issues & Integration
Time       Type  Description                                             Reference

Pre CH     RL    RL 3.6                                                  T1 – 4, RL – 3

During CH  CH    • Bottom-Up Design Methodology                          T1 – 4.1
                 • Schema Matching                                       T1 – 4.2
                 • Schema Integration                                    T1 – 4.3
                 • Schema Mapping                                        T1 – 4.4
                 • Data Cleaning                                         T1 – 4.5

Post CH    SS    Solve T1: Problem 4.4                                   T1 – Page 161

Lab Reference

Contact Hour 10: Data and Access Control


Time       Type  Description                                             Reference

Pre CH     RL    RL 4.1, RL 4.2, RL 4.3, RL 4.4                          T1 – 5, RL – 4

During CH  CH    • Database Security                                     T1 – 5.2
                 • Discretionary Access Control
                 • Multilevel Access Control
                 • Distributed Access Control

Post CH    SS    Case Studies                                            T1 – 5

Lab Reference

Contact Hour 11: Data and Access Control


Time       Type  Description                                             Reference

Pre CH     RL    -                                                       -

During CH  CH    • View Management                                       T1 – 5.1
                 • Views in Centralized DBMSs
                 • Views in Distributed DBMSs
                 • Maintenance of Materialized Views

Post CH    SS    Solve T1: Problem 5.1                                   T1 – Page 202

Lab Reference

Contact Hour 12: Mid-Semester Review

Time       Type  Description                                             Reference

Pre CH     RL    CH 1 to 11                                              -

During CH  CH    Mid-Semester Review                                     CH 1 to 11

Post CH    SS    CH 1 to 11                                              -

Lab Reference

Contact Hour 13: Data Replication


Time       Type  Description                                             Reference

Pre CH     RL    RL 5.1, RL 5.2, RL 5.3, RL 5.4                          T1 – 13, RL – 5

During CH  CH    • Consistency of Replicated Databases                   T1 – 13.1
                 • Update Management Strategies                          T1 – 13.2

Post CH    SS

Lab Reference

Contact Hour 14: Data Replication


Time       Type  Description                                             Reference

Pre CH     RL    RL 5.5, RL 5.6                                          T1 – 13, RL – 5

During CH  CH    • Replication Protocols                                 T1 – 13.3
                 • Replication and failures                              T1 – 13.5
                 • Replication Mediator Service                          T1 – 13.6

Post CH    SS    Solve T1: Problem 13.2                                  T1 – Page 493

Lab Reference

Contact Hour 15: Parallel Database Systems


Time       Type  Description                                             Reference

Pre CH     RL    RL 6.1, RL 6.2, RL 6.3, RL 6.4, RL 6.5                  T1 – 14, RL – 6

During CH  CH    • Parallel Database System Architectures                T1 – 14.1

Post CH    SS    Parallel Database Architectures                         T1 – 14.1

Lab Reference

Contact Hour 16: Parallel Database Systems


Time       Type  Description                                             Reference

Pre CH     RL    RL 6.6                                                  T1 – 14, RL – 6

During CH  CH    • Parallel Data Placement                               T1 – 14.2
                 • Load Balancing                                        T1 – 14.4
                 • Database Clusters                                     T1 – 14.5

Post CH    SS    Solve T1: Problem 14.15                                 T1 – Page 550

Lab Reference

Contact Hour 17: Web Data Management


Time       Type  Description                                             Reference

Pre CH     RL    RL 7.1, RL 7.2, RL 7.3, RL 7.4                          T1 – 17, RL – 7

During CH  CH    • Web Graph Management                                  T1 – 17.1
                 • Web Search

Post CH    SS    Understanding Web search                                Online References

Lab Reference

Contact Hour 18: Web Data Management


Time       Type  Description                                             Reference

Pre CH     RL    RL 7.5, RL 7.6, RL 7.7                                  T1 – 17, RL – 7

During CH  CH    • Web Crawling                                          T1 – 17.2
                 • Indexing
                 • Ranking and Link Analysis
                 • Keyword Search

Post CH    SS    Indexing and Ranking case studies                       Online References

Lab Reference

Contact Hour 19: Web Data Management


Time       Type  Description                                             Reference

Pre CH     RL    -                                                       -

During CH  CH    • Web Querying                                          T1 – 17.3
                 • Semi-structured Data Approach
                 • Web Query Language Approach
                 • Question Answering
                 • Searching and Querying the Hidden Web

Post CH    SS    Solve T1: Problem 17.1                                  T1 – Page 719


Lab Reference

Contact Hour 20: Hadoop & Big Data


Time       Type  Description                                             Reference

Pre CH     RL    RL 8.1, RL 8.2, RL 8.3                                  Online References

During CH  CH    • Hadoop & Big Data Introduction                        Online References
                 • Hadoop Architecture
                 • HDFS Operations
                 • HDFS Commands

Post CH    SS    HDFS Commands and Hadoop case studies                   Online References

Lab Reference

Contact Hour 21: Hadoop & Big Data


Time       Type  Description                                             Reference

Pre CH     RL    RL 8.4                                                  RL – 8

During CH  CH    • Big Data Overview                                     Online References
                 • Multi Node Cluster
                 • Map Reduce

Post CH    SS    Big Data solutions                                      Online References

Lab Reference

Contact Hour 22: Comprehensive Exam Review


Time       Type  Description                                             Reference

Pre CH     RL    CH 1 TO 21                                              -

During CH  CH    Comprehensive Exam Review                               CH 1 TO 21

Post CH    SS    CH 1 TO 21                                              -

Lab Reference

Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No    Name                     Type         Duration  Weight  Day, Date, Session, Time
EC-1  Quiz-I/Assignment-I      Online                 5%      September 10 to 20, 2018
      Quiz-II                  Online                 5%      October 20 to 30, 2018
      Quiz-III/Assignment-II   Online                 5%      November 10 to 20, 2018
EC-2  Mid-Semester Test        Closed Book  2 hours   35%     30/09/2018 (AN), 2 PM – 4 PM
EC-3  Comprehensive Exam       Open Book    3 hours   50%     25/11/2018 (AN), 2 PM – 5 PM
Note - Evaluation components can be tailored depending on the proposed model.

Important Information:
Syllabus for Mid-Semester Test (Closed Book): Topics in CS 1-5.
Syllabus for Comprehensive Exam (Open Book): All topics given in plan of study

Evaluation Guidelines:

1. For Closed Book tests: No books or reference material of any kind will be permitted.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
2. For Open Book exams: Use of prescribed and reference text books, in original (not photocopies), is
permitted. Class notes/slides as reference material in filed or bound form are permitted. However,
loose sheets of paper will not be allowed. Use of calculators is permitted in all exams.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
3. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student
should follow the procedure to apply for the Make-Up Test/Exam. The genuineness of the reason for
absence in the Regular Exam shall be assessed prior to giving permission to appear for the Make-up
Exam. Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be
announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as
given in the course handout, attend the lectures, and take all the prescribed evaluation components such as
Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme
provided in the handout.
