
Lesson - 1 (Intro to Big Data and Hadoop)

Revision (—)

Topics for Today (22nd July 2017) :-
⁃ Introductory Presentation
⁃ Your introduction (In Process)
    ⁃ Name
    ⁃ Location
    ⁃ Years of Exp
⁃ A note on my blog
⁃ A note on technical and related queries
⁃ Google Drive link
    ⁃ TOC File daily updates and uploads
⁃ Java Modules
⁃ Books reference
    ⁃ Interview Book
⁃ Big Data Case Studies and setting the context
    ⁃ Quick Case Studies
        ⁃ Customer Churn Analysis - Slide 11
        ⁃ Point of sale transaction - Slide 12
    ⁃ What is big data? - Slide 15-16
    ⁃ 3Vs - Slide 14
    ⁃ 5Vs - Slide 20
    ⁃ Evolution of Big Data - SR Slide 7
    ⁃ Types of Data - SR Slide 7
    ⁃ Big Data Challenges
⁃ Introduction to Hadoop
    ⁃ Software Know how - done
    ⁃ Local Set up
        ⁃ Mac
        ⁃ Unix
    ⁃ Hadoop Philosophies

Homework for Today (22nd July 2017) :-


1. Download and configure the Acadgild VM on your machines
2. Java modules - My modules
3. Read 2 chapters of the Definitive Guide
4. Read the Case Studies / Reading material
5. Hadoop Single Node Installation
6. Start reading the big data interview guide
7. Commonly used unix / linux commands from my blog
http://syed-rizvi.blogspot.in/

Big Data Challenges

a) Storage
b) Processing
c) Manual Distributed Computing

Apache Hadoop is an open-source framework which provides an
automated distributed computing environment for storing big
data sets. It does that storage using a cluster of commodity
machines. It then analyses this stored big data using a very
simple programming model.

The storage mechanism is known as HDFS (Hadoop Distributed File
System). It is based on Google's GFS (Google File System) white paper.
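
For concreteness, here is a minimal sketch of writing and reading a file through the HDFS Java API (the FileSystem class). The NameNode address hdfs://localhost:9000 and the file path are assumptions for a single-node setup, not values from the lesson.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHello {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed single-node NameNode
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/hello.txt"); // hypothetical path
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeUTF("hello hdfs"); // written once...
            }
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF()); // ...read many times
            }
        }
    }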

The analytical mechanism is known as MapReduce and is based on
Google's MapReduce white paper.
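
To make "very simple programming model" concrete, here is a sketch of the classic word-count job against the Hadoop 2.x Java API; the class names and tokenising logic are illustrative, not from the lesson.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        // Map phase: runs in parallel on each block of the input,
        // emitting (word, 1) for every word seen.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        ctx.write(word, ONE);
                    }
                }
            }
        }

        // Reduce phase: receives every count emitted for one word and sums them.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }
    }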

There are 4 basic philosophies on which Hadoop works.

a) All the basic software that helps start a Hadoop cluster is a
software daemon.
b) All the above daemons are based on a master-slave
architecture.
c) The entire Hadoop framework is divided into 2 broad parts -
storage (HDFS) and processing (MapReduce).

Hadoop 2.x

⁃ HDFS (Storage Part)
    ⁃ Master Daemon - Namenode (High End Admin Machine) (1 in number)
    ⁃ Backup Master Daemon - Secondary Namenode (High End Admin Machine) (1 in number); strictly a checkpointing helper, not a hot standby
    ⁃ Slave Daemons - Datanode (Commodity Machines) (Many in number)
⁃ MapReduce (Processing Part) - YARN (Yet Another Resource Negotiator)
    ⁃ Master Daemon - ResourceManager (High End Admin Machine) (1 in number)
    ⁃ Slave Daemon - NodeManager (Commodity Machines) (Many in number)

d) Hadoop is a batch-oriented system which can never be plugged
in behind an online transaction processing (OLTP) system. Moreover,
it is a write-once, read-many-times data storage mechanism. This
means you can never update the data in place. If you really want to
update, you need to delete the previous version and upload a new copy.
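
A minimal sketch of that delete-then-reupload workflow with the HDFS Java API; both paths and the NameNode address are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplaceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed, as above
            FileSystem fs = FileSystem.get(conf);

            Path current = new Path("/user/demo/sales.csv");     // existing version in HDFS
            Path updated = new Path("file:///tmp/sales_v2.csv"); // new version on local disk

            if (fs.exists(current)) {
                fs.delete(current, false); // no in-place edit: drop the old version...
            }
            fs.copyFromLocalFile(updated, current); // ...then upload the new copy
        }
    }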
DataNode is the slave daemon software for the storage part of
Hadoop.
Likewise, NodeManager is the slave daemon software for the
processing part of Hadoop.
When you refer to hardware in Hadoop, you always refer to it as
either a commodity machine or a slave node.
An important point to remember: there is a difference between a
"slave daemon" (software) and a "slave node" (hardware, which by
the way is also called commodity hardware).
Both the DataNode slave daemon and the NodeManager slave daemon
run on a commodity machine, or slave machine, which is hardware.


10 TB - 1 machine (commodity machine) -> 10 hrs

10 TB - 10 machines -> 1 hr

Split evenly across 10 machines that read in parallel, each machine
scans only 1 TB, so the same job finishes in roughly one-tenth of the time.
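
The arithmetic, as a toy calculation assuming perfectly linear scaling and no coordination overhead (both idealisations):

    public class ScanTime {
        public static void main(String[] args) {
            double terabytes = 10.0;
            double hoursPerTb = 1.0; // assumed: one machine scans 1 TB per hour
            for (int machines : new int[] {1, 10}) {
                double hours = terabytes * hoursPerTb / machines;
                System.out.println(machines + " machine(s) -> " + hours + " hrs");
            }
        }
    }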

Blog:

http://syed-rizvi.blogspot.in/

Google Drive:

https://drive.google.com/folderview?id=0BwfmpHQetSFES3UzTDhITkR2Q3c&usp=sharing#list

Case Studies

http://www.informationweek.com/it-leadership/why-sears-is-going-all-in-on-hadoop/d/d-id/1107038?

http://www.computerweekly.com/news/2240219736/Case-Study-How-big-data-powers-the-eBay-customer-journey

- Standalone mode (everything runs in a single JVM; no Hadoop daemons)

- Pseudo-distributed mode (all daemons run on a single machine; see the sketch after this list)

- Fully distributed mode (daemons spread across a cluster of machines)
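
A minimal sketch of how the fs.defaultFS setting separates these modes from a client's point of view. In a real installation this property lives in core-site.xml; setting it in code here, and the localhost:9000 address, are illustrative assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class WhichMode {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Standalone mode: no daemons; the local filesystem is the default.
            // conf.set("fs.defaultFS", "file:///");
            // Pseudo-distributed mode: all daemons on one machine, HDFS on localhost.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Default filesystem: " + fs.getUri());
        }
    }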

Break for 5 mins - Let's meet at 9:57 AM by my computer


- Data science

- Basics of statistics

- R, Tableau
