
HBASE

V.Hariharaputhran
o Fourteen years in Oracle Development / DBA / Big Data / Cloud Technologies
o All India Oracle Users Group (AIOUG) Evangelist
o Passion to learn and share
o Blog: www.puthranv.com
Harish P
o Eight-plus years as an Oracle DBA
o Big Data / Cloud Technologies / RAC Specialist
o All India Oracle Users Group (AIOUG) Evangelist
o Passion to learn and share
Agenda
Big Data Introduction
Hadoop Components
HBase Overview
HBase in Hadoop
Why HBase
HBase Architecture
HBase Read and Write
Data, Data, Data... Lots of Data

Twitter
Facebook
Google keeps track of you
World Population
Banking / Telecom / Energy: every industry contributes
No Data Archiving Logic
I am always online

Internet of People to Internet of Things

Devices TALK to each other as they become SMART & generate DATA
[Figure: IoT application areas. Labels: Monitor Pollution, Quality & Consistency Levels, Smart Shopping, Maintain & Repair, Wildlife Protection, Farming, Energy]
Hadoop Components


HDFS        Distributed file system
MapReduce   Distributed data processing model
Hive        Provides SQL-based query language
HBase       Distributed column-based database
Pig         Data flow execution
HDFS - Daemon / Background Process

Name Node (NN)
Secondary Name Node (SNN)
Data Node (DN)

[Diagram: NN and SNN, with data nodes DN1, DN2, DN3, DN4]
MapReduce - Daemon / Background Process

Job Tracker
Task Tracker

[Diagram: Job Tracker shown with NN and SNN, above nodes DN1, DN2, DN3]
HBase - Daemon / Background Process

HBase Master (HM)
Region Server (RS)

[Diagram: HM and SNN, above region servers RS1, RS2, RS3]
SQL vs NoSQL

SQL (relational table)

EMPID   NAME       SALARY   CITY
100     Karthick   50000    CHENNAI
101     Shiva      40000    DELHI

NoSQL (HBase)

Row   Column
100   CF Name,   Timestamp, value = Karthick
100   CF Salary, Timestamp, value = 50000
100   CF City,   Timestamp, value = Chennai
101   CF Name,   Timestamp, value = Shiva
101   CF Salary, Timestamp, value = 40000
101   CF City,   Timestamp, value = Delhi
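
The mapping on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the column family name `cf`, the helper `to_cells`, and the fixed timestamp are made up for the example and are not HBase APIs.

```python
import time

# Relational rows from the slide: EMPID -> (NAME, SALARY, CITY).
COLUMNS = ("NAME", "SALARY", "CITY")
ROWS = {
    "100": ("Karthick", "50000", "CHENNAI"),
    "101": ("Shiva", "40000", "DELHI"),
}


def to_cells(rows, columns, family="cf", timestamp=None):
    """Flatten relational rows into (rowkey, 'family:qualifier', timestamp, value) cells.

    'family' is a hypothetical column family name chosen for this sketch.
    """
    ts = int(time.time() * 1000) if timestamp is None else timestamp
    cells = []
    for rowkey, values in rows.items():
        for qualifier, value in zip(columns, values):
            cells.append((rowkey, f"{family}:{qualifier}", ts, value))
    return cells


# One relational row with three columns becomes three independent cells.
cells = to_cells(ROWS, COLUMNS, timestamp=1)
for cell in cells:
    print(cell)
```

Each cell carries its own row key, column name and timestamp, which is why a missing column simply produces no cell instead of a NULL.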

NoSQL Databases

Document databases
Key-value stores
Wide-column stores

HBase Keys & Column Families

Each record is divided into Column Families

Each row has a Key

Rowkey   Personal Data       Demographic
         Name    Address     DOB          Gender
100      Tom     SFO         01-01-1960   M
101      Mike    SFO         01-01-1970   M

Each column family consists of one or more Columns

HBase Overview

Scalable, distributed data store


Open-source avatar of Google's Bigtable
Sparse
Tightly integrated with Hadoop
Not an RDBMS

HBase is

Column-family-oriented database
Column-family oriented
Tables consisting of rows and columns
Persisted map
Sparse
Multi-dimensional
Sorted
Indexed by rowkey, column and timestamp
Key-value store
[rowkey, col family, col qualifier, timestamp] -> cell value
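
A rough way to picture the "sparse, sorted, multi-dimensional map" is a plain Python dictionary keyed by the four coordinates. The sketch below is an in-memory toy, not HBase code; the function names `put`, `get_latest` and `scan` are chosen for this example only.

```python
# (rowkey, col family, col qualifier, timestamp) -> cell value
store = {}


def put(rowkey, family, qualifier, timestamp, value):
    """Store one cell; absent columns take no space, so the map stays sparse."""
    store[(rowkey, family, qualifier, timestamp)] = value


def get_latest(rowkey, family, qualifier):
    """Return the value with the highest timestamp for one column, or None."""
    versions = [
        (ts, value)
        for (r, f, q, ts), value in store.items()
        if (r, f, q) == (rowkey, family, qualifier)
    ]
    return max(versions)[1] if versions else None


def scan():
    """Yield cells ordered by rowkey, family, qualifier, newest timestamp first."""
    for key in sorted(store, key=lambda k: (k[0], k[1], k[2], -k[3])):
        yield key, store[key]


put("row2", "cf1", "cq1", 1, "val2")
put("row1", "cf1", "cq1", 1, "val1")
put("row1", "cf1", "cq1", 2, "val1-new")

print(get_latest("row1", "cf1", "cq1"))
for key, value in scan():
    print(key, value)
```

Writing the same column twice does not overwrite anything here: the second write is a new version with a newer timestamp, and a read picks the newest one.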

HBase is not...

A relational database
No SQL query language
No joins
No built-in secondary indexing
No multi-row transactions (operations are atomic per row only)

When to use HBase

Data volume

Application Types

Hardware environment

No requirement of relational features

Quick access to data

HBase Features

Scalability
Sharding
Distributed storage
Failover support
API support
MapReduce support
Backup support

HBase vs RDBMS

HBase Shell

bin/hbase shell
Create table
create 'mytable', 'cf1'
List tables
list
Describe table
describe 'mytable'

HBase Shell (cont.)

Put a row
put 'mytable', 'row1', 'cf1:cq1', 'val1'
Get a row
get 'mytable', 'row1'
Put more
put 'mytable', 'row2', 'cf1:cq1', 'val2'
put 'mytable', 'row1', 'cf1:cq2', 'val3'
Get a row
get 'mytable', 'row1'
Scan table
scan 'mytable'
Demo

HBase Column Families (cont.)

Key Value Pair

Rowkey   ColumnFamily   Column   Timestamp   Value
1        CF1            COL1     123         INDIA
                        COL1     124         27
                        COL2     126         AIOUG
                        COL2     127         NI
         CF2            COL3     123         12.6
                        COL3     128         ORACLE

Row Format

                        CF1               CF2
Timestamp   Row Key     COL1     COL2     COL3
123         1           INDIA             12.6
124         1           27
126         1                    AIOUG
127         1                    NI
128         1                             ORACLE

HBase Read and Write

HBase Catalog Tables

Keeps track of where the .META file is present

Keeps track of all tables and regions that are present
Meta Table

HBase Region and Region Servers
Table - TBL
Region 1: rows a, b, c, d
Region 2: rows e, f, g, h
Region 3: rows i, j, k, l
Region 4: rows m, n, o, p

Region Server - RS1210: Table TBL, Region 1; Table TBL, Region 2
Region Server - RS1230: Table TBL, Region 3; Table T, Region 240
Region Server - RS1260: Table TBL, Region 4; Table A, Region 500

HBase Region
A table can be divided horizontally into one or more regions. A region contains a contiguous, sorted range of rows between a start key and an end key.
A region grows up to a configured maximum size (1 GB in this example, set by hbase.hregion.max.filesize) and is then split.
A region of a table is served to the client by a RegionServer.
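
Because each region covers a contiguous, sorted key range, finding the region for a row key is a binary search over the region start keys. The sketch below assumes the four-region layout of table TBL from the earlier slide; `REGION_START_KEYS`, `REGION_NAMES` and `locate_region` are illustrative names, not HBase APIs.

```python
import bisect

# Assumed layout: each region covers [its start key, the next region's start key).
# The first region starts at the empty key and the last one is open-ended.
REGION_START_KEYS = ["", "e", "i", "m"]
REGION_NAMES = ["Region 1", "Region 2", "Region 3", "Region 4"]


def locate_region(rowkey):
    """Return the name of the region whose key range contains rowkey."""
    # bisect_right finds the first start key greater than rowkey;
    # the region just before that position owns the row.
    index = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    return REGION_NAMES[index]


for key in ("a", "e", "k", "p"):
    print(key, "->", locate_region(key))
```

Start keys are inclusive and end keys exclusive in this sketch, so row `e` belongs to Region 2, not Region 1.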

HBase Client Locate Data

HBase Client Read / Locate Data

[Diagram: the Client asks ZooKeeper for the META location, reads META from a Region Server and keeps it in a META cache, then reads DATA from the Region Server (on a Data Node) that serves the row.]
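
The lookup flow in this diagram can be approximated with a toy client that asks ZooKeeper once, reads META once, and answers later lookups from its cache. Everything here is a simplified assumption for illustration: `ZOOKEEPER`, `META_ON_SERVER` and `Client` are invented names, and the server assignments reuse the region layout from the earlier slide.

```python
import bisect

# Hypothetical cluster state. ZooKeeper only knows which server hosts META.
ZOOKEEPER = {"meta-location": "RS1210"}

# META maps each region's start key to the region server hosting that region.
META_ON_SERVER = {
    "RS1210": {"": "RS1210", "e": "RS1210", "i": "RS1230", "m": "RS1260"},
}


class Client:
    """Toy client that caches META after the first lookup."""

    def __init__(self):
        self.meta_cache = None  # filled on first use
        self.meta_reads = 0     # how many times META was actually fetched

    def _load_meta(self):
        meta_server = ZOOKEEPER["meta-location"]             # step 1: ask ZooKeeper
        self.meta_cache = dict(META_ON_SERVER[meta_server])  # step 2: read META
        self.meta_reads += 1

    def locate(self, rowkey):
        """Return the region server that should be contacted for rowkey."""
        if self.meta_cache is None:
            self._load_meta()
        start_keys = sorted(self.meta_cache)
        index = bisect.bisect_right(start_keys, rowkey) - 1
        return self.meta_cache[start_keys[index]]            # step 3: go to that server


client = Client()
for key in ("b", "k", "n"):
    print(key, "->", client.locate(key))
print("META reads:", client.meta_reads)
```

A real client also has to refresh its cache when a region moves or splits; that part is left out of this sketch.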
Where does your data reside?

HBase Region Server Components

HBase Write
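[Diagram: Client, HMaster and Region Server 102. The Client sends writes (sample row keys 100, 1, 50) to the Region Server, which records them in the WAL and the MemStore and returns an ACK. The MemStore holds the keys in sorted order (1, 50, 100) and is later flushed to an HFile.]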
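
The write path in the diagram (WAL, then MemStore, then ACK, with a later flush to an HFile) can be mimicked with a small in-memory model. This is a sketch under assumptions, not HBase code: the `RegionServer` class and the count-based `flush_threshold` are invented for the example, whereas a real flush is triggered by MemStore size and runs in the background.

```python
class RegionServer:
    """Toy region server showing the order of steps on a write."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log: append-only, in arrival order
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # flushed files, each a sorted list of (key, value)
        self.flush_threshold = flush_threshold  # hypothetical, count-based

    def put(self, rowkey, value):
        self.wal.append((rowkey, value))  # 1. log the edit for durability
        self.memstore[rowkey] = value     # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                  # done inline here to keep the sketch short
        return "ACK"                      # 3. acknowledge to the client

    def flush(self):
        """Write the MemStore contents out as one sorted, immutable file."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


rs = RegionServer()
for key in (100, 1, 50):  # the keys arrive unsorted, as in the diagram
    print(key, rs.put(key, f"value-{key}"))

print("WAL order:  ", [k for k, _ in rs.wal])
print("HFile order:", [k for k, _ in rs.hfiles[0]])
```

The WAL keeps arrival order while the flushed file is sorted by key, which is what makes later reads and scans over HFiles efficient.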
How Data is Stored in HFile

Demo

HBase Delete

When a delete command is issued, the actual data is not removed immediately
A tombstone marker is set
HBase periodically removes deleted cells, along with their tombstones, during major compactions
Tombstone Marker
-> Version delete marker
Marks a single version of a column for deletion
-> Column delete marker
Marks all versions of a column for deletion
-> Family delete marker
Marks all versions of all columns for a column family for deletion
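
The three marker types and the role of compaction can be illustrated with a small in-memory model. This is a sketch under assumptions: the `Table` class and its method names are invented for the example, and column and family markers are taken to cover every version at or below the marker's timestamp.

```python
class Table:
    """Toy table where deletes only add tombstones until compaction."""

    def __init__(self):
        self.cells = {}       # (rowkey, family, qualifier, timestamp) -> value
        self.tombstones = []  # (kind, rowkey, family, qualifier, timestamp)

    def put(self, rowkey, family, qualifier, ts, value):
        self.cells[(rowkey, family, qualifier, ts)] = value

    def delete_version(self, rowkey, family, qualifier, ts):
        self.tombstones.append(("version", rowkey, family, qualifier, ts))

    def delete_column(self, rowkey, family, qualifier, ts):
        self.tombstones.append(("column", rowkey, family, qualifier, ts))

    def delete_family(self, rowkey, family, ts):
        self.tombstones.append(("family", rowkey, family, None, ts))

    def _masked(self, key):
        """True if any tombstone hides this cell from readers."""
        rowkey, family, qualifier, ts = key
        for kind, t_row, t_fam, t_qual, t_ts in self.tombstones:
            if (t_row, t_fam) != (rowkey, family):
                continue
            if kind == "version" and t_qual == qualifier and t_ts == ts:
                return True
            if kind == "column" and t_qual == qualifier and ts <= t_ts:
                return True
            if kind == "family" and ts <= t_ts:
                return True
        return False

    def visible(self):
        """Cells a reader would see: stored cells minus those masked by tombstones."""
        return {k: v for k, v in self.cells.items() if not self._masked(k)}

    def major_compact(self):
        """Physically drop masked cells and the tombstones themselves."""
        self.cells = self.visible()
        self.tombstones = []


t = Table()
t.put("row1", "cf1", "cq1", 1, "a")
t.put("row1", "cf1", "cq1", 2, "b")
t.put("row1", "cf1", "cq2", 1, "c")

t.delete_version("row1", "cf1", "cq1", 2)
print("stored:", len(t.cells), "visible:", len(t.visible()))

t.major_compact()
print("stored after compaction:", len(t.cells))
```

Until `major_compact` runs, the deleted cell is still stored and only hidden from reads, which matches the point on the slide that a delete does not remove data right away.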
