
Real-Time, Interactive Big Data with Couchbase 4.0
Tom Green
©2014 Couchbase Inc.
Changing Demands on Technology
A huge increase in users and an explosion in data volumes, coupled with increased user expectations:
• Fast and responsive applications
• Personalized experience
• Constantly improving products with shorter release cycles

How can systems provide an individual, personalized experience, iterate to release quickly, and maintain responsiveness?


Use cases for NoSQL
• Profile Management
• Personalization
• 360 Degree Customer View
• Internet of Things
• Mobile Applications
• Content Management
• Catalog
• Real Time Big Data
• Digital Communication
• Fraud Detection


Changing trends in Hardware
The availability of cheap, powerful commodity servers
The decreasing price of memory, and increasing memory capacities
Introduction of SSDs, and high-performance PCI-E SSDs
• More demand to move away from centralized SAN storage

How can these fundamental changes at the hardware level be best utilized to solve the increasing demands placed on technology by the business?


Requires fundamental changes in database architecture
Big Data = Operational + Analytic (NoSQL + Hadoop)

Real-time, interactive databases      Batch-oriented analytic databases
OPERATIONAL, VELOCITY                 ANALYTICAL, VOLUME
• Online                              • Offline
• Web/Mobile/IoT apps                 • Analytics apps
• Millions of customers/consumers     • Hundreds of business analysts
Today's & tomorrow's requirements
• Consistent performance at scale
• Easy, affordable scalability
• High availability (24x365)
• Flexible data model


How Couchbase technology solves problems
Couchbase provides a complete Data Management solution
Multi-purpose capabilities support a broad range of apps and use cases:
• High availability cache
• Key-value store
• Document database
• Embedded database
• Sync management

Enterprises often start with cache, then broaden usage to other apps and use cases.
Major enterprises across industries are adopting Couchbase
• Technology
• Retail & Apparel
• Communications
• E-Commerce & Digital Advertising
• Finance & Business Services
• Travel & Hospitality
• Games & Gaming
• Media & Entertainment


Product Catalog @ Tesco
• Largest UK retailer
• Adopting service-oriented architecture for greater business agility
• Using Couchbase as Catalog service for 1M items (growing to 10M)

Objective & Challenges
Provide centralized, easy to maintain & update, product catalog service
• Part of major initiative to drive greater agility and data sharing across multiple channels via service-oriented architecture
• Store product data for 10M items
• Support frequently changing data and multiple data structures
• Provide fast access to data

Solution
Deploy Couchbase Server as consolidated product data catalog
• JSON document model captures multiple data structures: SKUs, product and accounting hierarchies, GTINs (barcodes, ISBNs, etc.)
• Data ingested via REST API from multiple MDM feeds (CSV, XML)
• Easily and inexpensively scales to support 10M products and 35K requests per second

The Couchbase Advantage
Flexible data model, with high performance and easy scalability
Couchbase Architecture: Single Node
Couchbase Server Architecture
Single-node type means easier administration and scaling
• Single installation
• Two major components/processes: data manager and cluster manager
• Data manager:
  • C/C++
  • Layer consolidation of caching and persistence
• Cluster manager:
  • Erlang/OTP
  • Administration UIs
  • Out-of-band for data requests

[Diagram: Couchbase Server Node]
Ports: 8092 Query API, 11210 Data Access API, 8091 Admin Console
DATA MANAGER: Query Engine, Managed Cache, Multi-threaded Persistence Engine
CLUSTER MANAGER (Erlang/OTP): REST Management API, Web UI, Node/Cluster Coordination
Couchbase Read Operation
Single-node type means easier administration and scaling
• Reads out of cache are extremely fast
• No other process/system to communicate with
• Data connection is a binary protocol over TCP

[Diagram: the APPLICATION SERVER sends GET DOC 1 to the MANAGED CACHE, which holds DOC 1; also shown are the REPLICATION QUEUE, the DISK QUEUE, and DISK holding DOC 1.]


Write Operation
Single-node type means easier administration and scaling
• Writes are async by default
• The application gets an acknowledgement once the write is successfully in RAM, and can trade off waiting for replication or persistence per write
• Replication to 1, 2 or 3 other nodes
• Replication is RAM-based, so extremely fast
• Off-node replication is the primary level of HA
• Disk is written to as fast as possible, with no waiting

[Diagram: the APPLICATION SERVER writes DOC 1 to the MANAGED CACHE; DOC 1 then passes through the REPLICATION QUEUE and the DISK QUEUE to DISK.]
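The write path above can be sketched as a small in-memory simulation. This is an illustration under assumptions, not Couchbase's implementation: the class `NodeSketch`, its methods, and the `wait_for_replication` / `wait_for_persistence` flags are hypothetical names invented for this sketch, and plain dictionaries stand in for the replica node and the disk.

```python
from collections import deque


class NodeSketch:
    """Toy model of one node's write path: cache first, queues drained later."""

    def __init__(self):
        self.cache = {}                    # managed cache (RAM)
        self.disk = {}                     # stand-in for the persistence engine
        self.replica = {}                  # stand-in for another node's RAM
        self.replication_queue = deque()
        self.disk_queue = deque()

    def set(self, key, value, wait_for_replication=False, wait_for_persistence=False):
        # The write is acknowledged once it is in RAM ...
        self.cache[key] = value
        self.replication_queue.append(key)
        self.disk_queue.append(key)
        # ... unless the caller trades latency for durability on this write.
        if wait_for_replication:
            self.drain_replication()
        if wait_for_persistence:
            self.drain_disk()
        return "ack"

    def get(self, key):
        # Reads are served from the managed cache.
        return self.cache.get(key)

    def drain_replication(self):
        # Copy every queued key to the replica (in order).
        while self.replication_queue:
            key = self.replication_queue.popleft()
            self.replica[key] = self.cache[key]

    def drain_disk(self):
        # Persist every queued key (in order).
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]


node = NodeSketch()
node.set("doc1", {"name": "first"})                              # acknowledged from RAM
node.set("doc2", {"name": "second"}, wait_for_persistence=True)  # waits for the disk queue
```

In this sketch the second call drains the whole disk queue, so `doc1` is persisted along with `doc2`; in a real server the queues are drained continuously in the background rather than on demand.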
Couchbase Architecture: Cluster
Auto sharding: Bucket and vBuckets
• A bucket is a logical, unique key space
• Multiple buckets can exist within a single cluster of nodes
• Each bucket has active and replica data sets (1, 2 or 3 extra copies)
• Each data set has 1024 Virtual Buckets (vBuckets)
• Each vBucket contains 1/1024th portion of the data set
• vBuckets do not have a fixed physical server location
• Mapping between the vBuckets and physical servers is called the cluster map
• Document IDs (keys) always get hashed to the same vBucket
• Couchbase SDKs look up the vBucket -> server mapping

[Diagram: data buckets, each divided into virtual buckets vB 1 … 1024.]


Cluster Map
[Diagram: the Couchbase SDK applies the CRC32 hashing algorithm to a key and uses the CLUSTER MAP to locate the vBucket (vBucket1 ... vBucket1024) on a node of the Couchbase cluster.]
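The client-side lookup (hash the key with CRC32, pick a vBucket, read its server from the cluster map) can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the plain `crc32 % 1024` reduction is an assumption (the exact bit manipulation used by the SDKs may differ), vBuckets are numbered from 0 here, and the round-robin map stands in for the map a real cluster would publish.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for(key: str) -> int:
    """Map a document ID to a vBucket number in [0, NUM_VBUCKETS)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    """Assign vBuckets to servers round-robin (a stand-in for the real map)."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for(key, cluster_map):
    """Client-side lookup: hash the key, then read the owner from the map."""
    return cluster_map[vbucket_for(key)]


cluster_map = build_cluster_map(["server1", "server2", "server3"])
print(vbucket_for("user::1001"), server_for("user::1001", cluster_map))
```

Because the hash depends only on the key, the same document ID always lands in the same vBucket; only the vBucket-to-server entries change when the cluster topology changes.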
Basic Operation
Application has a single logical connection to the cluster (client object)
• Data is automatically sharded, resulting in even document data distribution across the cluster
• Each vBucket is replicated 1, 2 or 3 times ("peer-to-peer" replication)
• Docs are automatically hashed by the client to a shard
• Cluster map provides the location of the server a shard is on
• Every read/write/update/delete goes to the same node for a given key
• Strongly consistent data access ("read your own writes")
• A single Couchbase node can achieve 100k's ops/sec, so no need to scale reads

[Diagram: three nodes, each holding active and replica shards.]
Node                  ACTIVE shards   REPLICA shards
Couchbase Server 1    5, 2, 9         4, 1, 8
Couchbase Server 2    4, 7, 8         6, 3, 2
Couchbase Server 3    1, 3, 6         7, 9, 5
Add Nodes to Cluster
Application has a single logical connection to the cluster (client object) for READ/WRITE/UPDATE
• Multiple nodes added or removed at once
• One-click operation
• Incremental movement of active and replica vBuckets and data
• Client library updated via cluster map
• Fully online operation, no downtime or loss of performance

[Diagram: Couchbase Servers 1 to 5, each with ACTIVE and REPLICA shard slots; shards 1 to 9 are shown in the same positions as in the three-node layout.]
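One way to picture the rebalance is as a function from the old cluster map and the new server list to a new cluster map that moves as few vBuckets as possible. The sketch below is an assumption-laden illustration, not the cluster manager's algorithm: `rebalance` is a hypothetical name, replicas are ignored, and the new map is computed in one step, whereas the slide describes vBuckets being moved incrementally while the cluster stays online.

```python
from collections import Counter


def rebalance(cluster_map, servers):
    """Return a new vBucket -> server list spread evenly over `servers`.

    Only vBuckets on over-full (or removed) servers are reassigned.
    """
    total = len(cluster_map)
    base, extra = divmod(total, len(servers))
    # The first `extra` servers may hold one more vBucket than the rest.
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    held = Counter()
    new_map = list(cluster_map)
    to_move = []
    for vb, owner in enumerate(cluster_map):
        if owner in quota and held[owner] < quota[owner]:
            held[owner] += 1          # stays where it is
        else:
            to_move.append(vb)        # owner removed or over quota
    # Hand each displaced vBucket to a server that still has room.
    targets = [s for s in servers for _ in range(quota[s] - held[s])]
    for vb, target in zip(to_move, targets):
        new_map[vb] = target
    return new_map


# Grow an (assumed) three-node layout to five nodes.
old_map = [f"server{vb % 3 + 1}" for vb in range(1024)]
new_map = rebalance(old_map, [f"server{i}" for i in range(1, 6)])
moved = sum(1 for old, new in zip(old_map, new_map) if old != new)
print(Counter(new_map), "moved:", moved)
```

After the new map is computed, clients only need the updated map to route requests, which matches the slide's point that the client library is updated via the cluster map.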


Fail Over Node
Application has a single logical connection to the cluster (client object)
• When a node goes down, some requests will fail
• Failover is either automatic or manual
• Client library is automatically updated via cluster map
• Replicas are not recreated, to preserve stability
• Best practice is to replace the node and rebalance

[Diagram: Couchbase Servers 1 to 5, each holding ACTIVE and REPLICA shards.]
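Failover can be sketched as a pure update of the cluster map: replicas of the failed node's active shards are promoted, and no new replicas are created. The function `fail_over`, the short server names, and the nine-shard, three-server layout are assumptions made for this illustration; `None` marks a shard copy that no longer exists.

```python
def fail_over(active, replica, failed):
    """Promote replicas for shards whose active copy lived on `failed`.

    `active` and `replica` map shard number -> server name. Replicas are
    not recreated; that is left to a later rebalance.
    """
    new_active, new_replica = {}, {}
    for shard in active:
        a, r = active[shard], replica.get(shard)
        if r == failed:
            r = None                  # this replica went down with the node
        if a == failed:
            a, r = r, None            # promote the surviving replica, if any
        new_active[shard] = a
        new_replica[shard] = r
    return new_active, new_replica


# Assumed layout: three servers, three active and three replica shards each.
active = {5: "s1", 2: "s1", 9: "s1", 4: "s2", 7: "s2", 8: "s2", 1: "s3", 3: "s3", 6: "s3"}
replica = {4: "s1", 1: "s1", 8: "s1", 6: "s2", 3: "s2", 2: "s2", 7: "s3", 9: "s3", 5: "s3"}

new_active, new_replica = fail_over(active, replica, failed="s3")
print(new_active)
print(new_replica)
```

After the failover every shard is active somewhere again, but several shards have no replica left, which is why the slide recommends replacing the node and rebalancing.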


Achieve Disaster Recovery and data locality
Built-in Cross Data Center Replication (XDCR)

XDCR: Cross Data Center Replication
• Application can access both clusters (master-master)
• Scales out linearly
• Different from intra-cluster replication (intra-cluster replication is "CP", XDCR is "AP")


Memory-to-Memory replication
[Diagram: the NYC server cluster in New York (Couchbase Servers 1 to 4) and the SF server cluster in San Francisco (Couchbase Servers 1 to 3); every node has MEMORY and DISK.]


Rich Query and Indexing for Rich Data

N1QL: Superset of SQL


Features from SQL: Reads

Reading Data
SELECT                         Projection
DISTINCT                       De-duplication
FROM                           Sourcing
JOIN                           INNER, LEFT OUTER
WHERE                          Filtering
GROUP BY                       Aggregation: HAVING, MIN, MAX, SUM, AVG, COUNT [ DISTINCT ]
ORDER BY                       Sorting
LIMIT, OFFSET                  Paging
UNION*, INTERSECT*, EXCEPT*    Set operators
EXPLAIN                        Analyzing and tuning query execution plans

* Upcoming


Features from SQL: Expressions

Expressions
Literals               • Primitives [ 0, 'hello', TRUE ]
                       • NULL
Operators              • Arithmetic [ +, -, *, /, % ]
                       • Logical [ AND, OR, NOT ]
                       • Comparison [ <, <=, =, !=, >=, >, BETWEEN, IS NULL ]
                       • Pattern matching [ LIKE ]
                       • Conditional [ CASE ]
Scalar functions       • Numeric [ trigonometric, ROUND, TRUNC, … ]
                       • String [ UPPER, LOWER, TRIM, SUBSTR, … ]
                       • Date [ string and numeric dates, NOW, date arithmetic*, … ]
Aggregate functions    • MIN, MAX, SUM, AVG, COUNT [ DISTINCT ]
Subqueries             • Subqueries are full expressions


Features for Rich Data: Nested Model

Nested Data
Multi-valued attributes    Arrays as attributes
Nested objects             Multi-level nesting of objects; path navigation
NEST                       Collecting second term into nested array [ INNER, LEFT OUTER ]
UNNEST                     Flattening nested array [ INNER, LEFT OUTER ]
Collection operators       Mapping, filtering, predicate, indexing, and slicing operators
Collection functions       Sort, Reverse, Distinct, Append, Concatenate, Contains…
Collection aggregation     MIN, MAX, SUM, AVG, COUNT [ DISTINCT ]
Deep traversal*            Finding or collecting of matching elements WITHIN any depth
Array and deep update*     UPDATE of matching elements IN arrays and WITHIN any depth
Construction               Dynamic construction of objects, arrays, and their combinations

* Upcoming
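To make the NEST and UNNEST rows above more concrete, here is a rough Python analogue of what the two operations do to JSON-like documents. It is only a sketch of the idea, not N1QL semantics in every detail: the function names, field names, and sample documents are hypothetical, result rows are flattened into a single dictionary for brevity, and `None` stands in for a missing value.

```python
def unnest(docs, field, alias, left_outer=False):
    """Flatten a nested array: one output row per array element."""
    rows = []
    for doc in docs:
        items = doc.get(field) or []
        if not items and left_outer:
            rows.append({**doc, alias: None})   # keep parents without elements
        for item in items:
            rows.append({**doc, alias: item})
    return rows


def nest(left_docs, right_by_key, keys_field, alias, left_outer=False):
    """Collect the matching right-hand documents into an array on each left doc."""
    rows = []
    for doc in left_docs:
        matches = [right_by_key[k] for k in doc.get(keys_field, []) if k in right_by_key]
        if matches or left_outer:
            rows.append({**doc, alias: matches})
    return rows


orders = [{"id": 1, "items": ["tea", "milk"]}, {"id": 2, "items": []}]
customers = [{"name": "ann", "order_ids": ["o1", "o2"]}, {"name": "bob", "order_ids": []}]
orders_by_key = {"o1": {"total": 5}, "o2": {"total": 7}}

print(unnest(orders, "items", "item"))
print(nest(customers, orders_by_key, "order_ids", "orders", left_outer=True))
```

The `left_outer` flag mirrors the INNER versus LEFT OUTER variants listed on the slide: the inner form drops parents with nothing to flatten or collect, while the left outer form keeps them.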


Features for Distributed Data

Distributed Data
Key-value access              • KEYS available in SELECT, UPDATE, DELETE
                              • KEYS used in JOIN, NEST, subqueries*, MERGE*
Document metadata             • META function to access ID, TTL, CAS, FLAGS…
LIMIT on UPDATE & DELETE*     • LIMIT available in write statements

* Upcoming


Architecture

Topology
• Homogeneous node image
• Node roles & deployment flexibility
• Independent resourcing & scaling
• Query throughput & availability

[Diagram: a Couchbase app connects through the client SDK and a classic app through ODBC/JDBC to six CB nodes; each node shows the Manager, Data, Index, and Query services.]


Query Execution
Client: Request -> Response
Pipeline: Parse -> Plan -> Scan -> Fetch -> Join -> Filter -> Pre-Aggregate -> Aggregate -> Sort -> Offset -> Limit -> Project
• Scan reads from the Index; Fetch reads from Data
• Data-parallel: query is N data streams over N cores*
• Memory-based
• Pluggable architecture: datastore, index…

* Upcoming
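The pipeline stages can be illustrated with a toy, single-threaded version in Python. Everything here is an assumption made for the sketch: the sample documents, the `INDEX` and `DATA` stand-ins, and the function names are hypothetical, and the Parse, Plan, Join, and Pre-Aggregate stages are left out. The query being mimicked is roughly "sum of price per category for products priced above 3, largest first, first 10 rows".

```python
from collections import defaultdict

DATA = {   # key -> document (stand-in for the data service)
    "p1": {"type": "product", "category": "tea", "price": 4},
    "p2": {"type": "product", "category": "tea", "price": 6},
    "p3": {"type": "product", "category": "coffee", "price": 9},
    "u1": {"type": "user", "name": "tom"},
}
INDEX = defaultdict(list)  # type -> keys (stand-in for an index on `type`)
for key, doc in DATA.items():
    INDEX[doc["type"]].append(key)


def scan(doc_type):
    """Scan: the index yields the keys of matching documents."""
    yield from INDEX[doc_type]


def fetch(keys):
    """Fetch: keys are resolved to full documents."""
    for key in keys:
        yield DATA[key]


def filter_docs(docs, predicate):
    """Filter: keep only documents that satisfy the predicate."""
    return (doc for doc in docs if predicate(doc))


def aggregate(docs, group_field, value_field):
    """Aggregate: group by one field and sum another."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[group_field]] += doc[value_field]
    return [{group_field: group, "total": total} for group, total in totals.items()]


def run_query(offset=0, limit=10):
    docs = filter_docs(fetch(scan("product")), lambda d: d["price"] > 3)
    rows = aggregate(docs, "category", "price")
    rows.sort(key=lambda r: r["total"], reverse=True)                        # Sort
    rows = rows[offset:offset + limit]                                       # Offset, Limit
    return [{"category": r["category"], "total": r["total"]} for r in rows]  # Project


print(run_query())
```

Scan and Fetch are written as generators so that documents stream from one stage to the next, which is the property the data-parallel note on the slide builds on.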
Couchbase Contact Details
• Please email Azam@Couchbase.com if you have any questions related to Couchbase.
