Cubes Poster - PyCon 2014

Cubes
lightweight pluggable data warehouse
Synopsis: Python framework and set of tools for building a heterogeneous
pluggable data warehouse, multidimensional data access and online
analytical processing (OLAP) of categorical data. Lightweight.
Authors: Robin Thomas and Stefan Urbanek
Overview
Pluggable Analytical Workspace
Manages the model, data stores and model providers.
Take your Google Analytics and your SQL database, and you have a
single way for your users to access all of them. No need to grant
everyone account access to each particular data source.
[Diagram: Workspace with Store and Browser components; cubes sales, churn, activations and events; stores BI Data (Postgres), BI Data 2 (Mongo) and Events (API).]
Stores configuration:
[store_data]
type: sql
url: postgres://localhost/data
[store_data2]
type: mongo
host: localhost
[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
from cubes import Workspace

workspace = Workspace()
workspace.import_model("model.json")
workspace.register_default_store("sql", url="postgres://localhost/data")
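The store sections above use plain INI syntax, so their layout can be inspected with Python's standard configparser. The sketch below is only an illustration: the helper read_stores is hypothetical, it assumes that the text after the store_ prefix is the store name, and it is not how Cubes itself loads its configuration.

```python
import configparser

# Store sections copied from the configuration example above.
CONFIG = """
[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
"""


def read_stores(text):
    """Return {store name: options} for every [store_*] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    stores = {}
    for section in parser.sections():
        if section.startswith("store_"):
            # Assumption: the text after "store_" is the store's name.
            stores[section[len("store_"):]] = dict(parser[section])
    return stores


stores = read_stores(CONFIG)
for name, options in stores.items():
    print(name, options["type"])
```

Each store ends up as a plain dictionary of options, with type selecting the backend.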
Supported Backends: SQL, MongoDB, Mixpanel, Google Analytics, Slicer
Cubes: multi-dimensional data modeling, unified interface for analytical data, browsing of aggregated data
Server: easy to use JSON over HTTP API, can be integrated as Flask Blueprint, can collect other (external) Cubes Slicers
[Diagram: Workspace with Authorizer, Stores and Model Providers (Static Model Provider, API Model Provider); Slicer Server with Authenticator and User Interface; labels: source data, analytical data.]
Model
Metadata: logical description of data: cubes, dimensions, measures
and aggregates.
Cubes: logical data structure, a collection of measurable facts (invoices,
phone calls, events, ...).
Dimensions: provide context for facts. They are used to filter queries or
reports and to control the scope of aggregation of facts. They might contain
concept hierarchies such as category-subcategory-product, a date hierarchy
(year-month-day or year-quarter-month-day) or geographical hierarchies.
The model also provides information about mapping to the physical data
store.
Slicing and Dicing
Cell: provides context of interest, composed of cuts. There are three
kinds of cuts: point, set and range. Cuts can also be inverted using
invert=True, which yields cells outside of the cut.
point cut: a single dimension member
# PointCut(dimension, path)
cut1 = PointCut("status", ["open"])
set cut: multiple members of a dimension
# SetCut(dimension, paths)
cut2 = SetCut("region", [["sk", "ba"], ["hu"]])
range cut: members between two values of an ordered dimension (such as date)
Physical model
SQL schema example:
[Diagram: SQL schema with a sales fact table and product, store and date dimension tables; column labels include id, sale_date, product_id, store_id, amount, code, name, address, year and month; label: Denormalized.]
{
    "cubes": [
        {
            "name": "sales",
            "measures": ["amount"],
            "dimensions": ["date", "product", "store"],
            "joins": [
                {"master": "date_id", "detail": "date.id"},
                {"master": "product_id", "detail": "product.id"},
                {"master": "store_id", "detail": "store.id"}
            ]
        }
    ],
    "dimensions": [
        { "name": "product", "attributes": ["code", "name"] },
        { "name": "store", "attributes": ["code", "address"] }
    ]
}
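To make the join section of the model concrete, the following sketch loads the same model with the standard json module and lists which dimension table each fact-table key joins to. The helper joined_tables is hypothetical and only reads the dictionary; it is not part of Cubes.

```python
import json

# The model from the SQL schema example, as a JSON document.
MODEL = json.loads("""
{
    "cubes": [
        {
            "name": "sales",
            "measures": ["amount"],
            "dimensions": ["date", "product", "store"],
            "joins": [
                {"master": "date_id", "detail": "date.id"},
                {"master": "product_id", "detail": "product.id"},
                {"master": "store_id", "detail": "store.id"}
            ]
        }
    ],
    "dimensions": [
        {"name": "product", "attributes": ["code", "name"]},
        {"name": "store", "attributes": ["code", "address"]}
    ]
}
""")


def joined_tables(cube):
    """Map each fact-table key to the dimension table it joins to."""
    tables = {}
    for join in cube["joins"]:
        # "detail" has the form "table.column"; keep the table part.
        tables[join["master"]] = join["detail"].split(".")[0]
    return tables


sales = MODEL["cubes"][0]
print(joined_tables(sales))
```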
# RangeCut(dimension, from, to)
cut3 = RangeCut("date", [2010, 1], [2012])
cell = Cell(cube, [cut1, cut2, cut3])
Cell as user interface element: multi-dimensional breadcrumbs, filter UI
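The semantics of the three kinds of cuts can be sketched in plain Python. The Toy* classes below are hypothetical stand-ins written for this illustration, not the Cubes classes. They assume that a dimension member is identified by a path through the hierarchy (for example [2011, 5] for May 2011) and that a cut on a shorter path matches every member below it.

```python
from dataclasses import dataclass


@dataclass
class ToyPointCut:
    dimension: str
    path: list
    invert: bool = False

    def matches(self, member_path):
        # A point cut selects one member and everything below it.
        hit = member_path[:len(self.path)] == self.path
        return hit != self.invert


@dataclass
class ToySetCut:
    dimension: str
    paths: list
    invert: bool = False

    def matches(self, member_path):
        # A set cut selects any of several members of one dimension.
        hit = any(member_path[:len(p)] == p for p in self.paths)
        return hit != self.invert


@dataclass
class ToyRangeCut:
    dimension: str
    from_path: list
    to_path: list
    invert: bool = False

    def matches(self, member_path):
        # Compare only as many levels as each bound specifies.
        lower_ok = member_path[:len(self.from_path)] >= self.from_path
        upper_ok = member_path[:len(self.to_path)] <= self.to_path
        return (lower_ok and upper_ok) != self.invert


def in_cell(cuts, fact):
    """A fact lies in the cell when every cut accepts its member path."""
    return all(cut.matches(fact[cut.dimension]) for cut in cuts)


cuts = [
    ToyPointCut("status", ["open"]),
    ToySetCut("region", [["sk", "ba"], ["hu"]]),
    ToyRangeCut("date", [2010, 1], [2012]),
]
fact = {"status": ["open"], "region": ["sk", "ba"], "date": [2011, 5]}
print(in_cell(cuts, fact))
```

Setting invert=True flips the result of a single cut, which corresponds to selecting the members outside of it.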
Browsing and Aggregation
Get aggregated data for a cell, get dimension members within a cell or get
a list of facts within a cell, if available. Uses backend-specific
aggregation or an interface to another aggregation engine.
browser = workspace.browser("contracts")
result = browser.aggregate(cell, drilldown=["sector"])
Drill-down
cell[level.label_attribute]: for table cell or link label
cell[level.key]: for URL
[Diagram: the Browser takes the logical model and a cell, generates SQL or another backend-specific query for the physical data store (database or API) behind the Store, and returns an AggregationResult.]
result.summary: aggregates for the whole cell
result.cells: iterable, backend-specific, might hold a database cursor
for cell in result:
    print("%s: %s" % (cell[level.label_attribute], cell[aggregate.name]))
measure: price, a property of the facts
aggregate: price_sum, the sum of price over the facts, read as cell["price_sum"]
Drill-down: get more details by year, by product, by store, ...
cell = Cell(cube)
result = browser.aggregate(cell)
result.summary    # aggregates for the whole cell
result = browser.aggregate(cell, drilldown=["date"])
result.cells      # rows of the drill-down by date
cut = PointCut("date", [2010])
cell = cell.slice(cut)
result = browser.aggregate(cell, drilldown=["date"])
result.cells      # rows of the drill-down within 2010
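The slice-then-drill-down steps can be imitated over an in-memory list of facts. This is a toy sketch, not the Cubes browser: the facts are made up for illustration, the aggregate function here is hypothetical, cuts are simplified to point cuts given as a dictionary, and drilling down on a dimension that already has a cut is assumed to continue at the next level of its hierarchy.

```python
from collections import defaultdict

# Made-up facts for illustration only; date paths are [year, month].
FACTS = [
    {"date": [2009, 1], "store": ["a"], "amount": 10},
    {"date": [2010, 3], "store": ["a"], "amount": 20},
    {"date": [2010, 7], "store": ["b"], "amount": 5},
]


def totals(rows):
    """Compute the aggregates for a group of facts."""
    return {"record_count": len(rows),
            "amount_sum": sum(r["amount"] for r in rows)}


def aggregate(facts, cuts=None, drilldown=None):
    """Return (summary, cells) for the facts selected by point cuts."""
    cuts = cuts or {}
    selected = [f for f in facts
                if all(f[dim][:len(path)] == path
                       for dim, path in cuts.items())]
    cells = []
    if drilldown:
        # Assumption: drill down to the level just below the cut.
        level = len(cuts.get(drilldown, []))
        groups = defaultdict(list)
        for fact in selected:
            groups[fact[drilldown][level]].append(fact)
        for key in sorted(groups):
            cell = {drilldown: key}
            cell.update(totals(groups[key]))
            cells.append(cell)
    return totals(selected), cells


summary, cells = aggregate(FACTS, drilldown="date")
sliced_summary, sliced_cells = aggregate(
    FACTS, cuts={"date": [2010]}, drilldown="date")
print(summary, cells)
print(sliced_summary, sliced_cells)
```

The first call groups by year. After slicing on 2010, the same drill-down groups by month, and the summary covers only the sliced cell.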
Visualizers
Turning JSON data into reports, charts, tables. It is very easy to build
custom visualization on top of the Cubes analytical data with a
framework of your choice.
Slicer Server
Unified aggregation interface to a variety of data stores and services.
JSON interface over HTTP. Built using the Flask web micro-framework. Can
be used as a stand-alone server or integrated in another application
to serve as an analytical module.
[Diagram: Slicer serves Cubes data for a cell (point of view): model, aggregates and facts (details).]
List cubes:
GET /cubes
Get cube model (metadata):
GET /cube/sales/model
Aggregate:
GET /cube/sales/aggregate?cut=date:2010&drilldown=date,region&split=status:1&page=10&page_size=100
Slicer JSON response:
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {
            "record_count": 31,
            "amount_sum": 550840,
            "date.year": 2009
        },
        {
            "record_count": 31,
            "amount_sum": 566020,
            "date.year": 2010
        }
    ],
    "summary": {
        "record_count": 62,
        "amount_sum": 1116860
    }
}
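A client can read this response with the standard json module. As a small consistency check, the sketch below sums the drill-down rows and compares them with the summary. The helper drilldown_totals is hypothetical, and the comparison only holds when all rows are present, as they are here (total_cell_count is 2); a paginated response would contain only part of the rows.

```python
import json

# The Slicer JSON response shown above.
RESPONSE = json.loads("""
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
        {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
    ],
    "summary": {"record_count": 62, "amount_sum": 1116860}
}
""")


def drilldown_totals(response, aggregates):
    """Sum each named aggregate over the drill-down rows."""
    return {name: sum(row[name] for row in response["drilldown"])
            for name in aggregates}


totals = drilldown_totals(RESPONSE, ["record_count", "amount_sum"])
for row in RESPONSE["drilldown"]:
    print(row["date.year"], row["amount_sum"])
print(totals == RESPONSE["summary"])
```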
/facts: get list of facts within a cell (if available)
/members: list dimension members within a cell
/cell: get multi-dimensional breadcrumbs information, browsing context or "where am I looking at?"
Results can be paginated with page=, ordered with order= and also
formatted as CSV or newline-separated JSON records using format=.
Use either a generic visualizer and reporting application such as Cubes
Visualizer or Cubes Viewer, or create a specific-purpose report that suits
your reporting needs.
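Request URLs with these parameters can be assembled with the standard library. The sketch below is an illustration only: aggregate_url is a hypothetical helper, the host and port are placeholders, and the value csv for format= is an assumption based on the description above.

```python
from urllib.parse import urlencode


def aggregate_url(base, cube, **params):
    """Build an aggregate request URL for a Slicer server."""
    # Keep ":" and "," readable, as in the request examples above.
    query = urlencode(params, safe=":,")
    return "%s/cube/%s/aggregate?%s" % (base.rstrip("/"), cube, query)


url = aggregate_url(
    "http://localhost:5000",   # placeholder server address
    "sales",
    cut="date:2010",
    drilldown="date,region",
    page=10,
    page_size=100,
    format="csv",              # assumed value for CSV output
)
print(url)
```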
Ways of Deployment
Quick ways of creating an analytical data server or adding an analytical
module into your application or on top of your system:
Python web framework (Django, Flask, ...) using the Cubes Python module
Plug-in for a Flask application (Slicer Blueprint)
Stand-alone server with HTML+JS front-end, or stand-alone server with an
external application (HTML+JS, RoR, ...)
[Diagram: a web application sends HTTP requests to the Slicer server, which uses the model and the store and sends JSON replies; alternatively the application uses the Cubes Python API directly.]
Serving with the slicer tool:
bash$ slicer serve slicer.ini
Simple deployment with uWSGI:
[uwsgi]
socket = 127.0.0.1:5000
module = cubes.server.app
callable = application
Flask blueprint integration:
from flask import Flask
from cubes.server import slicer
app = Flask(__name__)
app.register_blueprint(slicer, url_prefix="/slicer")
Authentication and Authorization
Authorization: manage access to the cubes or part of a cube using
access rights. A user might have a right only to a limited set of cubes or
might have access to a particular cell in the cube. For example,
engineers might not have access to the financial cube, and stores might
have access to financials only for their store.
The built-in authorizer uses a JSON rights configuration file.
Rights configuration for the built-in authorizer:
{
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {
            "sales": ["store:3"]
        }
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {
            "sales": ["store:5"]
        }
    }
}
Custom authorizer:
class CustomAuthorizer(Authorizer):
    def authorize(self, cubes):
        # authorize with a database (lookup elided)
        return authorized_cubes

    def restricted_cell(self, identity, cube, cell):
        # Restriction with user identity dimension
        cut = PointCut("users", [identity])
        restriction = Cell(cube, [cut])
        if cell:
            return cell & restriction
        else:
            return restriction
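How such a rights file could be evaluated is sketched below in plain Python. The functions authorize and restrictions are hypothetical helpers for this illustration and are not the built-in authorizer; they assume that an unknown identity has access to nothing.

```python
import json

# Rights in the same shape as the configuration file above.
RIGHTS = json.loads("""
{
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:3"]}
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:5"]}
    }
}
""")


def authorize(rights, identity, cubes):
    """Return the subset of cubes the identity may access."""
    allowed = set(rights.get(identity, {}).get("allowed_cubes", []))
    return [cube for cube in cubes if cube in allowed]


def restrictions(rights, identity, cube):
    """Return the cut strings always applied for this identity and cube."""
    user = rights.get(identity, {})
    return user.get("cube_restrictions", {}).get(cube, [])


print(authorize(RIGHTS, "lidia", ["sales", "churn"]))
print(restrictions(RIGHTS, "martin", "sales"))
```

With these rights, both users may browse the sales cube, but each one only within the cell of their own store.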
Authentication: server-side, plug-in based action that, based on the user's
credentials or any other relevant information, provides a user identity
which is passed to the workspace.
There are two built-in authenticators: pass_parameter, a permissive method
that passes the identity as a URL parameter, and http_basic_proxy,
permissive authentication using the HTTP Basic method.
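The idea behind passing the identity as a URL parameter can be sketched with the standard library. The function identity_from_url is a hypothetical helper, not the pass_parameter authenticator itself, and the parameter name api_key is an assumption made for this example.

```python
from urllib.parse import parse_qs, urlsplit


def identity_from_url(url, parameter="api_key"):
    """Return the identity passed as a URL parameter, or None."""
    # Assumption: the identity travels in a single query parameter.
    values = parse_qs(urlsplit(url).query).get(parameter)
    return values[0] if values else None


print(identity_from_url("http://localhost:5000/cubes?api_key=lidia"))
print(identity_from_url("http://localhost:5000/cubes"))
```

The method is permissive because the server trusts whatever identity the request carries.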
SQL Backend
Built-in backend for ROLAP (Relational OLAP)
Features:
star and snowflake schema support
joins are executed only if needed for a given query
mapping of DATE data type without the date dimension table
simple support of non-additive/semi-additive dimensions and aggregates
split cell dimension: marks cells as within or outside of a split cell
support for outer joins
Backends
Bring your own aggregation engine. Take your Google Analytics and
your SQL database, and you have a single way for your users to access
all of them. No need to grant everyone account access to each
particular data source.
Backend modules:
Model Provider: a live cubes concept mapper, maps foreign
cubes or foreign cube-like structures into the Cubes model.
Aggregation Browser: provides the core functionality of
aggregation or delegates the aggregation to an external
aggregator.
Store: manages access to the data, establishes and
maintains database connections, or generates appropriate
external API calls (pretends to be a store).
Cubes concepts:
Backend           Cubes / Facts  Dimensions      Model
SQL               table          column (table)  required
MongoDB           collection     key/attribute   required
Mixpanel          event          property        automatic
Google Analytics  metric         dimension       automatic
Slicer            cube           dimension       automatic
[Diagram: contract facts with a subject dimension (subject category, subject), a supplier dimension (supplier type, supplier), a date dimension and a geography dimension (region, city); label: denormalization.]
databrewery.org
http://cubes.databrewery.org
https://github.com/stiivi/cubes
#databrewery at irc.freenode.net
Published for PyCon, April 2014, based on Cubes v1.0
