
How Capital Markets Firms Use

MongoDB as a Tick Database


Antoine Girbal, Technical Account Manager
Email: antoine@10gen.com
Twitter: @antoinegirbal
Agenda
MongoDB Introduction
FS Use Cases
Writing/Capturing Market Data
Reading/Analyzing Market Data
Performance, Scalability, & High Availability
Q&A
Introduction
10gen is the company behind MongoDB, the leading next-generation database
Document-Oriented
Open-Source
General Purpose
10gen Overview
200+ employees
500+ customers
Over $81 million in funding
Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
Database Landscape
No Automatic Joins
Document Transactions
Fast, Scalable Read/Writes
MongoDB Business Benefits
Increased Developer Productivity
Better Customer Experience
Faster Time to Market
Lower TCO
MongoDB Technical Benefits
Horizontally Scalable: Sharding
High Performance: Indexes, RAM
Highly Available: Replica Sets
Agile & Flexible: example application document
{ author: "roger",
  date: new Date(),
  text: "Spirited Away",
  tags: ["Tezuka", "Manga"] }

Most Common FS Use Cases
1. Tick Data Capture & Analysis
2. Reference Data Management
3. Risk Analysis & Reporting
4. Trade Repository
5. Portfolio Reporting
Tick Data Capture & Analysis - Requirements
Capture real-time market data (multi-asset, top of book, depth of book, even news)
Load historical data
Aggregate data into bars and daily or monthly intervals
Enable queries & analysis on raw ticks or aggregates
Drive backtesting or automated signals

Tick Data Capture & Analysis - Why MongoDB?
High throughput => can capture real-time feeds for all products/asset classes needed
High scalability => all data and depth for all historical time periods can be captured
Flexible & range-based indexing => fast querying on time ranges and any fields
Aggregation Framework => can shape raw data into aggregates (e.g. ticks to bars)
Map-reduce capability (native MR or Hadoop Connector) => batch analysis looking for patterns and opportunities
Easy to use => native language drivers and JSON expressions that you can apply for most operational database needs as well
Low TCO => low software license cost and commodity hardware


Writing/Capturing Tick Data
High Level Trading Architecture
[Diagram components: Exchanges/Markets/Brokers; News & social networking sources; Feed Handler; Capturing Application; Low Latency Applications; Higher Latency Trading Applications; Backtesting and Analysis Applications. Data stores: Market Data; Cached Static & Aggregated Data. Flows: Orders; Trades/metrics.]
Data Types
Top of book
Depth of book
Multi-asset
Derivatives (e.g. strips)
News (text, video)
Social Networking






Top of book [e.g. equities]
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrice: 55.37,
  offerPrice: 55.58,
  bidQuantity: 500,
  offerQuantity: 700
}

> db.ticks.find( {symbol: "DIS",
                  bidPrice: {$gt: 55.36} } )
Depth of book
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000]
}

> db.ticks.find( {bidPrices: {$gt: 55.36} } )
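When a condition such as $gt is applied to an array field like bidPrices, MongoDB matches the document if at least one element satisfies it. The plain JavaScript sketch below mimics that rule outside the database. matchesAnyGt is a hypothetical helper, not a MongoDB API, and the second tick is invented for contrast.

```javascript
// Depth-of-book ticks with parallel price arrays.
const ticks = [
  { symbol: "DIS", bidPrices: [55.37, 55.36, 55.35] }, // values from the slide
  { symbol: "XYZ", bidPrices: [55.30, 55.29, 55.28] }  // hypothetical second tick
];

// Hypothetical helper: true if any element of the field exceeds the threshold,
// which is how {field: {$gt: threshold}} treats an array field.
function matchesAnyGt(doc, field, threshold) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  return values.some((v) => v > threshold);
}

const matched = ticks.filter((t) => matchesAnyGt(t, "bidPrices", 55.36));
console.log(matched.map((t) => t.symbol));
```

Only the DIS tick is kept, because 55.37 is the single bid level above 55.36.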
Depth of book, or any way your app uses it
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bids: [
    {price: 55.37, amount: 500},
    {price: 55.37, amount: 1000},
    {price: 55.37, amount: 2000} ],
  offers: [
    {price: 55.58, amount: 1000},
    {price: 55.58, amount: 2000},
    {price: 55.59, amount: 3000} ]
}

> db.ticks.find( {"bids.price": {$gt: 55.36} } )
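The two depth-of-book layouts (parallel arrays versus an array of price/amount sub-documents) carry the same information, so an application can convert between them. A minimal sketch in plain JavaScript, assuming the parallel-array field names bidPrices and bidQuantities used earlier in the deck; toLevels is a hypothetical helper.

```javascript
// Hypothetical helper: zip parallel price/quantity arrays into an array of levels.
function toLevels(prices, quantities) {
  if (prices.length !== quantities.length) {
    throw new Error("prices and quantities must have the same length");
  }
  return prices.map((price, i) => ({ price, amount: quantities[i] }));
}

// Parallel-array layout.
const flat = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  bidQuantities: [500, 1000, 2000]
};

// Nested layout, queryable with dot notation such as "bids.price".
const nested = {
  symbol: flat.symbol,
  bids: toLevels(flat.bidPrices, flat.bidQuantities)
};

console.log(JSON.stringify(nested));
```

The nested form keeps each price next to its quantity, which can be easier to read and to index on a single path.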
Synthetic spreads
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  spreadPrice: 0.58,
  leg1: {symbol: "CLM13", price: 97.34},
  leg2: {symbol: "CLK13", price: 96.92}
}

> db.ticks.find( { "leg1.symbol" : "CLM13",
                   "leg2.symbol" : "CLK13",
                   spreadPrice : {$gt: 0.50} } )
News
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  title: "Disney Earnings",
  body: "Walt Disney Company reported...",
  tags: ["earnings", "media", "walt disney"]
}
Social networking
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  timestamp: ISODate("2013-02-15 10:00"),
  twitterHandle: "jdoe",
  tweet: "Heard @DisneyPictures is releasing...",
  usernamesIncluded: ["DisneyPictures"],
  hashTags: ["movierumors", "disney"]
}
Aggregates (bars, daily, etc.)
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  openTS: ISODate("2013-02-15 10:00"),
  closeTS: ISODate("2013-02-15 10:05"),
  open: 55.36,
  high: 55.80,
  low: 55.20,
  close: 55.70
}
Querying/Analyzing Tick Data
Architecture for Querying Data
[Diagram components: Higher Latency Trading Applications; Backtesting Applications; Research & Analysis Applications. Data queried: Ticks; Bars; Other analysis.]
Index any fields: arrays, nested, etc.
// Compound indexes
> db.ticks.ensureIndex( {symbol: 1, timestamp: 1} )

// Index on arrays
> db.ticks.ensureIndex( {bidPrices: -1} )

// Index on any depth
> db.ticks.ensureIndex( {"bids.price": 1} )

// Full text search
> db.ticks.ensureIndex( {tweet: "text"} )
Query for ticks by time; price threshold
// Ticks for last month for media companies
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    timestamp: {$gt: new ISODate("2013-01-01"),
                $lte: new ISODate("2013-01-31")}})

// Ticks when Disney's bid breached 55.50 this month
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gt: new ISODate("2013-02-01")}})
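A time-range filter needs both bounds inside a single timestamp sub-document, because a JavaScript object literal keeps only the last value given for a repeated key. The sketch below shows the difference in plain JavaScript, using Date in place of the shell's ISODate; the variable names are illustrative only.

```javascript
const start = new Date("2013-01-01T00:00:00Z");
const end = new Date("2013-01-31T00:00:00Z");

// Repeating the key: the second `timestamp` silently replaces the first,
// so the lower bound is lost before the query is ever sent.
const overwritten = {
  timestamp: { $gt: start },
  timestamp: { $lte: end }
};

// Both bounds in one sub-document: the intended range filter.
const ranged = { timestamp: { $gt: start, $lte: end } };

console.log(Object.keys(overwritten.timestamp)); // only the upper bound survives
console.log(Object.keys(ranged.timestamp));      // both operators are present
```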
Analyzing/Aggregating Options
Custom application code: run your queries, compute your results
Aggregation framework: declarative, pipeline-based approach
Native Map/Reduce in MongoDB: JavaScript functions distributed across cluster
Hadoop Connector: offline batch processing/computation
Aggregate into minute bars
// Aggregate minute bars for Disney for this month
db.ticks.aggregate(
  { $match: {symbol: "DIS", timestamp: {$gt: new ISODate("2013-02-01")}}},
  { $project: {
      year: {$year: "$timestamp"},
      month: {$month: "$timestamp"},
      day: {$dayOfMonth: "$timestamp"},
      hour: {$hour: "$timestamp"},
      minute: {$minute: "$timestamp"},
      second: {$second: "$timestamp"},
      timestamp: 1,
      price: 1}},
  { $sort: {timestamp: 1}},
  { $group: {
      _id: {year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute"},
      open: {$first: "$price"},
      high: {$max: "$price"},
      low: {$min: "$price"},
      close: {$last: "$price"} }} )
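The same open/high/low/close logic can be sketched in plain JavaScript to make the grouping step concrete. This is an illustration of the idea, not how the aggregation framework is implemented; the sample ticks and the toMinuteBars helper are hypothetical, and the input is assumed to be sorted by timestamp already (the role of $sort).

```javascript
// Hypothetical trade ticks, already sorted by timestamp.
const ticks = [
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.36 },
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.80 },
  { timestamp: new Date("2013-02-15T10:00:59Z"), price: 55.70 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.65 },
  { timestamp: new Date("2013-02-15T10:01:30Z"), price: 55.20 }
];

// Group sorted ticks into one bar per UTC minute.
function toMinuteBars(sortedTicks) {
  const bars = new Map();
  for (const { timestamp, price } of sortedTicks) {
    // Truncate to the minute ("YYYY-MM-DDTHH:MM"); this plays the role of the $group _id.
    const key = timestamp.toISOString().slice(0, 16);
    const bar = bars.get(key);
    if (!bar) {
      // First tick of the minute sets the open ($first).
      bars.set(key, { minute: key, open: price, high: price, low: price, close: price });
    } else {
      bar.high = Math.max(bar.high, price); // $max
      bar.low = Math.min(bar.low, price);   // $min
      bar.close = price;                    // last tick seen so far ($last)
    }
  }
  return [...bars.values()];
}

const bars = toMinuteBars(ticks);
console.log(bars);
```

The open and close values depend on input order, which is why the pipeline sorts by timestamp before grouping.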
Add analysis on the bars
// ...then count the number of down bars by appending these stages
// to the minute-bar pipeline, before its closing parenthesis
  { $project: {
      downBar: {$lt: ["$close", "$open"]},
      timestamp: 1,
      open: 1, high: 1, low: 1, close: 1}},
  { $group: {
      _id: "$downBar",
      sum: {$sum: 1}}} )
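Counting down bars amounts to tagging each bar with a boolean (close below open) and then counting per tag. A minimal plain JavaScript sketch of that two-step idea follows; the bar values and the countByDownBar helper are hypothetical.

```javascript
// Hypothetical bars; only open and close matter for this analysis.
const bars = [
  { open: 55.36, close: 55.70 },
  { open: 55.65, close: 55.20 },
  { open: 55.20, close: 55.10 }
];

// Tag each bar as a down bar or not, then count per tag
// (the roles of the $project and $group stages, respectively).
function countByDownBar(barList) {
  const counts = { true: 0, false: 0 };
  for (const bar of barList) {
    const downBar = bar.close < bar.open;
    counts[String(downBar)] += 1;
  }
  return counts;
}

const counts = countByDownBar(bars);
console.log(counts);
```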
Map-Reduce Example: Sum
var mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};
var reduceFunction = function (symbol, priceList) {
  return Array.sum(priceList);
};
> db.ticks.mapReduce(
    mapFunction, reduceFunction, {out: "tickSums"})
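To see what the map and reduce functions do to the data, the flow can be emulated locally in plain JavaScript. This is only a sketch: mapReduceLocal is a hypothetical helper, emit is passed in as an argument rather than being a global as in MongoDB, and the sample ticks are invented.

```javascript
// Hypothetical top-of-book ticks.
const ticks = [
  { symbol: "DIS", bidPrice: 55.37 },
  { symbol: "DIS", bidPrice: 55.36 },
  { symbol: "CBS", bidPrice: 44.10 }
];

// Hypothetical local emulation: collect emitted values per key, then reduce each key once.
function mapReduceLocal(docs, mapFn, reduceFn) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // As in MongoDB, the document is bound to `this` inside the map function.
  for (const doc of docs) mapFn.call(doc, emit);
  const results = {};
  for (const [key, values] of emitted) results[key] = reduceFn(key, values);
  return results;
}

const sums = mapReduceLocal(
  ticks,
  function (emit) { emit(this.symbol, this.bidPrice); },
  (symbol, priceList) => priceList.reduce((a, b) => a + b, 0)
);
console.log(sums);
```

MongoDB may call the reduce function more than once for the same key on partial results, so a reduce function should be associative, as a sum is.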
Process Data on Hadoop
MongoDB's Hadoop Connector
Supports Map/Reduce, Streaming, Pig
MongoDB as input/output storage for Hadoop jobs
No need to go through HDFS
Leverage power of Hadoop ecosystem against operational data in MongoDB
Performance, Scalability, and High Availability
Why MongoDB is fast and scalable
Better data locality (Relational vs. MongoDB)
In-Memory Caching
Auto-Sharding: read/write scaling
Auto-Sharding for Horizontal Scale: Read/Write Scalability
1 mongod: Key Range Symbol: A-Z
2 mongods: Key Range Symbol: A-J | Key Range Symbol: K-Z
4 mongods: Key Range Symbol: A-F | Key Range Symbol: G-J | Key Range Symbol: K-O | Key Range Symbol: P-Z
[Diagram: the Application connects through three MongoS routers to four shards. Each shard is a replica set with one Primary and two Secondaries.]
Key Range Symbol: A-F, Time
Key Range Symbol: G-J, Time
Key Range Symbol: K-O, Time
Key Range Symbol: P-Z, Time
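Range-based routing can be sketched in a few lines of plain JavaScript, assuming the slide's four ranges stand for A-F, G-J, K-O and P-Z. This is a deliberate simplification: the shard names, the chunk table and routeBySymbol are hypothetical, only the first letter of the symbol is compared, and the time component of the key is ignored. In MongoDB itself the routers consult chunk metadata on full shard key values, and each chunk range includes its lower bound and excludes its upper bound.

```javascript
// Hypothetical chunk table mirroring the four symbol ranges (both ends inclusive here).
const chunks = [
  { shard: "shard1", min: "A", max: "F" },
  { shard: "shard2", min: "G", max: "J" },
  { shard: "shard3", min: "K", max: "O" },
  { shard: "shard4", min: "P", max: "Z" }
];

// Hypothetical router: pick the shard whose range covers the symbol's first letter.
function routeBySymbol(symbol) {
  const first = symbol[0].toUpperCase();
  const chunk = chunks.find((c) => first >= c.min && first <= c.max);
  if (!chunk) throw new Error(`no chunk covers symbol ${symbol}`);
  return chunk.shard;
}

console.log(routeBySymbol("DIS")); // shard1
console.log(routeBySymbol("VIA")); // shard4
```

A query that names a symbol can be sent to one shard, while a query without the symbol has to be sent to all of them.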
10gen Products and Services
Subscriptions: Professional Support, Enterprise Edition and Commercial License
Consulting: Expert Resources for All Phases of MongoDB Implementations
Training: Online and In-Person, for Developers and Administrators
Summary
MongoDB is high performance for tick data
Scales horizontally automatically by auto-sharding
Fast, flexible querying, analysis, & aggregation
Dynamic schema can handle any data types
MongoDB has all these features with low TCO
10gen can support you with anything discussed
For More Information
Resource                 Location
MongoDB Downloads        www.mongodb.org/download
Free Online Training     education.10gen.com
Webinars and Events      www.10gen.com/events
White Papers             www.10gen.com/white-papers
Customer Case Studies    www.10gen.com/customers
Presentations            www.10gen.com/presentations
Documentation            docs.mongodb.org
Additional Info          info@10gen.com
How Capital Markets Firms Use
MongoDB as a Tick Database
Matt Kalan, Sr. Solution Architect
Email: Matt.kalan@10gen.com
Twitter: @matthewkalan
