Presented by
MySQL & O’Reilly Media, Inc.
What are we talking about today?
Financial data, specifically stock market data, as an example
The basic design of a MySQL database that contains a daily history of stock prices
Building a stock machine and some of the challenges it poses
Some large-data ‘gotchas’ and solutions
Some large MySQL ‘gotchas’ and solutions
Financial data
load_raw_prices()
function load_raw_prices( $price_file ) {
    # one array element per line; strip newlines and skip blank lines
    $lines = file($price_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $counter = 0;
    foreach ($lines as $line_num => $line) {
        $counter = $counter + 1;
        $row = explode(",", $line);
        $cusip = $row[0];
        $ric = $row[1];
        $asof_date = $row[2];
        $open = $row[3];
        $high = $row[4];
        $low = $row[5];
        $close = $row[6];
        $volume = trim($row[7]);
        # the split factor column may be missing or empty
        $split_adjustment = isset($row[8]) ? trim($row[8]) : '';
        if ($split_adjustment == '') {
            $split_adjustment = '0.00000';
        }
        $today = date('Y-m-d');
        $query = "INSERT INTO RAW_PRICE ( CUSIP, RIC, ASOF_DATE, OPEN, HIGH, LOW, CLOSE, VOLUME, "
            . "SPLIT_FACTOR, LOAD_DATE ) VALUES ( "
            . "'" . $cusip . "',"
            . "'" . $ric . "',"
            . "'" . $asof_date . "',"
            . $open . ","
            . $high . ","
            . $low . ","
            . $close . ","
            . $volume . ","
            . $split_adjustment . ","
            . "'" . $today . "')";
        # put the row in the RAW_PRICE table
        sm_query( $query );
        if (($counter % 100) == 0) {
            echo $counter . " lines processed.\n";
        }
    }
    echo $counter . " total lines processed.\n";
}
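The loader above builds each INSERT by string concatenation, so a stray quote or a non-numeric field in the feed goes straight into the SQL. A minimal sketch, assuming the same column layout (CUSIP, RIC, date, open, high, low, close, volume, optional split factor) and the literal NULL that daily_price_clean() writes for missing values: `parse_price_row` is a hypothetical helper, not part of the original code, that validates and types one line before it is inserted.

```php
<?php
// Hypothetical helper (not from the original slides): parse and validate
// one cleaned price line before it reaches an INSERT. Assumes the column
// order used by load_raw_prices():
// CUSIP, RIC, ASOF_DATE, OPEN, HIGH, LOW, CLOSE, VOLUME, [SPLIT_FACTOR]
function parse_price_row(string $line): ?array
{
    $row = array_map('trim', explode(',', $line));
    if (count($row) < 8) {
        return null; // too few columns: reject instead of inserting garbage
    }

    // ASOF_DATE must be a real calendar date; normalise it to YYYY-MM-DD
    $parts = explode('-', $row[2]);
    if (count($parts) !== 3 || !ctype_digit(implode('', $parts))
        || !checkdate((int) $parts[1], (int) $parts[2], (int) $parts[0])) {
        return null;
    }
    $asof = sprintf('%04d-%02d-%02d', $parts[0], $parts[1], $parts[2]);

    // Prices are numbers or the literal NULL used for missing values
    $prices = [];
    foreach ([3 => 'open', 4 => 'high', 5 => 'low', 6 => 'close'] as $i => $name) {
        if ($row[$i] === 'NULL') {
            $prices[$name] = null;
        } elseif (is_numeric($row[$i])) {
            $prices[$name] = (float) $row[$i];
        } else {
            return null;
        }
    }

    // Volume is a whole number of shares or NULL
    if ($row[7] !== 'NULL' && !ctype_digit($row[7])) {
        return null;
    }

    // An empty split factor defaults to 0.00000, as in load_raw_prices()
    $split = (isset($row[8]) && $row[8] !== '') ? $row[8] : '0.00000';
    if (!is_numeric($split)) {
        return null;
    }

    return [
        'cusip'  => $row[0],
        'ric'    => $row[1],
        'asof'   => $asof,
        'open'   => $prices['open'],
        'high'   => $prices['high'],
        'low'    => $prices['low'],
        'close'  => $prices['close'],
        'volume' => $row[7] === 'NULL' ? null : (int) $row[7],
        'split'  => (float) $split,
    ];
}
```

Lines that come back as null can be logged and skipped; the typed values of the rest could then be bound to a PDO or mysqli prepared statement instead of being concatenated into the query string.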
daily_price_clean()
function daily_price_clean( $source_file, $new_file ) {
    $lines = file($source_file);
    # open the output file once; mode 'a' creates it if it does not exist
    $handle = fopen($new_file, 'a');
    foreach ($lines as $line_num => $line) {
        # replace the "-9,999,401" missing-value sentinel with NULL
        $line = str_replace("\"-9,999,401\"", "NULL", $line);
        # strip the quotes and thousands separators from the volume field
        $pieces = explode("\"", $line);
        if (count($pieces) > 1) {
            $pieces[1] = str_replace(",", "", $pieces[1]);
        }
        $fixed_line = implode("", $pieces);
        # rearrange the M/D/YY date into YYYY-MM-DD
        $date_repair = explode(",", $fixed_line);
        $date_digits = explode("/", $date_repair[2]);
        $date_repair[2] = "20" . $date_digits[2] . "-"
            . str_pad($date_digits[0], 2, "0", STR_PAD_LEFT) . "-"
            . str_pad($date_digits[1], 2, "0", STR_PAD_LEFT);
        $fixed_line2 = implode(",", $date_repair);
        # write out the cleaned line (it keeps its original newline)
        fwrite($handle, $fixed_line2);
    }
    fclose($handle);
}
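Splitting on the double quote works only while the volume is the sole quoted field. A minimal sketch of the same cleanup for one line, assuming the raw layout implied by the code above (date in the third field as M/D/YY, quoted volume with thousands separators, "-9,999,401" as the missing-value marker); `clean_price_line` is a hypothetical name, and it swaps in PHP's `str_getcsv`, which keeps a quoted "9,123,400" as a single field.

```php
<?php
// Hypothetical alternative to daily_price_clean() that cleans one line.
// Assumes the raw layout implied by the original code: the date is the
// third field in M/D/YY form, the volume is quoted with thousands
// separators, and "-9,999,401" marks a missing value.
function clean_price_line(string $line): string
{
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        return '';
    }
    // str_getcsv honours the double quotes, so "9,123,400" stays one field;
    // an empty escape character disables backslash escaping
    $fields = str_getcsv($line, ',', '"', '');
    foreach ($fields as $i => $field) {
        if ($field === '-9,999,401') {
            $fields[$i] = 'NULL';                        // missing-value sentinel
        } elseif (preg_match('/^-?\d{1,3}(,\d{3})+\z/', $field)) {
            $fields[$i] = str_replace(',', '', $field);  // drop separators
        }
    }
    // M/D/YY -> 20YY-MM-DD, matching the "20" prefix in the original;
    // an unparseable or impossible date is left unchanged
    if (isset($fields[2])
        && preg_match('#^(\d{1,2})/(\d{1,2})/(\d{2})\z#', $fields[2], $m)
        && checkdate((int) $m[1], (int) $m[2], 2000 + (int) $m[3])) {
        $fields[2] = sprintf('%04d-%02d-%02d', 2000 + (int) $m[3], $m[1], $m[2]);
    }
    return implode(',', $fields);
}
```

The function returns the line without its newline, so a caller writing the output file would append one, e.g. `fwrite($handle, clean_price_line($line) . "\n");`.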
load_security()
function load_security( $security_file ) {
    # one array element per line; stripping newlines keeps the ticker clean
    $lines = file($security_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $counter = 0;
    foreach ($lines as $line_num => $line) {
        $counter = $counter + 1;
        $row = explode(",", $line);
        $cusip = $row[0];
        $ric = $row[1];
        $ticker = trim($row[2]);
        $today = date('Y-m-d');
        $query = "INSERT INTO SECURITY ( CUSIP, RIC, TICKER, CREATED_DATE ) VALUES ( "
            . "'" . $cusip . "',"
            . "'" . $ric . "',"
            . "'" . $ticker . "',"
            . "'" . $today . "')";
        # put the row in the SECURITY table
        sm_query( $query );
        if (($counter % 100) == 0) {
            echo $counter . " lines processed.\n";
        }
    }
    echo $counter . " total lines processed.\n";
}
load_prices()
function load_prices( $date ) {
    $query = "INSERT INTO PRICE
        SELECT
            S.SECURITY_ID, RP.ASOF_DATE, RP.OPEN, RP.HIGH,
            RP.LOW, RP.CLOSE, RP.VOLUME, RP.SPLIT_FACTOR,
            date(now())
        FROM
            RAW_PRICE RP,
            SECURITY S
        WHERE
            S.RIC = RP.RIC
        AND
            RP.ASOF_DATE = '" . $date . "'";
    echo $query;
    sm_query( $query );
}
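load_prices() splices `$date` straight into the SQL, so a malformed argument can break the statement or change its meaning. A minimal sketch, assuming dates are passed as YYYY-MM-DD strings: `build_load_prices_query` is a hypothetical wrapper that rejects anything that is not a real calendar date and otherwise returns the same INSERT ... SELECT, written with an explicit JOIN (equivalent to the comma join with `S.RIC = RP.RIC` in the WHERE clause).

```php
<?php
// Hypothetical guard for load_prices(): only a well-formed, real
// calendar date may be spliced into the INSERT ... SELECT.
function build_load_prices_query(string $date): string
{
    // \z (not $) so a trailing newline is rejected too
    if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})\z/', $date, $m)
        || !checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
        throw new InvalidArgumentException("Bad ASOF_DATE: $date");
    }
    // Same statement as load_prices(), with an explicit inner JOIN
    return "INSERT INTO PRICE
        SELECT S.SECURITY_ID, RP.ASOF_DATE, RP.OPEN, RP.HIGH,
               RP.LOW, RP.CLOSE, RP.VOLUME, RP.SPLIT_FACTOR, date(now())
        FROM RAW_PRICE RP
        JOIN SECURITY S ON S.RIC = RP.RIC
        WHERE RP.ASOF_DATE = '" . $date . "'";
}
```

A nightly run would then be `sm_query( build_load_prices_query( $date ) );`, with the exception caught and recorded as a failed job rather than silently loading nothing.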
Dependency Task Scheduling
PHP and shell scripts are useful tools for downloading and processing price data
But cron does not record in a database when a job starts, finishes, fails, or fails to start
If email is broken or cron isn’t reporting correctly, you may not learn of problems until it’s too late
A layer of derived metadata often fails because of failed or anomalous market data; a single missing price can make a graph or signal look wrong to customers
You can’t load prices if the FTP download or feed fails
You can’t process corporate actions until you know the price
You can’t get accurate calculations on a time series if there are holes in it
You can’t send signals or present accurate graphs if anything related to a security fails
Keeping track of failed jobs gives you a flag that can also tell users that what they’re seeing is questionable and will be corrected
You can report on the jobs list and raise alerts on failed jobs
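The dependencies listed above (feed, then prices, then corporate actions, then calculations and signals) form a small graph. A minimal sketch, assuming each job is a PHP callable that returns true on success: `run_jobs`, the status names and the job names in the test are hypothetical, and the returned array stands in for the jobs table the slide argues for. A job runs only after all of its dependencies finished, is marked skipped if any of them failed, and is marked failed_to_start if its dependencies can never complete.

```php
<?php
// Hypothetical dependency-aware job runner. $jobs maps job name => callable
// returning true on success; $deps maps job name => list of prerequisite
// job names. The returned $log (job name => status) stands in for a jobs
// table in MySQL.
function run_jobs(array $jobs, array $deps): array
{
    $log = [];
    $remaining = array_keys($jobs);
    while ($remaining) {
        $progress = false;
        foreach ($remaining as $k => $name) {
            $needed = $deps[$name] ?? [];
            // wait until every dependency has a final status
            foreach ($needed as $d) {
                if (!isset($log[$d])) {
                    continue 2;
                }
            }
            $blocked = false;
            foreach ($needed as $d) {
                if ($log[$d] !== 'finished') {
                    $blocked = true;
                }
            }
            if ($blocked) {
                $log[$name] = 'skipped';      // upstream failure: do not run
            } else {
                $log[$name] = 'started';      // a real table would record this row first
                try {
                    $log[$name] = ($jobs[$name])() ? 'finished' : 'failed';
                } catch (Throwable $e) {
                    $log[$name] = 'failed';
                }
            }
            unset($remaining[$k]);
            $progress = true;
        }
        if (!$progress) {
            // cycle or unknown dependency: these jobs can never start
            foreach ($remaining as $name) {
                $log[$name] = 'failed_to_start';
            }
            break;
        }
    }
    return $log;
}
```

Anything not marked finished can drive the alerts and the "questionable data" flag, e.g. `array_filter($log, fn($s) => $s !== 'finished')`.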
Tracking variances in data quality
Price weirdness:
yesterday’s price / today’s price
Row weirdness:
num rows yesterday / num rows today
Range weirdness:
yesterday’s average of a sum / today’s average of a sum
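Each of the three checks above is a ratio of yesterday’s value to today’s, which stays near 1.0 on a normal day. A minimal sketch, assuming a symmetric tolerance band: `variance_ratio`, `is_weird` and the 0.5 to 2.0 band are hypothetical choices for illustration, not thresholds from the talk.

```php
<?php
// Hypothetical data-quality check: yesterday / today as a ratio, flagged
// when it falls outside an assumed tolerance band. The default 0.5..2.0
// band is illustrative only, not a recommended threshold.
function variance_ratio(float $yesterday, float $today): ?float
{
    // a zero today (no rows, no price) cannot form a ratio
    return $today == 0.0 ? null : $yesterday / $today;
}

function is_weird(?float $ratio, float $low = 0.5, float $high = 2.0): bool
{
    // a missing ratio is itself treated as weird
    return $ratio === null || $ratio < $low || $ratio > $high;
}
```

The same pair of functions covers price, row-count and range checks with different bands; row counts usually warrant a much tighter band than prices. For prices, a 2-for-1 split alone pushes the ratio to about 2.0, the edge of this band, so a real check would likely adjust by SPLIT_FACTOR before comparing.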
Questions?
Acknowledgements
Starmine: Tripp, Flanzer, Foster, Breffle, Miller
Cake Financial: Reed