Contents
1. Introduction
2. Q Language
3. Q IPC
4. Kdb+ Tables
5. Historical Database
6. Realtime Database
1. Introduction
1.1 Overview
This is a quick start guide to kdb+ from Kx Systems, aimed primarily at those learning independently. It covers system installation, the q environment, q ipc, kdb+ tables and typical databases, and where to find more material. After completing this you should be able to follow the Borror tutorials Q for Mortals and Kdb+ For Mortals, and the wiki Reference and Cookbooks pages. One caution: you can learn kdb+ reasonably well by independent study, but for serious evaluation of the product you need the help of a kdb+ consultant. This is because kdb+ is typically used for very demanding applications that require experience to set up properly. Contact Kx Systems or one of its partners for help with such evaluations.
1.2 Kdb+
The kdb+ system is both a database and a programming language:
- kdb+ the database (k database plus)
- q the programming language for working with kdb+

Both kdb+ and q are written in the k programming language. You do not need to know k to work with kdb+, but will occasionally see references to it. For example, q is defined in the distributed script q.k.
1.3 Resources
Kx wiki
The Kx wiki is the best resource for learning kdb+, and includes:
- Jeff Borror's tutorials Q for Mortals and Kdb+ For Mortals
- a cookbook of common tasks
- a reference on the built-in functions
- an svn repository with user and Kx contributed code
Kx Html Pages
Some older, but still useful, html pages are at kx.com/documentation.php [1], including the Kdb+ Database and Language Primer [2].
Discussion groups
- The main discussion forum is the k4 listbox [4]. This is available only to licensed customers; please use a work email address to apply for access.
- The Kdb+ Personal Developers [5] forum is an open Google discussion group for users of the trial system.
Additional Files
The kx.com/q [6] directory has various supporting files, for example the script sp.q referenced in this guide (which is also included with the trial system). These files are also copied to the svn repository, so, for example, the sp.q script can also be found there.
If you need to install q elsewhere, set the environment variable QHOME to point to the new directory. If QHOME is not defined, kdb+ defaults to $HOME/q for unix-based systems, and c:\q for Windows. To run the system, see instructions in the next section, 2. Q Language.
1.7 GUI
kdb+ has only a console interface, but there are some GUIs:
- the most popular is Charlie Skelton's studio for kdb+, a cross-platform execution environment. It is worth having this available even if you use another interface.
- First Derivatives [8] offer their clients the qIDE development system
- Q and K Development Tools [9] has an eclipse plugin
- Q Insight Pad [10] is an IDE for Windows
- Qconsole is an IDE using GTK
References
[1] http://kx.com/documentation.php
[2] http://www.kx.com/q/d/primer.htm
[3] http://www.thalesians.com/finance/index.php/Knowledge_Base/Databases/Kdb
[4] http://www.listbox.com/subscribe/?listname=k4
[5] http://groups.google.com/group/personal-kdbplus
[6] http://www.kx.com/q
[7] http://kx.com/Developers/software.php
[8] http://www.firstderivatives.com
[9] http://www.qkdt.org
[10] http://www.qinsightpad.com
2. Q Language
2.1 Overview
Q is the programming system for working with kdb+. This corresponds to SQL for traditional databases, but unlike SQL, q is a powerful programming language in its own right.

Q is an interpreted language. Q expressions can be entered and executed in the q console, or loaded from a q script, which is a text file with extension .q.

You need at least some familiarity with q to use kdb+. Try following the examples here in the q console interface. Also, ensure that you have the example files installed.

The following wiki pages will also be useful:
- Function Reference
- Data Types
- System Commands
- Command Line Parameters
2.2 Loading q
You load q by changing to the main q directory, then running the q executable. You should not just click the q executable from the explorer: this will load q, but in the wrong directory. It is best to create a startup script, which might do other preprocessing such as setting environment variables; see examples q.sh and q.bat in the start [1] directory.

In Windows, enter in a command window:

    c:
    cd \q
    w32\q.exe

or create a batch file with the contents below, which allows parameters to be passed to q:

    c:
    cd \q
    w32\q.exe %*

In Linux/Mac, it is usual to call the q executable under rlwrap to support line recall and edit. In the console:

    ..$ cd ~/q
    ~/q$ rlwrap l32/q

The following Linux shell script changes to the q directory, sets the appropriate directory for 32 or 64 bit, then loads q under rlwrap:

    #!/bin/bash
    cd ~/q
    if [ "x86_64" == "$(uname -m)" ]; then p=l64; else p=l32; fi
    rlwrap $p/q "$@"
Command line parameters are described on the wiki page Command Line Parameters. For example:

    ... q profile.q -p 5001

- profile.q loads script profile.q at startup. This can in turn load other scripts.
- -p 5001 sets listening port to 5001

At any prompt, enter \\ to exit q.
2. If there is no suspension, then a single \ will toggle q and k modes:

    q)count each (1 2;"abc")    / q expression for length of each list item
    2 3
    q)\                         / toggle to k mode
      #:'(1 2;"abc")            / equivalent k expression
    2 3
      \                         / toggle back to q mode
    q)

3. If you change namespace, then the prompt includes the namespace, see namespace directory. For example:

    q)\d .h                     / change to h namespace
    q.h)\d .                    / change back to root namespace
    q)
/ list of lists
The following has a function definition, where x represents the argument:

    q)f:{2 + 3 * x}
    q)f 5
    17
    q)f til 5
    2 5 8 11 14

Q makes essential use of a symbol datatype:

    q)a:`toronto                / symbol
    q)b:"toronto"               / character string
    q)count a
    1
    q)count b
    7
    q)a="o"
    `type
    q)b="o"
    0101001b
    q)a~b
    0b
    q)a~`$b
    1b
    q)dict:`items`sales`prices!(items;sales;prices)    / dictionary
    q)dict
    items | nut bolt cam cog
    sales | 6   8    0   3
    prices| 10  20   15  20
    q)tab:flip dict                                    / table
    q)tab
    items sales prices
    ------------------
    nut   6     10
    bolt  8     20
    cam   0     15
    cog   3     20
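The relationship between lists, dictionaries and tables can be sketched end to end as follows. The definitions of items, sales and prices here are assumptions: the values are simply copied from the dictionary display above.

```q
/ assumed definitions: values copied from the dictionary display
items:`nut`bolt`cam`cog
sales:6 8 0 3
prices:10 20 15 20

dict:`items`sales`prices!(items;sales;prices)   / dictionary: column names -> column values
tab:flip dict                                   / flipping a column dictionary gives a table

type dict                                       / 99h, a dictionary
type tab                                        / 98h, a table
cols tab                                        / `items`sales`prices
```

Flipping the table again returns the original column dictionary, so flip tab matches dict.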
The table created above is an ordinary variable in the q workspace, and could be written to disk. In general, you create kdb+ tables in memory and then write to disk. Since it is a table, you can use SQL-like query expressions on it:

    q)select from tab where prices < 20
    items sales prices
    ------------------
    nut   6     10
    cam   0     15
    q)(sum sales*prices) % sum sales    / calculate weighted average
    16.47059
    q)sales wavg prices                 / built-in verb: wavg
    16.47059
    q)sales , prices                    / verb: , join lists
    6 8 0 3 10 20 15 20
    q)sales ,' prices                   / adverb: ' join lists in pairs
    6 10
    8 20
    0 15
    3 20

Functions can apply to dictionaries and tables:

    q)-2 # tab
    items sales prices
    ------------------
    cam   0     15
    cog   3     20

Functions can be used within queries:

    q)select items,sales,prices,amount:sales*prices from tab
    items sales prices amount
    -------------------------
    nut   6     10     60
    bolt  8     20     160
    cam   0     15     0
    cog   3     20     60
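As a further sketch of functions inside queries, the example below aggregates over the whole table and applies a user-defined function to a column. The table is rebuilt inline so the block stands alone, and markup is a hypothetical function name introduced only for illustration.

```q
tab:([] items:`nut`bolt`cam`cog; sales:6 8 0 3; prices:10 20 15 20)

/ aggregates: total revenue and sales-weighted average price, one row result
agg:select revenue:sum sales*prices, avgprice:sales wavg prices from tab

/ a user-defined function (hypothetical) applied to a whole column
markup:{x*1.5}
marked:select items, newprice:markup prices from tab
```

Because q functions operate on whole lists, markup receives the entire prices column at once rather than being called row by row.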
2.9 Scripts
A q script is a plain text file with extension .q, which contains q expressions that are executed when loaded. For example, load the script sp.q and display the "s" table that it defines:

    q)\l sp.q                   / load script
    q)s                         / display table s
    s | name  status city
    --| -------------------
    s1| smith 20     london
    s2| jones 10     paris
    s3| blake 30     paris
    s4| clark 20     london
    s5| adams 30     athens
Within a script, a line that contains a single / starts a comment block. A line with a single \ ends the comment block, or if none, exits the script.

A script can contain multi-line definitions. Any line that is indented is assumed to be a continuation of the previous line. Blank lines, superfluous blanks, and lines that are comments (begin with /) are ignored in determining this. For example, if a script has contents:

    a:1 2
    / this is a comment line
     3 + 4
    b:"abc"

Then loading this script would define a and b as:

    q)a                         / i.e. 1 2 3 + 4
    5 6 7
    q)b
    "abc"
2.10 Q Queries
Q queries are similar to SQL, though often much simpler:

    q)\l sp.q
    q)select from p where weight=17
    p | name  color weight city
    --| -------------------------
    p2| bolt  green 17     paris
    p3| screw blue  17     rome

SQL statements can be entered, if prefixed with s)

    q)s)select * from p where color in (red,green)    / SQL query
    p | name  color weight city
    --| -------------------------
    p1| nut   red   12     london
    p2| bolt  green 17     paris
    p4| screw red   14     london
    p6| cog   red   19     london

The Q equivalent would be:

    q)select from p where color in `red`green

Similarly, compare:

    q)select distinct p,s.city from sp
    s)select distinct sp.p,s.city from sp,s where sp.s=s.s

and

    q)select from sp where s.city=p.city
    s)select sp.s,sp.p,sp.qty from s,p,sp where sp.s=s.s and sp.p=p.p and p.city=s.city

Note that the dot notation in q automatically references the appropriate table.

Q results can have lists in the rows:

    q)select qty by s from sp
    s | qty
    --| -----------------------
    s1| 300 200 400 200 100 400
    s2| 300 400
    s3| ,200
    s4| 100 200 300

ungroup will flatten the result:

    q)ungroup select qty by s from sp
    s  qty
    ------
    s1 300
    s1 200
    s1 400
    s1 200
    ...

Calculations can be performed on the intermediate results:

    q)select countqty:count qty,sumqty:sum qty by p from sp
    p | countqty sumqty
    --| ---------------
    p1| 2        600
    p2| 4        1000
    p3| 1        400
    p4| 2        500
    p5| 2        500
    p6| 1        100
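The grouping behaviour can be tried without loading sp.q. The following is a minimal sketch on a small hypothetical table t (not part of the guide's example files), showing a grouped result with list-valued cells, its flattened form, and an aggregate computed per group.

```q
/ hypothetical table, used only for this illustration
t:([] s:`s1`s1`s2`s3; qty:300 200 300 200)

g:select qty by s from t                              / keyed table, each qty cell is a list
u:ungroup g                                           / flattened: one row per original value
a:select countqty:count qty, sumqty:sum qty by s from t   / aggregates per group
```

Here g has one row per distinct s, while u has the same number of rows as t.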
References
[1] http://code.kx.com/wsvn/code/contrib/cburke/start
3. Q IPC
3.1 Overview
A production kdb+ system may have several q processes, possibly on several machines. These communicate via tcp/ip. Any q process can communicate with any other q process as long as it is accessible on the network and is listening for connections.
- a server process listens for connections and processes any requests
- a client process initiates the connection and sends commands to be executed

Client and server can be on the same machine or on different machines. A process can be both a client and a server.

A communication can be synchronous (wait for a result to be returned) or asynchronous (no wait and no result returned).
The function hopen starts a connection, and returns an integer connection handle. This handle is used for all subsequent client requests. For example:

    q)h:hopen `::5001
    q)h "3?20"
    1 12 9
    q)hclose h
3.4 Synchronous/Asynchronous
Where the connection handle is used as defined (it will be a positive integer), the client request is synchronous. In this case, the client waits for the result from the server before continuing execution. The result from the server is the result of the client request.

Where the negative of the connection handle is used, the client request is asynchronous. In this case, the request is sent to the server, but the client does not wait or get a result from the server. This is done when a result is not required by the client.

For example:

    q)h:hopen `::5001
    q)(neg h) "a:3?20"
    q)(neg h) "a"
    q)h "a"
    0 17 14
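The two request styles can be combined in one client routine. The sketch below assumes a q server has been started separately with q -p 5001; ipcdemo is a hypothetical function name, and wrapping the calls in a function means the definition can be loaded even when no server is running.

```q
/ sketch: assumes a q server was started separately with  q -p 5001
/ ipcdemo is a hypothetical name; call it as  ipcdemo 5001
ipcdemo:{[port]
  h:hopen `$"::",string port;     / open connection: h is a positive integer handle
  h "a:3?20";                     / synchronous: waits until the server has executed it
  (neg h) "b:2*a";                / asynchronous: returns at once, no result
  r:h ({x+y};2;3);                / synchronous call of a function with two arguments
  b:h "b";                        / a later synchronous request sees the async update
  hclose h;
  (r;b)}
```

Requests on a single handle are processed by the server in the order sent, which is why the final synchronous request for b can rely on the earlier asynchronous assignment.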
4. Kdb+ Tables
4.1 Overview
A basic understanding of the internal structure of tables is needed to work with kdb+. The structure is actually quite simple, but very different from conventional databases. This section gives a quick overview, followed by an explanation of the sp.q script, and then a typical table for stock data. After completing this, you should read the page kdbplus database, which has a detailed comparison of kdb+ and conventional RDBMS. Kdb+ tables are created out of lists. A table with no key columns is essentially a list of column names associated with a list of corresponding column values, each of which is a list. A table with key columns is internally built from a pair of tables - the key columns associated with the non-key columns. Kdb+ tables are created in-memory, and then written to disk if required. When written to disk, smaller tables can be stored in a single file, while larger tables are usually partitioned in some way. The partitioning can be seen when viewing the file directories, but the table is treated as a single object within a q process.
The form for the second method, for a table with j primary keys and n columns in total, is:

    t:([c1:v1; ...; cj:vj] c(j+1):v(j+1); ...; cn:vn)

Here table t is defined with column names ci, and corresponding values vi. The square brackets are for primary keys, and are required even if there are no primary keys.
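A minimal sketch of this form, using two hypothetical tables kt and ut (names and values are invented for illustration), one with a single primary key column and one with none:

```q
/ hypothetical tables illustrating the general form
kt:([id:`a`b`c] qty:10 20 30; px:1.5 2.5 3.5)   / one primary key column
ut:([] id:`a`b`c; qty:10 20 30)                 / no primary key: brackets still required

keys kt                                         / ,`id
keys ut                                         / empty symbol list
key kt                                          / table of the key columns
value kt                                        / table of the non-key columns
kt[`b]                                          / lookup by key: dictionary of non-key values
```

The key and value calls show the pair of tables from which a keyed table is built, as described in the overview.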
Table s
Table s has a primary key column, also called s, given as a list of symbols which should be unique. Note that in this example, the name "s" is used both for the table and the primary key column, but this is not required. The remaining columns are of type symbol, integer, symbol.

    s:([s:`s1`s2`s3`s4`s5]
     name:`smith`jones`blake`clark`adams;
     status:20 10 30 20 30;
     city:`london`paris`paris`london`athens)

Display in q:

    q)s
    s | name  status city
    --| -------------------
    s1| smith 20     london
    s2| jones 10     paris
    s3| blake 30     paris
    s4| clark 20     london
    s5| adams 30     athens

Note that the column types are set from the data given. If this were first created as an empty table, say table "t", then the column types could be defined explicitly as follows:

    q)t:([s:`$()]name:`$();status:"I"$();city:`$())

Insert a row:

    q)`t insert (`s1;`smith;20;`london)
    ,0
    q)t
    s | name  status city
    --| -------------------
    s1| smith 20     london
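The following sketch creates the same empty keyed table using named type casts and then adds rows. It assumes a kdb+ version (3.0 or later) in which an unsuffixed integer such as 20 is a 64-bit long, so the status values carry an explicit i suffix to match the int column; upsert is used to show the difference between updating an existing key and adding a new one.

```q
t:([s:`symbol$()] name:`symbol$(); status:`int$(); city:`symbol$())

m:meta t                                  / one row per column, with its type character
`t insert (`s1;`smith;20i;`london)        / returns the index of the new row
`t upsert (`s1;`smith;25i;`london)        / existing key: the row is updated in place
`t upsert (`s2;`jones;10i;`paris)         / new key: a row is appended
```

After these statements t has two rows, and the status for s1 is 25.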
Table p
Table p is created much like table s. As before, the table name and primary key name are both the same:

    p:([p:`p1`p2`p3`p4`p5`p6]
     name:`nut`bolt`screw`screw`cam`cog;
     color:`red`green`blue`red`blue`red;
     weight:12 17 17 14 12 19;
     city:`london`paris`rome`london`paris`london)

Display in q:

    q)p
    p | name  color weight city
    --| -------------------------
    p1| nut   red   12     london
    p2| bolt  green 17     paris
    p3| screw blue  17     rome
    p4| screw red   14     london
    p5| cam   blue  12     paris
    p6| cog   red   19     london
Table sp
Table sp is defined with no primary key. Columns s and p reference tables s and p respectively as foreign keys. The syntax for specifying another table's primary key as a foreign key is:

    `tablename$data

The definition of sp is:

    sp:([]
     s:`s$`s1`s1`s1`s1`s4`s1`s2`s2`s3`s4`s4`s1;
     p:`p$`p1`p2`p3`p4`p5`p6`p1`p2`p2`p2`p4`p5;
     qty:300 200 400 200 100 100 300 400 200 200 300 400)

Display in q:

    q)sp
    s  p  qty
    ---------
    s1 p1 300
    s1 p2 200
    s1 p3 400
    s1 p4 200
    s4 p5 100
    ...
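A cut-down sketch of the foreign key mechanism, using two-row versions of the tables (the data is abbreviated for illustration, and bad is a hypothetical variable name). It shows the foreign key cast, dot notation through the key, and what happens when a value is not present in the referenced table.

```q
s:([s:`s1`s2] name:`smith`jones; city:`london`paris)
sp:([] s:`s$`s1`s2`s1; qty:300 200 400)    / s column is enumerated over the keys of table s

m:meta sp                                  / column f records that s is a foreign key into s
r:select s, s.name, s.city, qty from sp    / dot notation follows the key, no explicit join

/ casting a value missing from table s signals an error, trapped here as a symbol
bad:@[{`s$x}; `s9; {`$x}]
```

The result r has columns s, name, city and qty, with name and city looked up from table s for each row of sp.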
    date       time         sym  price size  cond
    ---------------------------------------------
    2010.02.21 10:03:54.347 IBM  20.83 40000 N
    2010.02.21 10:04:05.827 MSFT 88.75 2000  B

The ? verb will generate random data:

    q)syms:`IBM`MSFT`UPS`BAC`AAPL
    q)tpd:100                    / trades per day
    q)day:5                      / number of days
    q)cnt:count syms             / number of syms
    q)len:tpd*cnt*day            / total number of trades
    q)date:2010.02.21+len?day
    q)time:"t"$raze (cnt*day)#enlist 09:30:00+15*til tpd
    q)time+:len?1000
    q)sym:len?syms
    q)price:len?100e
    q)size:100*len?1000
    q)cond:len?" ABCDENZ"
    q)trades:0#trades            / empty trades table
    q)`trades insert (date;time;sym;price;size;cond)
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 ..
    q)trades:`date`time xasc trades    / sort on time within date
    q)5#trades
    date       time         sym  price    size  cond
    ------------------------------------------------
    2010.02.21 09:30:00.766 UPS  70.38    46900 A
    2010.02.21 09:30:00.801 IBM  89.24799 58600 N
    2010.02.21 09:30:00.942 UPS  38.4812  54600 A
    2010.02.21 09:30:15.116 IBM  25.56824 55700 A
    2010.02.21 09:30:15.224 MSFT 75.97006 23800 E
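The statements above operate on an existing trades table. A minimal sketch of a schema that matches the displayed columns follows; the column types are assumptions inferred from the display and from the generated data (real prices from len?100e, a single character for cond), and the i suffixes assume a kdb+ version in which unsuffixed integers are longs.

```q
/ assumed schema, inferred from the displayed columns
trades:([] date:`date$(); time:`time$(); sym:`symbol$();
  price:`real$(); size:`int$(); cond:`char$())

/ the two rows shown in the display
`trades insert (2010.02.21;10:03:54.347;`IBM;20.83e;40000i;"N")
`trades insert (2010.02.21;10:04:05.827;`MSFT;88.75e;2000i;"B")
```

With such a definition in place, 0#trades yields an empty table of the same schema. On versions where 100*len?1000 produces longs, the size column would need to be declared long, or the generated values cast to int, for the bulk insert to succeed.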
5. Historical Database
5.1 Overview
A historical database (hdb) holds data before today, and its tables are stored on disk, being much too large to fit in memory. Each new day's records are added to the hdb at the end of day.

Typically, large tables in the hdb (such as daily tick data) are stored splayed, i.e. each column is stored in its own file; see cookbook/splayed tables and kdb+formortals/splayed. Typically also, large tables are stored partitioned by date. Very large databases may be further partitioned into segments, using par.txt.

These storage strategies give best efficiency for searching and retrieval. For example, the database can be written over several drives. Also, partitions can be allocated to slave threads so that queries over a range of dates can be run in parallel. The exact set up would be customized for each installation.

For example, a simple partitioning scheme on a single disk might be as follows. Here, the daily and master tables are small enough to be written to single files, while the trade and quote tables are splayed and partitioned by date:
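As a rough sketch of how one date partition of a splayed table might be written and then loaded, consider the following. The database root db, the table contents and the date are all hypothetical; the block writes files under the current directory, so it is best tried in a scratch location.

```q
/ hypothetical in-memory table; `:db is an assumed database root
trade:([] time:09:30:00.000 09:30:01.000; sym:`IBM`MSFT; price:55.65 30.1; size:74 20)

/ .Q.en enumerates symbol columns against the file db/sym;
/ the trailing slash in the target path splays the table, one file per column
`:db/2010.12.31/trade/ set .Q.en[`:db] trade

/ load the database root: date partitions are discovered from the directory names
\l db
r:select from trade where date=2010.12.31
```

After loading, trade is mapped from disk and gains a virtual date column taken from the partition directory name, which is why the query can filter on date even though the saved table had no such column.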
    q)count quote
    1709919j
    q)t:select from trade where date=last date, sym=`IBM
    q)count t
    1041
    q)5#t
    date       time         sym price size stop cond ex
    ---------------------------------------------------
    2010.12.31 09:30:00.055 IBM 55.65 74   0    N    N
    2010.12.31 09:30:00.114 IBM 55.66 72   1    W    N
    2010.12.31 09:30:01.970 IBM 55.56 37   0    T    N
    2010.12.31 09:30:03.091 IBM 55.56 41   1    R    N
    2010.12.31 09:30:06.930 IBM 55.57 89   0    B    N
    q)select count i by date from trade
    date      | x
    ----------| -----
    2010.12.01| 14991
    2010.12.02| 14705
    2010.12.03| 14817
    2010.12.06| 14877
    ...
    q)select[5] cnt:count i,sum size,last price, wprice:size wavg price by 5 xbar time.minute from t
    minute| cnt size price wprice
    ------| -----------------------
    09:30 | 42  2138 55.24 55.37768
    09:35 | 22  1329 55.32 55.35988
    09:40 | 23  1279 55.2  55.25091
    09:45 | 16  716  54.99 55.13633
    09:50 | 24  1187 54.82 54.83702
    q)select[-5] open:first price,lo:min price,hi:max price, close:last price by 10 xbar time.minute from t
    minute| open  lo    hi    close
    ------| -----------------------
    15:10 | 55.64 55.43 55.64 55.56
    15:20 | 55.56 55.54 55.95 55.95
    15:30 | 55.88 55.61 55.99 55.74
    15:40 | 55.81 55.8  56.18 55.86
    15:50 | 55.84 55.84 56.5  56.38
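The time bucketing used in these queries can be tried on a small in-memory table. This is a minimal sketch with four hypothetical trades; 5 xbar rounds each minute down to the nearest multiple of five, so the rows fall into the 09:30, 09:35 and 09:40 buckets.

```q
/ hypothetical trades, for illustration only
t:([] time:09:30:00.000 09:31:30.000 09:36:10.000 09:44:59.000;
  price:55.6 55.7 55.5 55.4; size:70 30 50 50)

b:select cnt:count i, sum size, wprice:size wavg price by 5 xbar time.minute from t
```

The first bucket holds two trades with a size-weighted price of 55.63, and the other two buckets hold one trade each.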
Ensure that the directory given in dsp is created, writable and empty, then load the modified script, which should now take a minute or so. This should write the partitioned data to subdirectories of dsp, and create a par.txt file like:

    /dbss/d0
    /dbss/d1
    /dbss/d2
    /dbss/d3
    /dbss/d4

Restart q, and load the segmented database:

    q)\l start/dbs
    q)(count quote), count trade
    81258538 16248124j
    q)select cnt:count i,sum size,size wavg price from trade where date in 2007.11.19+til 5, sym=`IBM
    cnt  size   price
    --------------------
    4213 227283 47.12981
6. Realtime Database
6.1 Overview
A real-time database (rdb) stores today's data. Typically, it would be stored in memory during the day, and written out to the historical database (hdb) at end of day. Storing the rdb in memory results in extremely fast update and query performance. As a minimum, it is recommended to have RAM of at least 4 times expected data size, so for 5 GB data per day, the rdb machine should have at least 20 GB RAM. In practice, much larger RAM might be used.
6.3 Tickerplant
The data feed could be written directly to the rdb. More often, it is written to a q process called a tickerplant, which may run several actions whenever data is received, for example:
- write all incoming records to a log file
- push all data to the rdb
- push all or subsets of the data to other processes
- run any other q code that should be executed as new data arrives
Other processes would subscribe to a tickerplant to receive new data, and each would specify what data should be sent (all or a selection). The kdb+tick [1] product from Kx is a tickerplant that is recommended for production systems with large volumes of real time data.
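What a subscriber does with pushed data can be sketched locally. The example assumes the convention used by kdb+tick, in which the tickerplant sends each subscriber a message of the form (`upd; tablename; data) and the subscriber defines a function upd to handle it; other tickerplant scripts may use a different convention. The table schema and the data are hypothetical, and the push is simulated by calling upd directly rather than over a connection.

```q
/ subscriber side (sketch): an empty table plus the update callback
trade:([] time:`time$(); sym:`symbol$(); price:`float$(); size:`long$())
upd:{[t;x] t insert x}          / t: table name as a symbol, x: list of column values

/ what a push from the tickerplant amounts to, simulated with a direct call
upd[`trade; (09:30:00.000 09:30:01.000; `IBM`MSFT; 55.65 30.1; 74 20)]
```

A subscriber that keeps derived data, such as a running volume-weighted average, would define upd to update its own summary table instead of appending every row.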
6.4 Example
The scripts in start/tick [2] run a simple tickerplant/rdb configuration. Note that they are not suitable for production use (no logging, error handling, end of day roll over etc). The layout is:

                 feed
                  |
             tickerplant
         /   /    |   \   \    \
      rdb  vwap  hlcv  tq  last  show
      /\    /\    /\   /\   /\
        ... client applications ...

Here:
- feed is a demo feedhandler, that generates random trades and quotes and sends them to the tickerplant. In practice, this would be replaced by real feedhandlers.
- The tickerplant gets data from feed and pushes it to clients that have subscribed. Once the data is written, it is discarded.
- The rdb, vwap, hlcv, tq and last processes are databases that have subscribed to the tickerplant. Note that these databases can be queried by a client application.
  - rdb has all of today's data
  - vwap has volume weighted averages for selected stocks
  - hlcv has high, low, close, volume for selected stocks
  - tq has a trade and quote table for selected stocks. Each row is a trade joined with the most recent quote.
  - last has the last entries for each stock in the trade and quote tables
- The show process displays the incoming feed for selected stocks.

Note that all the client processes load the same script file cx.q, with a parameter that selects the corresponding code for the process in that file. Alternatively, each process could load its own script file, but since the definitions tend to be very short, it is convenient to use a single script for all. See c.q [3] for more examples (written for kdb+tick).
References
[1] http://kx.com/kdb+tick.php
[2] http://code.kx.com/wsvn/code/contrib/cburke/start/tick
[3] http://kx.com/q/tick/c.q