Abstract:
FIX is the de facto standard protocol used extensively for electronic communication between the buy side, the sell side and execution venues, where the performance requirements of algorithmic and high frequency trading are extreme and/or the benefits of STP (straight through processing) are sought from electronic connectivity.
January 2012
WITH THANKS TO THE TEAM AT INTEL FASTERLAB UK
STATEMENT OF CONFIDENTIALITY / DISCLAIMER This document has been prepared by the consortium of companies described herein. No part of this document shall be reproduced without the consultation of these parties and acknowledgement of its source. Contact can be made to FIX@onx.com.
TABLE OF CONTENTS
1. SUMMARY
2. INTRODUCTION
2.1 PURPOSE
2.2 ROLES AND RESPONSIBILITIES
2.3 CONDUCT AND PRESENTATION
3. METHOD
3.1 TEST HARNESS SOFTWARE DESIGN
3.2 TEST HARNESS - HARDWARE DESIGN
3.3 THE MESSAGE PASSING PROCESS
3.4 TIMINGS
3.5 POST-TEST DATA PROCESSING
3.6 TEST SCENARIOS
5. DISCUSSION
5.1 VALUE OF THE EXERCISE TO THE ELECTRONIC FINANCIAL TRADING COMMUNITY
5.2 PERFORMANCE OF THE TEST RIG
5.3 RAISING THE TEST RIG TO PRODUCTION STANDARD
5.4 EXPLOITING THE RESULTS
1. Summary
This briefing paper reports on the activity of a consortium of leading IT vendors that have joined forces to create demonstrable high performance solution stacks to address common business requirements in financial trading. The initial focus of the consortium is on a referenceable technology stack of products and services to support FIX protocol communication functions. The paper describes the test environment, documents a set of benchmark tests performed on both commercial and open source FIX engine offerings, and details and interprets the representative latency and throughput figures achieved.

The objective is to create transparency in, and the capability to compare, performance statistics for key functions along the trading life cycle. The tests used business workloads and were deliberately aligned to reflect the market's current interest in the measurement of inter-party latency across the trade life cycle, using FIX formatted messages for defined legs. An on-going objective is to provide the market with useful data in order to support decisions in technology investment. Therefore, a range of technologies and application software has been addressed.

Approaches were made to a number of application vendors, with the ultimate agreement to test FIX engines covering both C++ and Java implementations from EPAM Systems' B2BITS unit and Rapid Addition, respectively. As a datum point for comparison, the open source QuickFIX was used in both its C++ and Java variants.

OnX Enterprise Solutions Ltd is leading a consortium whose charter members include Intel, Dell, Arista Networks and Solarflare Communications, with additional services provided by Edge Technology Group, GreySpark Partners and Equinix. The foundation objective is to create transparent comparative performance statistics for key functions along the trading life cycle using business workloads, with FIX being used on a number of legs of the typical trade life cycle.
A series of tests were undertaken that demonstrate the value of commercial software (versus open source) and the use of specialist technologies in a low latency infrastructure. The consortium approach recognises the reality that the creation of high performance solutions requires the interaction of many leading edge technologies and the integration of components from several vendors. These parties must work together in order to specify the correct parts and then to tune them together, such that a complete and reliable solution is available through a collective single channel.

Results for the tests showed that both B2BITS' and Rapid Addition's commercial FIX engines outperformed the open source QuickFIX offerings (C++ and Java) in a range of tests, being between 4 and 16 times faster in generating messages during a standardised simulated trade. The average latency for the commercial engines was 11 to 12 microseconds, whereas the open source engines were between 45 and 180 microseconds. The variation in results was equally stark: the frequency distribution results from the commercial engines were bell curved, but the open source results had a long fat tail. This indicated that the commercial solutions significantly reduced the effect of network jitter and, with it, the undesired variance of performance.

B2BITS' FIX Antenna engine was a C++ version; Rapid Addition's Cheetah engine was Java. Both demonstrated similar performance characteristics over a range of tests and workloads. The similarity of results between the commercial C++ and Java engines stood in contrast to the open source equivalents, demonstrating that Java can perform as well as C++ code when implemented in an optimised fashion.
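The quoted speed-up range can be cross-checked against the quoted average latencies. The short Python sketch below uses only the figures stated above; the variable names, and the choice to take the midpoint of the commercial range as the reference point, are illustrative assumptions rather than part of the consortium's method.

```python
# Average latencies quoted in the summary, in microseconds.
commercial_us = (11.0, 12.0)     # commercial engines: 11 to 12
open_source_us = (45.0, 180.0)   # open source engines: 45 to 180

# Assumption: use the midpoint of the commercial range as the reference point.
commercial_mid = sum(commercial_us) / len(commercial_us)

fastest_ratio = open_source_us[0] / commercial_mid   # best open source case
slowest_ratio = open_source_us[1] / commercial_mid   # worst open source case

print(f"speed-up range: {fastest_ratio:.1f}x to {slowest_ratio:.1f}x")
# prints "speed-up range: 3.9x to 15.7x"
```

Under that assumption the ratios come out at roughly 4 and 16, which agrees with the "between 4 and 16 times faster" statement.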
2. Introduction
In the online and Co-Lo based financial trading markets, performance, in terms of both latency and throughput, is paramount. It is the difference between a firm being in the market or not. Complete trading systems are built from many complex elements, including market data capture, trading algorithms, trade execution and in-flow risk analysis. These elements run on critical infrastructure components (hardware, software, network and connectivity), all of which must interoperate with each other. Today, there is a lack of industry-recognised benchmarks that designers can use to demonstrate that solutions have high performance characteristics. To achieve performance and agility, with low up-front and ongoing operating costs, trade infrastructure implementation teams need to source the best available components from different innovative specialist vendors, integrate them and tune their interoperability. FIX is the de facto standard protocol used extensively for electronic communication between the buy side, the sell side and execution venues, where the performance requirements of algorithmic and high frequency trading are extreme and/or the benefits of STP (straight through processing) are sought from electronic connectivity.
2.1 Purpose
FIX message generation is an increasingly important leg in automated trading and can be a source of significant latency and jitter, which can adversely impact the success of business and trading strategies. As trading strategies require access to a greater diversity of execution venues, communication over the standard FIX protocol is more cost effective than accessing markets via diverse proprietary protocols at the various venues. Infrastructure deployment teams have to select appropriate components, integrate them, commission them and deploy them for maximum performance, which can be an extreme challenge. It requires a combination of knowledge, skills, experience and deployment ability that is today scarce and expensive in the market. The testing undertaken by OnX in the Intel lab, with support from the consortium, was to investigate these assertions:
1. Using commercial FIX engines would achieve lower latency and less jitter.
2. Using specialist low latency network techniques would have a significant impact on latency.
Full results for each environment and latency improvement are available on request.
The benchmarks were conducted at Intel's fasterLAB in the UK. Intel engineers screened hardware and software performance for optimization, performed the tests, recorded the results and carried out the post-test processing that produced the average tables and graph outputs. Software suppliers Rapid Addition and B2BITS (EPAM Systems) provided their FIX engines. The test harness was designed by Rapid Addition, and the open source implementations in Java and C++ were supplied by Rapid Addition and B2BITS respectively.
The test harness server housed the algorithmic trading system simulator and FIX engine. This server included a single Intel Xeon processor X5698 (dual-core), clocked at 4.4 GHz, with 12MB of L3 cache and 96GB of RAM (12 x 8GB), running Red Hat Enterprise Linux (RHEL 6.0). This processor was designed on the basis of direct feedback from Intel's field teams working close to financial trading, for applications where the fastest single-thread instruction execution is required. Performance increases of more than 20% compared to other processors in the Intel Xeon processor X5600 Series were noted.

Preliminary tests were undertaken to select the most appropriate processor for the workload by comparing the Intel Xeon processor X5698 (4.4GHz) against the Intel Xeon processor X5680 (3.33GHz). The preliminary test, using a message rate of 100,000 messages a second, showed the Intel Xeon processor X5698 to have 36% better latency performance than the Intel Xeon processor X5680. The clock speed difference between the processors was 32%, indicating that the Intel Xeon processor X5698 was exhibiting better than linear scalability under test and was better suited to the FIX engine workload. On the basis of this preliminary test, the Intel Xeon processor X5698 was selected for the test environment.
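The comparison between clock speed and latency gain can be reproduced with a few lines of Python. This is a sketch of the arithmetic only: the clock frequencies and the 36% figure are taken from the text above, while the variable names and the decision to express the gap in percentage points are illustrative assumptions.

```python
# Clock frequencies of the two candidate processors, in GHz (from the text).
clock_x5698_ghz = 4.4
clock_x5680_ghz = 3.33

# Relative clock speed advantage of the X5698 over the X5680.
clock_gain = clock_x5698_ghz / clock_x5680_ghz - 1.0

# Latency improvement reported for the preliminary test (from the text).
latency_gain = 0.36

# Assumption: compare the two gains directly, in percentage points.
extra_points = (latency_gain - clock_gain) * 100

print(f"clock speed difference: {clock_gain:.0%}")
print(f"latency gain beyond clock speed: {extra_points:.1f} percentage points")
```

The clock ratio works out at about 32%, so the reported 36% latency improvement is a few percentage points more than the clock speed alone would explain.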
Low latency network interface cards (NICs) were selected for all servers. With a non-specialized network card, latencies are around 20 microseconds. Empirical evidence from Solarflare Communications indicates that this can be reduced by 50% by using a specialized low latency network card and by a further 50% using a technique referred to as kernel bypass. Solarflare is a recognized provider of low latency NICs offering kernel bypass support for both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic. Typically, market data is broadcast using stateless UDP and trade execution uses TCP. The model selected was the dual interface SFN5122-F.
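The cumulative effect of the two network techniques can be worked through as follows. The sketch assumes that the "further 50%" applies to the latency that remains after the first reduction (that is, the reductions compound); the 20 microsecond baseline and both percentages come from the paragraph above, and the variable names are illustrative.

```python
# Approximate latency with a non-specialized network card, in microseconds.
baseline_us = 20.0

# "Reduced by 50%" with a specialized low latency network card.
low_latency_nic_factor = 0.5
# "A further 50%" with kernel bypass (assumed to apply to the remaining latency).
kernel_bypass_factor = 0.5

with_low_latency_nic_us = baseline_us * low_latency_nic_factor
with_kernel_bypass_us = with_low_latency_nic_us * kernel_bypass_factor

print(with_low_latency_nic_us, with_kernel_bypass_us)  # prints "10.0 5.0"
```

On that reading, the network card contribution falls from around 20 microseconds to around 5 microseconds.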
[Figure: flowchart of the message passing process between the MD simulator and the algo simulator. A 35=X message arrives on Session-1 and is handed up; the algo simulator tests "Does MD Entry Price end .00?" and "Is this a bid (269=0)?", then invokes the Create Order Single class to buy or sell stock, creating a 35=D Order Single.]
2. The FIX engine listened to this stream of messages on a single FIX session (Session-1) and handed each message up to the algorithmic trading simulator.
3. The algorithm simulator interrogated the data and, when a bid (tag 269=0) had an MD Entry Price that ended in ".000" (e.g. 270=56.000), it instructed the FIX engine to create and send a new Order Single (tag 35=D) message to buy 100 lots (tag 38=100) of the symbol (tag 55) to an execution venue simulator on a second FIX session (Session-2). Since each market data message had a unique price, market data messages could be correlated with the order messages that they triggered.
4. The execution venue simulator automatically filled the order by creating two Execution Reports (tag 35=8). The first had an Order Status of New (tag 39=0); the second, Filled (tag 39=2). These were returned on the same FIX session (Session-2).
5. On receipt of the fill (tag 35=8; tag 39=2), the algo simulator instructed the FIX engine to send another Order Single (tag 35=D) to sell 100 lots (tag 38=100) of the same symbol (tag 55).
6. Again, the execution venue simulator automatically filled the order by creating two Execution Reports (tag 35=8). The first had an Order Status of New (tag 39=0); the second, Filled (tag 39=2).
Note: Tests were performed without use of persistent storage.
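The decision logic in steps 3 and 5 can be sketched in Python as below. This is a minimal illustration and not the harness used in the tests: the function names, the flat dictionary representation of a message and the symbol in the usage example are hypothetical, FIX header and trailer fields are omitted, and repeating groups are not handled. Tag 54 (Side, 1 = Buy, 2 = Sell) is a standard FIX tag that the description above does not mention; using it to tell the buy fill from the sell fill is an assumption.

```python
from typing import Optional

SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw: str) -> dict:
    """Split a tag=value FIX string into a dict (simplified: no repeating groups)."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields


def build_order(symbol: str, side: str, qty: int = 100) -> dict:
    """Body fields of an Order Single (35=D); header and trailer are omitted."""
    return {"35": "D", "55": symbol, "54": side, "38": str(qty)}


def on_market_data(fields: dict) -> Optional[dict]:
    """Step 3: buy when a bid (269=0) has an MD Entry Price ending in '.000'."""
    is_bid = fields.get("35") == "X" and fields.get("269") == "0"
    if is_bid and fields.get("270", "").endswith(".000"):
        return build_order(fields["55"], side="1")  # 54=1 is Buy
    return None


def on_execution_report(fields: dict) -> Optional[dict]:
    """Step 5: on a fill (35=8, 39=2) of the buy order, sell the same symbol."""
    is_fill = fields.get("35") == "8" and fields.get("39") == "2"
    # Assumption: only the buy fill (54=1) triggers the sell order.
    if is_fill and fields.get("54") == "1":
        return build_order(fields["55"], side="2")  # 54=2 is Sell
    return None
```

For example, `on_market_data(parse_fix("35=X\x0155=ABC\x01269=0\x01270=56.000\x01"))` returns the fields of a buy order for 100 lots of the hypothetical symbol `ABC`, while the same message with `270=56.001` returns `None`.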
3.4 Timings
Since timestamps within the test harness hardware components were not accurate to the microsecond, timings were recorded on an Endace network monitor. Three timestamps were recorded for each benchmark process:
1. Receipt of the market data message from the market data simulator (T1).
2. Transmission of each of the two Order Single messages to the execution venue simulator (T2).
3. Receipt of confirmation of the execution of T1 from the execution venue simulator (T3).
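One way such timestamps might be turned into latency figures is sketched below. The paper does not state which differences were reported, so treating T2 - T1 as the engine's response time and T3 - T1 as the round trip is an assumption, as are the class name, the field names and the sample values (which are hypothetical, not measurements). For simplicity the sketch records a single T2 per process.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """Timestamps for one benchmark process, in microseconds (hypothetical layout)."""
    t1_us: float  # market data message received from the market data simulator (T1)
    t2_us: float  # Order Single transmitted to the execution venue simulator (T2)
    t3_us: float  # execution confirmation received from the venue simulator (T3)

    @property
    def response_us(self) -> float:
        # Assumed metric: time from market data receipt to order transmission.
        return self.t2_us - self.t1_us

    @property
    def round_trip_us(self) -> float:
        # Assumed metric: time from market data receipt to execution confirmation.
        return self.t3_us - self.t1_us


# Hypothetical sample values, chosen only to illustrate the arithmetic.
sample = Capture(t1_us=1000.0, t2_us=1011.5, t3_us=1030.0)
print(sample.response_us, sample.round_trip_us)  # prints "11.5 30.0"
```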
The two graphs above show the latency of the workload completion over a 300 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively.
The two graphs above show the same results over a 60 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively.
The two graphs above show the latency of the workload completion over a 300 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively. Note the absence of performance test results from the open source C++ engine under these test conditions.
The two graphs above show the same results over a 60 microsecond range, comparing open source against the commercially available Java and C++ FIX engines, respectively.
1. The commercial FIX engines completed the messaging tasks between 30 and 50 microseconds more quickly than the QuickFIX engines.
2. The QuickFIX engines had outlying results to 300 microseconds (they did not complete their task inside this time), a source of jitter (unpredictability).
3. QuickFIX C++ was unable to perform with the Exchange simulator set at 14 microseconds.
4. Across the range of tests, the commercial engines differed from each other in outright latency and jitter, but the differences showed no common theme as to performance characteristics and are hence considered to be within experimental error. This assertion is demonstrated when examining the whole result set.
5. The open source/free Java and C++ QuickFIX engines showed random variation between themselves; the C++ version could not perform at the 14 microsecond load level.
6. The commercial FIX engines were consistent and deterministic throughout the tests. The commercial engines showed a normal distribution pattern, and calculations of standard deviation were undertaken. The results for the QuickFIX engines showed a large number of outlying results (which translates to poor reliability in handling trading workloads) and did not fit the normal distribution model.
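The distinction drawn in the observations above, between a tight, roughly normal distribution and one with a long tail of outliers, can be quantified with a few summary statistics. The Python sketch below is illustrative only: the function name, the 60 microsecond outlier threshold and both data sets are hypothetical and are not measurements from the tests.

```python
import statistics


def summarize(latencies_us, outlier_threshold_us):
    """Mean, sample standard deviation, range and outlier share of a latency sample."""
    outliers = [x for x in latencies_us if x > outlier_threshold_us]
    return {
        "mean_us": statistics.mean(latencies_us),
        "stdev_us": statistics.stdev(latencies_us),
        "range_us": max(latencies_us) - min(latencies_us),
        "outlier_share": len(outliers) / len(latencies_us),
    }


# Hypothetical samples in microseconds, NOT measurements from the tests.
tight_sample = [10, 11, 11, 12, 12, 12, 13, 13, 14]
long_tail_sample = [40, 45, 45, 50, 55, 60, 90, 180, 300]

tight = summarize(tight_sample, outlier_threshold_us=60)
long_tail = summarize(long_tail_sample, outlier_threshold_us=60)
print(tight)
print(long_tail)
```

With these made-up samples the tight data set has a range of 4 microseconds and no outliers, while a third of the long-tail data set lies beyond the threshold. For a long-tailed sample the standard deviation is dominated by the outliers, which is one reason a normal distribution model fits it poorly.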
The commercial FIX engines showed a much tighter distribution range of 4 microseconds, as opposed to 50 microseconds for the QuickFIX engines. The sample run below illustrates the point. Note the difference in microsecond range on the X axis of each graph below.
[Figure: two frequency distribution graphs from a sample run. Y axis: number of samples; X axis: time in microseconds.]
5. Discussion
5.1 Value of the Exercise to the Electronic Financial Trading Community
The testing exercise has informed the debate among practitioners who look to quantify the benefits of commercial FIX engines over their open source counterparts. It is clear that the commercial engines outperform the open source versions, by a factor of between 4 and 16 in these tests, and also have significantly higher consistency in performance, an essential feature for the execution of certain trading strategies. While the open source model is widely successful as a driver for innovation, in the case of FIX it is clearly important to select software products based on the required workload and performance characteristics. The Java based FIX engine closely matched the native C++ code, with each engine showing individual characteristics. Finally, the exercise has demonstrated the value of optimized high performance infrastructure when deploying automated electronic trading systems.
Project governance is required across the implementation of a high performance trading infrastructure. This begins with an analysis of the current environment, whether it is a green field deployment or a complete replacement of existing systems. A critical component is to ensure that any new system can integrate effectively with existing systems (SOR, risk, market data, etc.). Across the financial services technology landscape, these skills and competencies are typically spread across multiple parties with differing and often overlapping areas of competence and responsibility. This can introduce variance in the effectiveness of the trading infrastructure, which can impact the overall effectiveness of the deployment project. The consortium has been assembled to create teaming amongst parties who can carry the resource loads of planning and designing suitable infrastructure within the context of each firm's current and ongoing environment. This approach can be applied equally whether the deployment is in-house or at a colocation facility in proximity to market liquidity.
[Figure: infrastructure diagram; surviving labels: "MLAG pair", "MLAG pair".]
6. Conclusion
The major result from the testing exercise was the collaboration between parties to create a robust and representative testing environment, which was able to produce results simulating real-life conditions and their effect on the key function of FIX message transmission.

The commercial FIX engines were between 4 and 16 times faster (depending on load) than the open source QuickFIX equivalent engines, with an average latency test result of 11 microseconds, as opposed to 180 microseconds. This was even more evident when the performance of the execution venue was increased to reflect faster matching (sub 50 microseconds). Stress exerted on the FIX engines drew out different performance characteristics: under different stress conditions, each engine behaved differently.

The commercial engines' performance was vastly superior to that of the open source models. The standard deviation from the mean for a commercial engine was only 1 microsecond. The open source software exhibited results which, when translated into the real production world, would not be considered sufficiently robust to support automated trading strategies. The major factors affecting the open source variants are poor performance under high load, higher levels of network jitter and trade execution outliers of up to 300 microseconds.

Tuning the Network Interface Cards with kernel bypass technology improved the performance of both commercial engines and demonstrated a 50% reduction in latency. This translated into a round-trip saving in latency, which would have a material impact on the trading strategy being executed. Engineering an integrated trading platform was proven to deliver incremental benefits in reducing overall latency.

Both Java and C++ environments, in open source and commercial form, exhibited individual characteristics across the various code streams in the applications. This indicates the on-going scope for improvement in the software, which can lead to improvements in overall performance. The test results demonstrated that trading strategies which rely on minimising response times should be deployed on a high performance infrastructure. This is integral to obtaining enhanced levels of performance and reliability. Each layer in the technology stack has a role to play, with incremental enhancements being possible when implementing options such as kernel bypass.
Appendices
1. Consortium
The consortium comprises a group of companies whose combined capability maps to the provision of trading technology solutions. This is not a closed group and is fully open to inputs from additional parties on an on-going basis.
1a Technology Members
A number of technology and services providers have invested as charter members of the consortium. However, the initiative is open, and further participant members may be added in the future. Between them, these members provide a complete infrastructure capability and created the reference architecture, each drawing on specific expertise, while OnX provided the integration and build capability. The charter group members with technology products directly involved in building the rig and in the performance benchmark testing comprise:

OnX Enterprise Solutions: As consortium lead, OnX selected vendors for the benchmark test stack, built the test rig by integrating the product components and interpreted the results of the tests.
Arista Networks: Provided its 7124SX network switch to connect servers for the benchmark and its LANZ (Latency Analyzer) capability for tuning.
Dell: The benchmark was run on two Dell PowerEdge R710 servers, one of which was equipped with an Intel Xeon processor X5698.
Intel: The benchmarks were conducted at Intel's fasterLAB in the UK. Intel Xeon processors X5677 and X5698 were installed in the Dell servers. Intel engineers screened hardware and software performance for optimum utilisation of iA (Intel Architecture) features, including use of the Intel Compiler.
Solarflare Communications: SFN5122F 10 gigabit Ethernet network adaptors were installed in each of the Dell servers, offering kernel-bypass communications.
Other consortium members, which can provide services for deployment in real-life production scenarios, be they Co-Lo, on-site or other, include:

Edge Technology Group: Provides integration and managed services, in particular for buy-side participants.
Equinix: Runs financial services data centres around the globe, supporting high-performance trading across multiple asset classes on a deep mix of trading venues. Trading participants are connected inside the data centre using cross-connects to reduce network latency and enable price discovery, order routing and execution at the highest possible performance levels.
GreySpark Partners: Provides top-down trading strategy and technology consulting, and integration services, with a focus on assessing requirements and designing technology bundles for high performance.