This topic explains how to configure the Handler for maximum performance and the lowest processing latency.
Under normal conditions, the Handler logs important events and the market data transmitted by MDP to a log file. Because the log file uses a text-based format, binary data such as incoming market data is base64-encoded before it is stored in the log, which adds time to the processing cycle. In addition, logging is file-based and therefore relatively slow.
To eliminate the slowdowns caused by flushing data to the file system and by the extra encoding operations, disable the logging subsystem by setting the OnixS::CME::MarketData::HandlerSettings::logMode parameter to OnixS::CME::MarketData::LogModes::Disabled.
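As a rough illustration, the setting could be applied as in the sketch below. The Handler's real headers are not reproduced here, so the snippet declares minimal stand-ins in a hypothetical `sketch` namespace. Only the names `HandlerSettings`, `logMode`, `LogModes` and `Disabled` come from the text above; the placeholder enumerator, the default value and the helper function are assumptions made for the example.

```cpp
// Stand-in declarations so that the snippet compiles on its own.
// The real types are OnixS::CME::MarketData::HandlerSettings and
// OnixS::CME::MarketData::LogModes; everything except the names
// `logMode` and `Disabled` is hypothetical.
namespace sketch {

struct LogModes {
    enum Enum {
        Disabled,
        SomeEnabledMode  // hypothetical placeholder for the other modes
    };
};

struct HandlerSettings {
    // Assumed default for the sketch: logging is on until disabled.
    LogModes::Enum logMode = LogModes::SomeEnabledMode;
};

// Hypothetical helper: returns settings with logging switched off,
// ready to be handed to the handler.
inline HandlerSettings makeLowLatencySettings() {
    HandlerSettings settings;
    settings.logMode = LogModes::Disabled;
    return settings;
}

}  // namespace sketch
```

With the real library, the same assignment would be made on the actual HandlerSettings instance before the Handler is created.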
Currently, the Handler provides two strategies that affect how it receives and processes market data: Direct and Buffered. The primary difference is that the Buffered strategy uses an additional thread, which picks up incoming market data while another thread is busy processing previously received data. All threads use the same processing loop, so the extra thread has no direct impact on processing performance. However, the final choice of packet pulling strategy should be based on the architecture of the final application. Using multiple threads and/or multiple instances of the Handler adds context switching and may therefore have a negative effect on latency. For this reason, the general recommendation is to use the Direct packet pulling strategy.
Another drawback of Buffered mode is the on-demand allocation of resources that hold incoming market data. The Handler uses a pool of packets that are reused while market data is processed. However, during a data burst, additional packets may be allocated to prevent the Handler from losing data. In Direct mode, no additional packets are allocated.
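The pooling behaviour described above can be pictured with a minimal sketch. It is not the Handler's implementation; `Packet`, `PacketPool` and all member names are hypothetical. The pool hands out preallocated packets and falls back to a heap allocation only when a burst exhausts it, which is the cost Direct mode avoids.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical packet: a fixed-size byte buffer.
struct Packet {
    std::vector<unsigned char> bytes;
};

// Hypothetical pool: packets are created up front and reused.
class PacketPool {
public:
    PacketPool(std::size_t packetCount, std::size_t packetSize)
        : packetSize_(packetSize) {
        free_.reserve(packetCount);
        for (std::size_t i = 0; i < packetCount; ++i)
            free_.push_back(makePacket());
    }

    // Takes a packet from the pool; allocates a new one only if
    // the pool is empty (the "data burst" case).
    std::unique_ptr<Packet> acquire() {
        if (free_.empty()) {
            ++burstAllocations_;
            return makePacket();
        }
        std::unique_ptr<Packet> packet = std::move(free_.back());
        free_.pop_back();
        return packet;
    }

    // Returns a processed packet to the pool for reuse.
    void release(std::unique_ptr<Packet> packet) {
        free_.push_back(std::move(packet));
    }

    // Number of packets allocated outside the initial pool.
    std::size_t burstAllocations() const { return burstAllocations_; }

private:
    std::unique_ptr<Packet> makePacket() const {
        auto packet = std::make_unique<Packet>();
        packet->bytes.resize(packetSize_);
        return packet;
    }

    std::size_t packetSize_;
    std::vector<std::unique_ptr<Packet>> free_;
    std::size_t burstAllocations_ = 0;
};
```

In this sketch, a counter such as `burstAllocations()` makes it easy to see whether the initial pool size was large enough for the observed traffic.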
Market data processing is done asynchronously using threads (the number of which depends on the packet pulling strategy).
To keep threads from migrating between processors, the Handler allows processor affinity to be set for each working thread.
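For readers unfamiliar with processor affinity, the sketch below shows the underlying operating-system mechanism on Linux using `sched_getaffinity` and `sched_setaffinity`. It does not use the Handler's own affinity settings, whose names are not given in this text; the helper names are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // required for the CPU_* macros on Linux
#endif
#include <sched.h>

// Returns the lowest-numbered CPU the calling thread may run on,
// or -1 if the affinity mask cannot be read.
inline int firstAllowedCpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    }
    return -1;
}

// Pins the calling thread (pid 0 means "caller") to a single CPU.
// Returns false if the CPU index is invalid or the kernel refuses.
inline bool pinCallingThreadToCpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return false;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

Once a thread is pinned, the scheduler no longer moves it between cores, so its working set is more likely to stay in that core's cache.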
In recent releases, in addition to the ability to define processor affinity for working threads, the Handler provides a new set of events that working threads trigger when they begin processing and just before they end processing.
With the help of these working-thread events, more advanced thread tuning is possible, such as updating thread priority.
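The idea can be sketched as follows. The Handler's actual event interface is not shown in this text, so `WorkerThreadListener`, its two callbacks, `PriorityTuner` and `runWorker` are all hypothetical names. The priority change uses the Linux `sched_setscheduler` call, which normally requires elevated privileges, so the sketch records whether the request succeeded instead of assuming it did.

```cpp
#include <sched.h>

#include <functional>
#include <string>
#include <vector>

// Hypothetical listener interface for working-thread events.
struct WorkerThreadListener {
    virtual ~WorkerThreadListener() = default;
    virtual void onProcessingBegin() = 0;  // called on the worker thread
    virtual void onProcessingEnd() = 0;    // called on the worker thread
};

// Linux-specific: requests FIFO real-time scheduling for the calling
// thread. Usually needs root or CAP_SYS_NICE; returns false if refused.
inline bool tryRaiseCallingThreadPriority(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0;
}

// Hypothetical listener that tunes the thread when processing starts.
class PriorityTuner : public WorkerThreadListener {
public:
    void onProcessingBegin() override {
        log.push_back("begin");
        raised = tryRaiseCallingThreadPriority(
            sched_get_priority_min(SCHED_FIFO));
    }

    void onProcessingEnd() override { log.push_back("end"); }

    std::vector<std::string> log;  // order in which events fired
    bool raised = false;           // whether the kernel accepted the request
};

// Stand-in for a working thread's body: fires the events around
// the processing loop.
inline void runWorker(WorkerThreadListener& listener,
                      const std::function<void()>& processingLoop) {
    listener.onProcessingBegin();
    processingLoop();
    listener.onProcessingEnd();
}
```

Because the callbacks run on the working thread itself, calls that act on "the calling thread" affect exactly the thread being tuned.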
When the Handler finishes processing previously received market data, it starts receiving new incoming data if any is available in the feed. In practice, data packets do not arrive back to back; there is a time interval between two neighbouring packets. These pauses cause the Handler to suspend its activity and sleep in the kernel while waiting for incoming data. As a result, data and executable code can be evicted from the processor's cache, so the next iteration of the processing loop runs slower than the previous one. To control how long the Handler waits in the kernel for incoming data, the Handler provides the OnixS::CME::MarketData::HandlerSettings::ioCompletionWaitTime parameter, whose value defines the time the Handler spends in an I/O operation while waiting for incoming data. Reducing this value increases the number of wake-ups and thus reduces the probability that the Handler's data and code are evicted from the processor's cache. If the parameter is set to zero, the Handler only checks whether data is available and does not sleep in the kernel. The drawback of reducing the wait time is increased processor consumption.
Under normal conditions, the Handler makes efficient use of the internal structures that hold incoming market data. Packets and FIX messages are reused once the data they contain has been processed by the Handler. Therefore, no memory is allocated during real-time market data processing.
However, data may be copied inside the callbacks that the Handler invokes on listeners for various market data events. Copying a book or a FIX message triggers memory allocation, which has a negative effect on performance and latency. To improve results, copying should be minimized, or strategies based on preallocation should be used.
For example, book snapshots can be constructed with a given capacity, sufficient to store a particular number of price levels. Constructing book snapshots with an initial capacity large enough to hold books of the maximum possible depth eliminates reallocations.
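The effect of preallocation can be shown with a minimal sketch built on `std::vector::reserve`. The `PriceLevel` and `BookSnapshot` types here are hypothetical and are not the Handler's own snapshot classes; they only demonstrate that a snapshot sized once for the maximum depth can be refilled repeatedly without touching the allocator.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical price level.
struct PriceLevel {
    long long price;
    int quantity;
};

// Hypothetical snapshot that reserves storage for the deepest
// possible book once, at construction time.
class BookSnapshot {
public:
    explicit BookSnapshot(std::size_t maxDepth) {
        levels_.reserve(maxDepth);
    }

    // Replaces the snapshot contents. As long as `count` does not
    // exceed the reserved capacity, no reallocation takes place.
    void copyFrom(const PriceLevel* levels, std::size_t count) {
        levels_.clear();
        levels_.insert(levels_.end(), levels, levels + count);
    }

    std::size_t depth() const { return levels_.size(); }
    std::size_t capacity() const { return levels_.capacity(); }
    const PriceLevel* data() const { return levels_.data(); }

private:
    std::vector<PriceLevel> levels_;
};
```

A snapshot of this kind would typically be created once, outside the callback, and then refilled inside the callback on every update.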