This topic describes how to configure the Handler to achieve maximum performance and the lowest processing latency.
Under normal conditions, the Handler logs important events and the market data transmitted by MDP into a log file. Because log entries are textual, binary data such as incoming market data packets is base64-encoded before being stored in the log, which adds time to each processing cycle. In addition, if the Logger implementation writes its data to a file, that write may be a relatively slow operation.
To eliminate slowdowns caused by flushing data to the filesystem and by the extra encoding, logging can be disabled by binding an instance of the OnixS::CME::ConflatedUDP::NullLogger class to the Handler. In that case the Handler does not construct log events, and nothing is logged at all.
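The saving comes from skipping the construction of the log entry, not only from skipping the write. A minimal sketch of that pattern follows. All type and function names in it (`LogSink`, `DiscardingSink`, `CountingSink`, `logPacket`) are hypothetical and are not the Handler's API, and hex encoding stands in for base64 to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical logging interface for illustration only; not the OnixS API.
struct LogSink {
    virtual ~LogSink() {}
    virtual bool enabled() const = 0;
    virtual void write(const std::string& entry) = 0;
};

// Plays the role of a "null" logger: reports itself as disabled and drops everything.
struct DiscardingSink : LogSink {
    bool enabled() const override { return false; }
    void write(const std::string&) override {}
};

// Counts written entries; stands in for a real file-backed logger.
struct CountingSink : LogSink {
    std::size_t entries = 0;
    bool enabled() const override { return true; }
    void write(const std::string&) override { ++entries; }
};

// Encodes a binary packet as text and logs it, but only if the sink is enabled.
// Returns the number of characters formatted (0 when the encoding step is skipped).
inline std::size_t logPacket(LogSink& sink, const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (!sink.enabled()) {
        return 0;  // no encoding, no allocation, no write
    }
    static const char digits[] = "0123456789abcdef";
    std::string entry;
    entry.reserve(size * 2);
    for (std::size_t i = 0; i < size; ++i) {
        entry.push_back(digits[data[i] >> 4]);
        entry.push_back(digits[data[i] & 0x0f]);
    }
    sink.write(entry);
    return entry.size();
}
```

With the discarding sink, `logPacket` returns before any text is built, which mirrors the behavior described above for a disabled logger.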
The Feed Engine concept was improved, and the thread-management layer was removed from the available implementations. The new OnixS::CME::ConflatedUDP::NetFeedEngine::process member was introduced to run the Feed Engine machinery explicitly, which lets applications perform their tasks more efficiently. Invoking OnixS::CME::ConflatedUDP::NetFeedEngine::process in place avoids the need for additional threads. Instead, the call can be combined on the same thread with other tasks, such as sending and receiving orders through an order management system used alongside market data handling.
The following pseudo-code illustrates the primary principle:
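A minimal sketch of this single-threaded pattern is given here. It uses a hypothetical `StubFeedEngine` whose `process()` returns whether any work was done; the actual signature and return value of NetFeedEngine::process may differ, and `runLoop` is an illustrative helper, not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Feed Engine. process() handles one pending
// packet, if any, and reports whether it did any work.
struct StubFeedEngine {
    std::size_t pendingPackets;
    std::size_t processed;

    explicit StubFeedEngine(std::size_t pending)
        : pendingPackets(pending), processed(0) {}

    bool process() {
        if (pendingPackets == 0) {
            return false;
        }
        --pendingPackets;
        ++processed;
        return true;
    }
};

// Drives market data processing and one other task (for example, order
// handling) from the same thread, with no extra threads involved.
// Returns the number of iterations in which the engine did work.
template <typename Engine, typename OtherTask>
std::size_t runLoop(Engine& engine, OtherTask otherTask, std::size_t iterations) {
    std::size_t busy = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        if (engine.process()) {
            ++busy;
        }
        otherTask();  // runs every iteration, whether or not data arrived
    }
    assert(busy <= iterations);
    return busy;
}
```

In a real application the loop would typically run until shutdown is requested rather than for a fixed number of iterations.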
Suppose the application uses a thread-safe implementation of the Feed Engine and invokes its OnixS::CME::ConflatedUDP::NetFeedEngine::process member from multiple threads. In that case, the threads that execute the Feed Engine machinery may need additional tuning. By default, the operating system may run a thread on any processor available in the system and may move it between processors. That can hurt overall performance because of unnecessary thread context switching.
Establishing thread affinity for each working thread avoids or minimizes switching between processors. If the application uses the OnixS::CME::ConflatedUDP::FeedEngineThreads class to run the Feed Engine machinery across multiple threads, affinity can be specified through the settings supplied when the instance is constructed:
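For reference, at the operating-system level pinning a thread on Linux comes down to a call such as `sched_setaffinity`. The sketch below assumes Linux with glibc and shows only that underlying mechanism; the helper names `pinCurrentThreadTo` and `firstAllowedCpu` are hypothetical and are not part of the Handler API.

```cpp
#include <cassert>
#include <sched.h>

// Pins the calling thread to a single CPU (Linux-specific).
// Returns true on success; on failure errno holds the reason.
inline bool pinCurrentThreadTo(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    CPU_SET(cpu, &allowed);
    // A pid of 0 applies the mask to the calling thread.
    return sched_setaffinity(0, sizeof(allowed), &allowed) == 0;
}

// Returns the lowest-numbered CPU the calling thread may currently run on,
// or -1 if the affinity mask cannot be read.
inline int firstAllowedCpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            return cpu;
        }
    }
    return -1;
}
```

Picking the CPU from the currently allowed set, rather than hard-coding CPU 0, keeps the call valid inside containers or restricted cpusets.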
In addition to establishing affinity for working threads, the OnixS::CME::ConflatedUDP::FeedEngineThreads class provides a set of events that working threads trigger at the beginning of the master loop and just before the processing loop ends. See Multi-threaded Processing for more information.
With the help of working thread events, it is possible to perform more advanced thread tuning, such as updating thread priority:
When the Handler finishes processing previously received market data, it starts receiving new data if any is available in the feed. In practice, data packets do not arrive back to back. There is usually a non-zero time interval between two consecutive packets. Pauses between incoming packets may cause a Feed Engine to sleep in the system kernel while waiting for incoming data. As a result, its data and executable code can be evicted from the processor's cache, so the next iteration of the processing loop runs slower than the previous one. The OnixS::CME::ConflatedUDP::SocketFeedEngine provides the OnixS::CME::ConflatedUDP::SocketFeedEngineSettings::dataWaitTime parameter, which defines how long the Feed Engine spends inside an I/O operation (like select) while waiting for incoming data. Reducing the parameter's value increases the number of wake-ups and lowers the probability that the Feed Engine's data and code are evicted from the processor's cache. If the parameter is set to zero, the Feed Engine only checks whether data is available and does not sleep in the system kernel. The drawback of reducing the waiting time is higher CPU consumption (up to 100% when the parameter is zero).
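The effect of the wait time can be illustrated with the POSIX `select` call itself. This is a sketch of the general mechanism, not of the SocketFeedEngine internals: `waitForData` is a hypothetical helper, and expressing the wait time in microseconds is an assumption made for the example.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Returns true if fd becomes readable within waitMicroseconds.
// With a wait time of zero, select() only polls and returns immediately,
// so the caller never sleeps while waiting for data.
inline bool waitForData(int fd, long waitMicroseconds) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(waitMicroseconds >= 0);

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    timeval timeout;
    timeout.tv_sec = waitMicroseconds / 1000000;
    timeout.tv_usec = waitMicroseconds % 1000000;

    return select(fd + 1, &readable, nullptr, nullptr, &timeout) > 0;
}
```

A caller that loops on `waitForData(fd, 0)` keeps its working set in cache at the cost of occupying a core completely, which is the trade-off described above.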
Under normal conditions, the Handler makes efficient use of the internal structures that hold incoming market data. Packets and messages are reused once the Handler has processed the data they contain. Therefore, no memory is allocated during real-time market data processing.
However, data may be copied inside the listener callbacks that the Handler invokes for various market data events. Copying a book, for example, implies memory allocation, which harms performance and latency. Copying should be minimized. Alternatively, preallocation strategies should be used to improve the results.
For example, a book snapshot can be constructed with a given capacity, that is, with room for a given number of price levels. Constructing book snapshots with an initial capacity sufficient to hold books of the maximum possible depth eliminates reallocations.
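The preallocation idea can be sketched with a standard container. `PriceLevel` and `SnapshotBuffer` below are hypothetical types used only for illustration; they are not the Handler's book snapshot classes, whose constructors and members may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical price level; the real book entry type holds more fields.
struct PriceLevel {
    long long price;
    int quantity;
};

// Hypothetical snapshot buffer that allocates once, up front.
class SnapshotBuffer {
public:
    // Reserves room for the deepest book expected, outside the hot path.
    explicit SnapshotBuffer(std::size_t maxDepth) { levels_.reserve(maxDepth); }

    // Copies a book into the preallocated storage. As long as count does not
    // exceed the reserved capacity, the vector does not reallocate.
    void assign(const PriceLevel* levels, std::size_t count) {
        assert(count <= levels_.capacity());
        levels_.clear();  // keeps the capacity
        levels_.insert(levels_.end(), levels, levels + count);
    }

    std::size_t size() const { return levels_.size(); }
    std::size_t capacity() const { return levels_.capacity(); }
    const PriceLevel* data() const { return levels_.data(); }

private:
    std::vector<PriceLevel> levels_;
};
```

Creating one such buffer per instrument at startup and reusing it inside the callback keeps the copy itself but removes the allocation from the latency-sensitive path.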