One of the most effective ways to reduce latency is to use specialized network cards (e.g., Solarflare) and user-space TCP stack implementations (e.g., Onload).
Using OnixS::B3::BOE::SessionStorageType::MemoryBased instead of OnixS::B3::BOE::SessionStorageType::FileBased boosts performance, since SBE messages are stored directly in memory.
Alternatively, it's possible to use pluggable storage (OnixS::B3::BOE::SessionStorageType::Pluggable) that does nothing on message-related operations.
You can also use the Asynchronous File-Based Session Storage if you need to keep the file-based storage functionality and excellent performance.
By default, session threads can run on any of the available processors/cores. Specifying CPU affinity for each thread may give a significant performance boost.
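To illustrate what pinning a thread to a core involves at the OS level, here is a minimal Linux-only sketch built on the `sched_getaffinity` and `sched_setaffinity` calls. It does not use the SDK; the helper names `firstAllowedCpu` and `pinCurrentThreadToCpu` are hypothetical and exist only for this example.

```cpp
#include <sched.h>

// Returns the lowest-numbered CPU the calling thread may currently run on,
// or -1 if the affinity mask cannot be read.
int firstAllowedCpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 addresses the calling thread on Linux.
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

// Restricts the calling thread to a single CPU. Returns true on success.
bool pinCurrentThreadToCpu(int cpu)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return false;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

A thread would typically call `pinCurrentThreadToCpu` once at startup, before entering its hot loop. Which cores to choose depends on the machine topology, for example keeping latency-critical threads away from cores that service interrupts.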
To modify thread priorities, use the OnixS::B3::BOE::Threading::ThisThread::priority, OnixS::B3::BOE::Session::receivingThreadPriority, and OnixS::B3::BOE::Session::sendingThreadPriority methods.
To modify thread scheduling policies, use the OnixS::B3::BOE::Threading::ThisThread::policy, OnixS::B3::BOE::Session::receivingThreadPolicy, and OnixS::B3::BOE::Session::sendingThreadPolicy methods.
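For orientation, the following Linux-only sketch shows roughly what changing a thread's policy and priority looks like with the plain OS interface (`sched_setscheduler`), independent of the SDK. The helper names are hypothetical. Real-time policies such as SCHED_FIFO normally require elevated privileges, so the call may fail for an ordinary user.

```cpp
#include <sched.h>
#include <algorithm>

// Clamps a requested priority into the range the OS reports for the policy.
int clampPriority(int policy, int requested)
{
    const int lowest = sched_get_priority_min(policy);
    const int highest = sched_get_priority_max(policy);
    return std::min(std::max(requested, lowest), highest);
}

// Tries to switch the calling thread to the given policy and priority.
// Returns true on success; it typically fails with EPERM when the process
// lacks the privilege to use real-time policies.
bool setCurrentThreadScheduling(int policy, int requestedPriority)
{
    sched_param param{};
    param.sched_priority = clampPriority(policy, requestedPriority);
    // On Linux, pid 0 addresses the calling thread.
    return sched_setscheduler(0, policy, &param) == 0;
}
```

For example, `setCurrentThreadScheduling(SCHED_FIFO, 10)` asks for a fixed real-time priority of 10 and reports failure instead of aborting when the privilege is missing.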
The SCHED_FIFO and SCHED_RR scheduling policies implement fixed-priority real-time scheduling, so threads with these policies preempt every other thread, and the preempted threads can starve.
[...] EF_POLL_USEC environment variable or the latency-best profile).
When a session receiving thread attempts to read from the network and no incoming messages are available, the thread enters the OS kernel and blocks (so-called "blocking wait" mode). When an incoming message becomes available, the network adapter interrupts the CPU, allowing the OS kernel to reschedule the thread to continue.
Blocking, interrupts, and thread context switches are relatively expensive operations and can negatively affect the latency.
The session can be configured to spin on the processor in user mode for up to a specified number of microseconds waiting for messages from the network using the OnixS::B3::BOE::Session::receiveSpinningTimeout method. If the spin period expires, the session will revert to normal blocking behavior.
Using OnixS::B3::BOE::Session::receiveSpinningTimeout makes sense when the session receives SBE messages frequently; in this case, waiting in the loop is cheaper than a thread context switch to the "blocking wait" mode.
The OnixS::B3::BOE::Session::sendSpinningTimeout method can be used to decrease message sending latency.
If the value is zero (the default) and the outgoing message cannot be sent immediately, it is saved to the outgoing queue. If the value is greater than zero, the OnixS::B3::BOE::Session::send method spins, waiting for space in the socket send buffer, before placing the message in the outgoing queue (to be sent later by the sending thread).
Using OnixS::B3::BOE::Session::sendSpinningTimeout makes sense when the session sends SBE messages frequently; in this case, waiting in the loop is cheaper than a thread context switch.
If the session sends SBE messages infrequently, the sending path and associated data structures may no longer be in the CPU cache, which can increase the message sending latency.
To avoid cache misses and keep the sending path fast, one can periodically call OnixS::B3::BOE::Session::warmUp (the recommended interval is 500 microseconds or less).
The OnixS::B3::BOE::Session::send method can be used to send messages in a batch. This can decrease the send latency because all batch messages are sent in one TCP packet. The OnixS::B3::BOE::Messaging::MessageBatch class represents the message batch. An instance of this class should be created in advance on the non-critical path, and the required messages should be added to it. After that, one can send the batch on the critical path.
There is also the OnixS::B3::BOE::Messaging::MessageBatchCombiner class, which can be used when one needs to send messages of different types in one batch. It can combine different OnixS::B3::BOE::Messaging::MessageBatch or OnixS::B3::BOE::Messaging::MessageHolder instances.
[...] send system call, which accepts an array of buffers to send all of them at once. When one sets the OnixS::B3::BOE::Session::messageGrouping setting, the session tries to combine (copy) outgoing messages (when available) into one outgoing buffer until the given grouping value is reached and then uses the regular socket send system call for sending. Therefore, one should not use the OnixS::B3::BOE::Session::messageGrouping option and batch sending simultaneously.
By default, an outgoing message is logged to the session storage before it is sent to the wire. This approach is more reliable because the outgoing message is stored before it goes to the counterparty. However, it adds the logging latency to the message sending latency and therefore increases the tick-to-trade latency.
When latency is more important, one can switch off logging before sending by setting the OnixS::B3::BOE::Session::logBeforeSending option to false. In this case, outgoing messages are logged to the session storage after they are sent to the wire.
A common strategy is to use the same outgoing application-level message instance multiple times.
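The reuse pattern can be sketched with a stand-in message type. `OrderMessage`, `prepareOrderTemplate`, and `updateOrder` below are hypothetical and are not SDK types; they only show the split between one-time setup on the non-critical path and minimal updates on the critical path.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical fixed-layout order message used only for this illustration.
struct OrderMessage
{
    char symbol[8];
    std::uint64_t clientOrderId;
    std::int64_t price;
    std::uint32_t quantity;
};

// Non-critical path: build the message once and fill the constant fields.
OrderMessage prepareOrderTemplate(const char* symbol)
{
    OrderMessage message{};  // zero-initialize every field
    // Copy at most 7 characters so the zeroed last byte keeps it terminated.
    std::strncpy(message.symbol, symbol, sizeof(message.symbol) - 1);
    return message;
}

// Critical path: touch only the fields that differ between orders.
void updateOrder(OrderMessage& message, std::uint64_t clientOrderId,
                 std::int64_t price, std::uint32_t quantity)
{
    message.clientOrderId = clientOrderId;
    message.price = price;
    message.quantity = quantity;
}
```

The same instance is then updated and sent repeatedly, so no construction or full re-encoding happens between a market event and the send call.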
The BOE protocol requires the timestamp field to be filled with "the number of nanoseconds since Unix epoch" for each outgoing message. This is done automatically by the session when OnixS::B3::BOE::Session::send is invoked.
The current time value is provided as an argument of OnixS::B3::BOE::Session::send; this argument defaults to the current system time, calculated at invocation time.
The latency of taking system time can be avoided if it is requested before the call is made, for instance, at the warm-up stage. Also, when the session is used with the OnixS C++ B3 Market Data Handler, this value can be taken from the network packet receiving time.
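A value in the required unit can be obtained ahead of time with the standard library, as in the minimal sketch below. The helper name `unixEpochNanoseconds` is hypothetical, and how the value is passed to the send call is deliberately not shown because the exact parameter type is not given here.

```cpp
#include <chrono>
#include <cstdint>

// Returns the number of nanoseconds since the Unix epoch according to the
// system clock. The system_clock epoch is guaranteed to be the Unix epoch
// since C++20 and is the Unix epoch in practice on mainstream platforms.
std::uint64_t unixEpochNanoseconds()
{
    using namespace std::chrono;
    const auto sinceEpoch = system_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(sinceEpoch).count());
}
```

Calling this on the non-critical path, for example right after a warm-up call, keeps the clock query out of the send path, at the cost of a timestamp that is slightly older than the actual send time.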