Low Latency Best Practices
Hardware and Middleware
One of the most effective ways to reduce latency is to use specialized network cards (e.g., Solarflare) together with a user-space TCP stack implementation (e.g., Onload).
Selecting Session Storage
Using MemoryBasedStorage instead of FileBasedStorage boosts performance since SBE messages are stored directly in memory.
Alternatively, it's possible to use pluggable storage (PluggableStorage) that does nothing on message-related operations.
You can also use the Asynchronous File-Based Session Storage if you need to keep file-based storage functionality while retaining good performance.
Threads Tuning
Affinity
By default, session threads can be executed on any of the available processors/cores. Specifying CPU affinity for each thread may give a significant performance boost:
int[] receivingCpuIndexes = new int[] { 1 };
int[] sendingCpuIndexes = new int[] { 2 };
session.ReceivingThreadAffinity = receivingCpuIndexes;
// If the message cannot be sent immediately, then it is saved to the queue
// for the subsequent sending by the sending thread.
session.SendingThreadAffinity = sendingCpuIndexes;
Note
Ideally, each spinning thread should run on a separate CPU core so that it will not stop other important threads from doing work if it blocks or is de-scheduled. If more than one spinning thread shares the same CPU core, it could significantly increase jitter.
Priority
To change thread priorities, use the SendingThreadPriority and ReceivingThreadPriority properties.
Spinning (busy‐wait)
Note
Spinning provided by a user-space TCP stack (e.g., Onload's EF_POLL_USEC environment variable or the latency-best profile) is usually more efficient than the built-in spinning.
Warmup
If the session sends SBE messages infrequently, the sending path and associated data structures will not be in a CPU cache, and this can increase the message sending latency.
To avoid cache misses and keep the sending path fast, call the WarmUp(IMessage, SocketFlags) method periodically (the recommended interval is 500 microseconds or less).
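The periodic call described above could be driven by a small helper like the one below. This is a minimal sketch, not part of the library: PeriodicWarmUp is a hypothetical class, and the warm-up action is passed in as a delegate so that the sketch does not assume anything about the session API beyond the WarmUp(IMessage, SocketFlags) method named above. It spins on Stopwatch because Thread.Sleep cannot reliably deliver sub-millisecond intervals.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: invokes a warm-up action at a fixed sub-millisecond interval.
public sealed class PeriodicWarmUp : IDisposable
{
    private readonly Action warmUp;
    private readonly long intervalTicks;   // interval expressed in Stopwatch ticks
    private readonly Thread thread;
    private volatile bool stopRequested;

    public PeriodicWarmUp(Action warmUp, TimeSpan interval)
    {
        this.warmUp = warmUp;
        intervalTicks = (long)(interval.TotalSeconds * Stopwatch.Frequency);
        thread = new Thread(Run) { IsBackground = true, Name = "WarmUp" };
        thread.Start();
    }

    private void Run()
    {
        long next = Stopwatch.GetTimestamp() + intervalTicks;
        while (!stopRequested)
        {
            if (Stopwatch.GetTimestamp() >= next)
            {
                warmUp();
                next = Stopwatch.GetTimestamp() + intervalTicks;
            }
            else
            {
                Thread.SpinWait(50); // short busy-wait between checks
            }
        }
    }

    // Stops the loop and waits for the background thread to finish.
    public void Dispose()
    {
        stopRequested = true;
        thread.Join();
    }
}
```

A possible usage, assuming a prepared message instance and that WarmUp may be called from this thread (check the API reference for the threading rules): new PeriodicWarmUp(() => session.WarmUp(message, SocketFlags.None), TimeSpan.FromTicks(5000)), where 5000 ticks of 100 ns each equal 500 microseconds. Note that the helper thread spins and therefore occupies a core of its own.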
Receive Spinning Timeout
When a session receiving thread attempts to read from a network and no incoming messages are available, the thread will enter the OS kernel and block (so-called "blocking wait" mode). When an incoming message becomes available, the network adapter will interrupt the CPU, allowing the OS kernel to reschedule the thread to continue.
Blocking, interrupts, and thread context switches are relatively expensive operations and can negatively affect the latency.
The session can be configured to spin on the processor in user mode for up to a specified number of microseconds waiting for messages from the network using the ReceiveSpinningTimeout property. If the spin period expires, the session will revert to normal blocking behavior.
Using ReceiveSpinningTimeout makes sense when the session receives SBE messages frequently; in this case, spinning in a loop is cheaper than a thread context switch into the "blocking wait" mode.
Note
The spin wait increases the CPU usage, so the spin wait period should not be too long.
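The spin-then-block behavior described in this section can be illustrated with the following sketch. It is a conceptual model only, not the session's actual implementation: SpinThenBlock, the dataAvailable delegate, and the wait handle standing in for the kernel wake-up are all hypothetical names introduced for the example.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class SpinThenBlock
{
    // Returns true if data became available during the spin phase,
    // false if the spin period expired and the call fell back to blocking.
    public static bool WaitForData(
        Func<bool> dataAvailable,   // hypothetical non-blocking readiness check
        TimeSpan spinTimeout,
        WaitHandle dataSignal)      // hypothetical stand-in for the kernel wake-up
    {
        long deadline = Stopwatch.GetTimestamp()
            + (long)(spinTimeout.TotalSeconds * Stopwatch.Frequency);

        // Spin phase: stay in user mode and poll until the deadline.
        while (Stopwatch.GetTimestamp() < deadline)
        {
            if (dataAvailable())
                return true;
            Thread.SpinWait(20); // short busy-wait that does not yield the core
        }

        // Spin period expired: revert to the normal blocking wait.
        dataSignal.WaitOne();
        return false;
    }
}
```

When messages arrive more often than the spin period, the call returns from the spin phase and the cost of blocking and rescheduling is avoided; otherwise the thread burns CPU for the whole period and then blocks anyway, which is why the period should stay short.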
Send Spinning Timeout
The SendSpinningTimeout property can be used to reduce message sending latency.
If the value is zero (the default) and the outgoing message cannot be sent immediately, it is saved to the outgoing queue. If the value is greater than zero, the Send(IMessage) method spins, waiting for space in the socket send buffer, before placing the message in the outgoing queue (to be sent later by the sending thread).
Using SendSpinningTimeout makes sense when the session sends SBE messages frequently; in this case, spinning in a loop is cheaper than a thread context switch.
Note
The spin wait increases the CPU usage, so the spin wait period should not be too long.
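The decision between sending directly and queuing can be modeled roughly as follows. This is a sketch under stated assumptions rather than the library's code: SpinningSend, the trySend delegate (a non-blocking write that returns false while the socket send buffer is full), and the queue are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

public static class SpinningSend
{
    // Returns true if the caller wrote the message itself,
    // false if the message was queued for the sending thread.
    public static bool SendOrEnqueue<T>(
        T message,
        Func<T, bool> trySend,            // hypothetical non-blocking write attempt
        TimeSpan spinTimeout,             // zero: queue after a single failed attempt
        ConcurrentQueue<T> outgoingQueue)
    {
        long deadline = Stopwatch.GetTimestamp()
            + (long)(spinTimeout.TotalSeconds * Stopwatch.Frequency);

        // Always try once, then keep retrying until the spin period expires.
        do
        {
            if (trySend(message))
                return true;
            Thread.SpinWait(20);
        }
        while (Stopwatch.GetTimestamp() < deadline);

        // Still no room in the send buffer: hand over to the sending thread.
        outgoingQueue.Enqueue(message);
        return false;
    }
}
```

With a zero timeout the sketch makes one attempt and queues the message, which matches the default behavior described above; with a positive timeout the caller trades CPU time for a chance to avoid the hand-off to the sending thread.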
Logging After Sending
By default, the logging of an outgoing message to the session storage is performed before sending it to the wire. This approach is more reliable because the outgoing message is stored before going to the counterparty. However, this approach adds the logging latency to the message sending latency, so it increases the tick-to-trade latency.
When latency is more important than reliability, you can switch off logging before sending by setting the LogBeforeSending option to false.
In this case, the logging of outgoing messages to the session storage will be performed after sending them to the wire.
Reusing Message Instances
Object creation is an expensive operation that affects both performance and memory consumption. The cost depends on how much initialization is required when the object is created. The Session can reuse message instances and event arguments passed to event handlers. We highly recommend turning on ReuseInboundMessage and ReuseEventArguments to minimize excess object creation and garbage collection overhead.
Note
If ReuseInboundMessage is turned on, the client's code must copy a message to use it outside of inbound callbacks. If ReuseEventArguments is turned on, the client's code must copy event arguments to use them outside of callbacks.
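The reason for the copy requirement can be shown with a small model. The types below are hypothetical and are not the library's message classes: ReusedMessage stands in for an instance that is overwritten in place for every inbound message, and InboundHandler shows a callback that copies the data it wants to keep. The real API may offer its own way to copy a message, so consult the API reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a message instance that is overwritten in place.
public sealed class ReusedMessage
{
    public byte[] Buffer = new byte[64];
    public int Length;
}

public sealed class InboundHandler
{
    private readonly List<byte[]> retained = new List<byte[]>();

    public IReadOnlyList<byte[]> Retained => retained;

    // Models an inbound callback: the message is only valid until it returns.
    public void OnMessage(ReusedMessage message)
    {
        // Copy the bytes; keeping a reference to `message` would observe
        // whatever the next inbound message writes into the same buffer.
        var copy = new byte[message.Length];
        Array.Copy(message.Buffer, copy, message.Length);
        retained.Add(copy);
    }
}
```

Storing the message reference itself instead of a copy would leave every retained entry pointing at the most recently received data.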