OnixS C++ CME MDP Conflated TCP Handler  1.3.1
API Documentation
Low Latency Best Practices

Hardware and Middleware

One of the most efficient ways to reduce latency is to use specialized network cards (e.g., Solarflare) and user-space TCP stack implementations (e.g., Onload).

See also
Understanding Send Latency, Solarflare Onload Features.

Selecting Session Storage

Using OnixS::CME::ConflatedTCP::SessionStorageType::MemoryBased instead of OnixS::CME::ConflatedTCP::SessionStorageType::FileBased improves performance, since SBE messages are kept in memory and are not written to files.

Alternatively, you can use a pluggable storage (OnixS::CME::ConflatedTCP::SessionStorageType::Pluggable) whose implementation does nothing on message-related operations.
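To illustrate why a do-nothing storage is the cheapest option, here is a minimal sketch that contrasts it with an in-memory one. The `MessageStorage` interface and both classes are hypothetical and written only for this example; they are not the Handler's pluggable storage API, which is described in the API reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical storage interface for illustration only;
// it is NOT the Handler's pluggable storage API.
class MessageStorage
{
public:
    virtual ~MessageStorage() = default;
    virtual void storeInbound(const char* data, std::size_t size) = 0;
    virtual void storeOutbound(const char* data, std::size_t size) = 0;
    virtual std::size_t messageCount() const = 0;
};

// Keeps a copy of every message in memory: no disk I/O, but still a copy and
// a possible allocation per message.
class InMemoryStorage final : public MessageStorage
{
public:
    void storeInbound(const char* data, std::size_t size) override { messages_.emplace_back(data, size); }
    void storeOutbound(const char* data, std::size_t size) override { messages_.emplace_back(data, size); }
    std::size_t messageCount() const override { return messages_.size(); }

private:
    std::vector<std::string> messages_;
};

// Does nothing on message-related operations: no copy, no allocation, no I/O.
class NullStorage final : public MessageStorage
{
public:
    void storeInbound(const char*, std::size_t) override {}
    void storeOutbound(const char*, std::size_t) override {}
    std::size_t messageCount() const override { return 0; }
};
```

The trade-off is that a do-nothing storage keeps no message history, so it fits only when the application does not need the stored messages.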

You can also use the Asynchronous File-Based Session Storage if you need to keep the file-based storage functionality while maintaining high performance.
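The general idea behind asynchronous file logging can be sketched as follows: the latency-sensitive thread only enqueues a copy of the message, and a background thread performs the slow file I/O. This is a minimal sketch of the technique under that assumption, not the Handler's implementation; the `AsyncFileLog` class is hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical asynchronous file log (illustration only, not the Handler's code).
class AsyncFileLog
{
public:
    explicit AsyncFileLog(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc), writer_([this] { run(); })
    {}

    ~AsyncFileLog()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        writer_.join(); // the writer drains the queue before exiting
    }

    // Hot path: copy the message into the queue and return without touching the disk.
    void append(const std::string& message)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(message);
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;)
        {
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            while (!queue_.empty())
            {
                std::string message = std::move(queue_.front());
                queue_.pop_front();
                lock.unlock(); // perform the slow write outside the lock
                out_ << message << '\n';
                lock.lock();
            }
            if (stop_)
                break;
        }
        out_.flush();
    }

    std::ofstream out_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool stop_ = false;
    std::thread writer_; // declared last so it starts after all other members exist
};
```

Because the caller never waits for the disk, the cost on the sending path is roughly a lock, a copy, and a notification.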

Threads Tuning

Affinity

By default, session threads can be executed on any of the available processors/cores. Specifying CPU affinity for each thread may give a significant performance boost:

CpuIndexes receivingCpuIndexes;
CpuIndexes sendingCpuIndexes;
receivingCpuIndexes.insert(1);
sendingCpuIndexes.insert(2);
// The Handler tries to send an outgoing application-level message in the context
// of the thread that calls the OnixS::CME::ConflatedTCP::Session::send method.
Threading::ThisThread::affinity(sendingCpuIndexes);
// The session receiving thread reads incoming messages from the network.
session.receivingThreadAffinity(receivingCpuIndexes);
// If the message cannot be sent immediately, then it is saved to the queue
// for the subsequent sending by the sending thread.
session.sendingThreadAffinity(sendingCpuIndexes);
Note
Ideally, each spinning thread should run on a separate CPU core so that it will not stop other important threads from doing work if it blocks or is de-scheduled.
If more than one spinning thread shares the same CPU core, it could significantly increase jitter.
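For the application's own threads that the Handler does not manage, the same kind of pinning can be done on Linux with standard glibc calls. This is a Linux-only sketch using `pthread_setaffinity_np`, not the Handler's API; the helper names are hypothetical. Pinning each spinning thread to a distinct core this way follows the note above.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Pins the calling thread to a single CPU core.
// Returns 0 on success or an errno value (e.g. EINVAL if the core is not allowed).
inline int pinCurrentThreadToCpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Returns the lowest-numbered CPU the calling thread is currently allowed to use,
// or -1 on error. Handy for choosing a valid core in cpuset-restricted environments.
inline int firstAllowedCpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```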

Priority

To modify thread priorities, use the OnixS::CME::ConflatedTCP::Threading::ThisThread::priority, OnixS::CME::ConflatedTCP::Session::receivingThreadPriority, and OnixS::CME::ConflatedTCP::Session::sendingThreadPriority methods.

Scheduling Policy

To modify thread scheduling policies, use the OnixS::CME::ConflatedTCP::Threading::ThisThread::policy, OnixS::CME::ConflatedTCP::Session::receivingThreadPolicy, and OnixS::CME::ConflatedTCP::Session::sendingThreadPolicy methods.

Note
These methods are supported on Linux only.
The SCHED_FIFO and SCHED_RR scheduling policies implement fixed-priority real-time scheduling: threads with these policies preempt all lower-priority threads, including normal (SCHED_OTHER) threads, which can then starve.
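On Linux, priority and policy are set together through the POSIX scheduling calls. The following Linux-only sketch shows how a policy/priority pair can be applied to an application thread with `pthread_setschedparam`; it is not the Handler's API, and the helper names are hypothetical. Note that real-time policies typically require `CAP_SYS_NICE` or a suitable `RLIMIT_RTPRIO`, otherwise the call fails with `EPERM`.

```cpp
#include <algorithm>
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <string>

// Clamps a requested priority into the valid range for the given policy.
// On Linux, SCHED_FIFO and SCHED_RR accept 1..99, and SCHED_OTHER accepts only 0.
inline int clampPriority(int policy, int requested)
{
    const int lo = sched_get_priority_min(policy);
    const int hi = sched_get_priority_max(policy);
    return std::clamp(requested, lo, hi);
}

// Applies a policy/priority pair to a thread. Returns 0 or an errno value.
inline int setThreadScheduling(pthread_t thread, int policy, int priority)
{
    sched_param param{};
    param.sched_priority = clampPriority(policy, priority);
    return pthread_setschedparam(thread, policy, &param);
}

// Returns a readable name of the calling thread's current policy.
inline std::string currentPolicyName()
{
    int policy = 0;
    sched_param param{};
    if (pthread_getschedparam(pthread_self(), &policy, &param) != 0)
        return "unknown";
    switch (policy)
    {
    case SCHED_FIFO: return "SCHED_FIFO";
    case SCHED_RR: return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default: return "other";
    }
}
```

For example, `setThreadScheduling(pthread_self(), SCHED_FIFO, 50)` would request real-time priority 50 for the calling thread; as the note above warns, such a thread can starve normal threads that share its core.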

Spinning (busy-wait)

Note
Spinning in the user-space TCP stack (e.g., Onload's EF_POLL_USEC environment variable or the latency-best profile) is usually more efficient than the built-in spinning.

Receive Spinning Timeout

When a session receiving thread attempts to read from the network and no incoming messages are available, the thread enters the OS kernel and blocks (the so-called "blocking wait" mode). When an incoming message becomes available, the network adapter interrupts the CPU, allowing the OS kernel to reschedule the thread so that it can continue.

Blocking, interrupts, and thread context switches are relatively expensive operations and can negatively affect the latency.

Using the OnixS::CME::ConflatedTCP::Session::receiveSpinningTimeout method, the session can be configured to spin on the processor in user mode for up to the specified number of microseconds while waiting for messages from the network. If the spin period expires, the session reverts to the normal blocking behavior.

Using OnixS::CME::ConflatedTCP::Session::receiveSpinningTimeout makes sense when the session receives SBE messages frequently: in this case, waiting in a loop is cheaper than a thread context switch into the "blocking wait" mode.
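The "spin, then block" behavior described above can be sketched with a simple queue: the consumer first polls without sleeping for up to the spin timeout and only then falls back to a blocking wait on a condition variable. This is a minimal illustration of the technique, not the Handler's implementation; `SpinThenBlockQueue` is a hypothetical class.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical queue illustrating a spin-then-block consumer.
template <typename T>
class SpinThenBlockQueue
{
public:
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
            size_.store(items_.size(), std::memory_order_release);
        }
        cv_.notify_one();
    }

    // Returns the next item; busy-waits for up to spinTimeout before blocking.
    T pop(std::chrono::microseconds spinTimeout)
    {
        const auto deadline = std::chrono::steady_clock::now() + spinTimeout;
        // Spin phase: a cheap atomic check instead of sleeping in the kernel.
        while (size_.load(std::memory_order_acquire) == 0 &&
               std::chrono::steady_clock::now() < deadline)
        {
            // busy-wait (a CPU pause instruction could be added here)
        }
        // Blocking phase: returns immediately if data arrived during the spin.
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        size_.store(items_.size(), std::memory_order_release);
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
    std::atomic<std::size_t> size_{0};
};
```

If data arrives within the spin window, the consumer avoids the sleep/wake-up cost. Otherwise, it pays for the spin plus the normal blocking path, which is why the timeout should stay short.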

Note
Spin waiting increases CPU usage and blocks the thread that calls the OnixS::CME::ConflatedTCP::Session::send method, so the spin wait period should not be too long.

Send Spinning Timeout

The OnixS::CME::ConflatedTCP::Session::sendSpinningTimeout method can be used to decrease the latency of the message sending.

If the value is zero (the default) and the outgoing message cannot be sent immediately, the message is saved to the outgoing queue. If the value is greater than zero, the OnixS::CME::ConflatedTCP::Session::send method spins, waiting for space in the socket send buffer, before placing the message in the outgoing queue (to be sent later by the sending thread).

Using OnixS::CME::ConflatedTCP::Session::sendSpinningTimeout makes sense when the session sends SBE messages frequently: in this case, waiting in a loop is cheaper than a thread context switch.
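The send logic described above (try once, optionally spin, and otherwise queue) can be sketched as follows. The sketch assumes a hypothetical `trySend` callback that stands in for a non-blocking socket write and returns `false` when the socket send buffer is full. `SpinningSender` is also hypothetical and is not the Handler's implementation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sender illustrating "spin before queueing".
class SpinningSender
{
public:
    explicit SpinningSender(std::function<bool(const std::string&)> trySend)
        : trySend_(std::move(trySend)) {}

    // Returns true if the message went out on the caller's thread,
    // false if it was queued for later sending.
    bool send(const std::string& message, std::chrono::microseconds spinTimeout)
    {
        if (trySend_(message))
            return true;
        // With a zero timeout the loop body never runs and the message is queued at once.
        const auto deadline = std::chrono::steady_clock::now() + spinTimeout;
        while (std::chrono::steady_clock::now() < deadline)
        {
            if (trySend_(message))
                return true;
        }
        outgoing_.push_back(message);
        return false;
    }

    std::size_t queued() const { return outgoing_.size(); }

private:
    std::function<bool(const std::string&)> trySend_;
    // Messages that could not be sent. In a real design, this would be a
    // thread-safe queue drained by a dedicated sending thread.
    std::deque<std::string> outgoing_;
};
```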

Note
Spin waiting increases CPU usage, so the spin wait period should not be too long.

Warmup

If the session sends SBE messages infrequently, the sending path and its associated data structures may be evicted from the CPU cache, which increases the message sending latency.

To avoid cache misses and keep the sending path fast, call the OnixS::CME::ConflatedTCP::Session::warmUp method periodically (the recommended interval is 500 microseconds or less).
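One way to schedule the warm-up is from the sending thread's own event loop, calling it only when nothing has been sent for the chosen interval. This keeps the send path warm in the cache of the core that actually sends. In the sketch below, the `IdleWarmer` class is hypothetical, and the `warmUp` callback is an assumed stand-in for whatever calls the session's warm-up method. The exact signature of that method is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper: triggers a warm-up after `interval` without real sends.
class IdleWarmer
{
public:
    using Clock = std::chrono::steady_clock;

    IdleWarmer(std::chrono::microseconds interval, std::function<void()> warmUp)
        : interval_(interval), warmUp_(std::move(warmUp)) {}

    // Call after every real send: a real send already keeps the path warm.
    void onSend(Clock::time_point now) { last_ = now; }

    // Call on each iteration of the sending thread's event loop.
    void poll(Clock::time_point now)
    {
        if (now - last_ >= interval_)
        {
            warmUp_();
            last_ = now;
        }
    }

private:
    std::chrono::microseconds interval_;
    std::function<void()> warmUp_;
    Clock::time_point last_{};
};
```

Passing the current time in explicitly keeps the helper deterministic and lets the loop reuse a timestamp it already reads.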

Logging After Sending

By default, an outgoing message is logged to the session storage before it is sent to the wire. This approach is more reliable because the message is stored before it reaches the counterparty. However, it adds the logging latency to the message sending latency and therefore increases the tick-to-trade latency.

When latency is more important, you can disable logging before sending by setting the OnixS::CME::ConflatedTCP::Session::logBeforeSending option to false. In this case, outgoing messages are logged to the session storage after they are sent to the wire.
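The difference between the two orderings can be shown with a small sketch. `OutgoingPath` and its `store` and `write` callbacks are hypothetical stand-ins for session-storage logging and the socket write. Only the ordering mirrors the behavior described above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical outgoing path illustrating log-before-sending vs. log-after-sending.
struct OutgoingPath
{
    bool logBeforeSending = true; // default: store first, then send
    std::function<void(const std::string&)> store;
    std::function<void(const std::string&)> write;

    void send(const std::string& message)
    {
        if (logBeforeSending)
        {
            store(message); // logging latency is paid before the message hits the wire
            write(message);
        }
        else
        {
            write(message); // the message reaches the wire first
            store(message); // logging no longer delays this message
        }
    }
};
```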

Reusing Message Instances

A common strategy is to reuse the same outgoing application-level message instance for multiple sends instead of creating a new instance each time.
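The idea of reuse can be sketched with a fixed-layout message that is constructed once and only has its changing fields rewritten before each send. `ReusableRequest` and `RequestSender` are hypothetical types standing in for an SBE message object and the code that owns it. They do not represent the Handler's message classes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-layout message: its buffer is allocated once, with the object.
struct ReusableRequest
{
    std::array<char, 64> buffer{};

    void setRequestId(std::uint32_t id) { std::memcpy(buffer.data(), &id, sizeof id); }

    std::uint32_t requestId() const
    {
        std::uint32_t id = 0;
        std::memcpy(&id, buffer.data(), sizeof id);
        return id;
    }
};

// Keeps one message instance for its whole lifetime and refills it per send.
class RequestSender
{
public:
    // Updates only the changing field and returns the same instance to transmit.
    const ReusableRequest& prepare(std::uint32_t requestId)
    {
        request_.setRequestId(requestId);
        return request_;
    }

private:
    ReusableRequest request_; // constructed once, reused for every send
};
```

Because the same memory is written on every send, it tends to stay in cache and no allocation happens on the hot path.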