OnixS C++ CME iLink 3 Binary Order Entry Handler  1.18.9
API Documentation
Low Latency Best Practices

Hardware and Middleware

One of the most efficient ways to reduce latency is to use specialized network cards (e.g., Solarflare) together with user-space TCP stack implementations (e.g., Onload).

See also
Understanding Send Latency, Solarflare-specific Features.

Selecting Session Storage

Using OnixS::CME::iLink3::SessionStorageType::MemoryBased instead of OnixS::CME::iLink3::SessionStorageType::FileBased boosts performance, since SBE messages are stored directly in memory.

Alternatively, it's possible to use pluggable storage (OnixS::CME::iLink3::SessionStorageType::Pluggable) that does nothing on message-related operations.

You can also use the Asynchronous File-Based Session Storage if you need to keep the file-based storage functionality without giving up performance.

Threads Tuning

Affinity

By default, session threads can be executed on any of the available processors/cores. Specifying CPU affinity for each thread may give a significant performance boost:

CpuIndexes receivingCpuIndexes;
CpuIndexes sendingCpuIndexes;
receivingCpuIndexes.insert(1);
sendingCpuIndexes.insert(2);
// The Handler tries to send an outgoing application-level message in the context
// of the thread that calls the OnixS::CME::iLink3::Session::send method.
Threading::ThisThread::affinity(sendingCpuIndexes);
session.receivingThreadAffinity(receivingCpuIndexes);
// If the message cannot be sent immediately, then it is saved to the queue
// for the subsequent sending by the sending thread.
session.sendingThreadAffinity(sendingCpuIndexes);
Note
Ideally, each spinning thread should run on a separate CPU core so that it will not stop other important threads from doing work if it blocks or is de-scheduled.
If more than one spinning thread shares the same CPU core, it could significantly increase jitter.

Priority

To modify thread priorities, use the OnixS::CME::iLink3::Threading::ThisThread::priority, OnixS::CME::iLink3::Session::receivingThreadPriority, and OnixS::CME::iLink3::Session::sendingThreadPriority methods.

Scheduling Policy

To modify thread scheduling policies, use the OnixS::CME::iLink3::Threading::ThisThread::policy, OnixS::CME::iLink3::Session::receivingThreadPolicy, and OnixS::CME::iLink3::Session::sendingThreadPolicy methods.

Note
These methods are supported on Linux only.
The SCHED_FIFO and SCHED_RR scheduling policies implement fixed-priority real-time scheduling: threads running under these policies preempt every thread that uses a normal policy, so those other threads can starve.
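
For orientation, the following stand-alone Linux sketch shows what setting a real-time policy and priority looks like at the POSIX level. It does not use the Handler's methods and is not the Handler's implementation: clampPriority and setCurrentThreadPolicy are hypothetical helper names, and only standard POSIX calls are used.

```cpp
#include <pthread.h>
#include <sched.h>

#include <algorithm>

// Hypothetical helper (not part of the OnixS API): clamps a requested
// priority to the range that is valid for the given scheduling policy.
int clampPriority(int policy, int requested)
{
    const int lowest = sched_get_priority_min(policy);
    const int highest = sched_get_priority_max(policy);

    return std::max(lowest, std::min(requested, highest));
}

// Hypothetical helper (not part of the OnixS API): applies the policy
// (e.g., SCHED_FIFO or SCHED_RR) to the calling thread.
// Returns 0 on success or an error number; EPERM is typical when the
// process lacks the privileges required for real-time scheduling.
int setCurrentThreadPolicy(int policy, int requestedPriority)
{
    sched_param parameters = {};
    parameters.sched_priority = clampPriority(policy, requestedPriority);

    return pthread_setschedparam(pthread_self(), policy, &parameters);
}
```

Because a real-time thread is not preempted by normal threads, such a call is usually reserved for the few latency-critical threads, each pinned to its own core.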

Spinning (busy-wait)

Note
Spinning in the user-space TCP stack (e.g., via Onload's EF_POLL_USEC environment variable or the latency-best profile) is usually more efficient than the Handler's built-in spinning.

Receive Spinning Timeout

When a session receiving thread attempts to read from the network and no incoming messages are available, the thread enters the OS kernel and blocks (so-called "blocking wait" mode). When an incoming message becomes available, the network adapter interrupts the CPU, allowing the OS kernel to reschedule the thread so that it can continue.

Blocking, interrupts, and thread context switches are relatively expensive operations and can negatively affect the latency.

The session can be configured to spin on the processor in user mode for up to a specified number of microseconds waiting for messages from the network using the OnixS::CME::iLink3::Session::receiveSpinningTimeout method. If the spin period expires, the session will revert to normal blocking behavior.

Using OnixS::CME::iLink3::Session::receiveSpinningTimeout makes sense when the session receives SBE messages frequently; in this case, waiting in the loop is cheaper than a thread context switch into the "blocking wait" mode.

Note
The spin wait increases the CPU usage, so the spin wait period should not be too long.
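
The spin-then-block behavior described above can be illustrated with a small stand-alone sketch. It is not the Handler's implementation: DataSignal and WaitResult are hypothetical names, an atomic flag stands in for "a message is available on the socket", and a condition variable stands in for the blocking wait in the OS kernel. A single waiting thread is assumed.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical illustration only: not part of the OnixS API.
enum class WaitResult { Spun, Blocked };

class DataSignal
{
public:
    // Producer side: mark data as available and wake a blocked consumer.
    void notify()
    {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ready_.store(true, std::memory_order_release);
        }
        condition_.notify_one();
    }

    // Consumer side (single consumer assumed): spin in user mode for up
    // to spinTimeout, then revert to a blocking wait.
    WaitResult wait(std::chrono::microseconds spinTimeout)
    {
        const auto deadline = std::chrono::steady_clock::now() + spinTimeout;

        do
        {
            if (ready_.load(std::memory_order_acquire))
            {
                ready_.store(false, std::memory_order_relaxed);
                return WaitResult::Spun;
            }
        }
        while (std::chrono::steady_clock::now() < deadline);

        // The spin period expired: block until the producer notifies.
        std::unique_lock<std::mutex> lock(mutex_);
        condition_.wait(lock, [this] { return ready_.load(std::memory_order_acquire); });
        ready_.store(false, std::memory_order_relaxed);
        return WaitResult::Blocked;
    }

private:
    std::atomic<bool> ready_{false};
    std::mutex mutex_;
    std::condition_variable condition_;
};
```

When data arrives within the spin period, the consumer never touches the mutex or the condition variable, which is the saving that the spinning timeout aims for.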

Send Spinning Timeout

The OnixS::CME::iLink3::Session::sendSpinningTimeout method can be used to decrease the latency of the message sending.

If the value is zero (the default) and the outgoing message cannot be sent immediately, it is saved to the outgoing queue. If the value is greater than zero, the OnixS::CME::iLink3::Session::send method waits in a spin loop for the socket send buffer to become available before placing the message in the outgoing queue (to be sent later by the sending thread).

Using OnixS::CME::iLink3::Session::sendSpinningTimeout makes sense when the session sends SBE messages frequently; in this case, waiting in the loop is cheaper than a thread context switch.

Note
The spin wait increases the CPU usage, so the spin wait period should not be too long.
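
The decision logic described above can be sketched as follows. This is an illustration under stated assumptions, not the Handler's code: sendOrQueue and SendOutcome are hypothetical names, the trySend callable stands in for a non-blocking socket write, and plain strings stand in for encoded SBE messages.

```cpp
#include <chrono>
#include <deque>
#include <functional>
#include <string>

// Hypothetical illustration only: not part of the OnixS API.
enum class SendOutcome { SentImmediately, SentAfterSpin, Queued };

// trySend stands in for a non-blocking socket write: it returns true
// when the socket send buffer accepted the whole message.
SendOutcome sendOrQueue(
    const std::string& message,
    const std::function<bool(const std::string&)>& trySend,
    std::chrono::microseconds spinTimeout,
    std::deque<std::string>& outgoingQueue)
{
    if (trySend(message))
        return SendOutcome::SentImmediately;

    // With a zero timeout the loop body never runs, so the message
    // goes straight to the queue.
    const auto deadline = std::chrono::steady_clock::now() + spinTimeout;

    while (std::chrono::steady_clock::now() < deadline)
    {
        if (trySend(message))
            return SendOutcome::SentAfterSpin;
    }

    // Leave the message for the sending thread.
    outgoingQueue.push_back(message);
    return SendOutcome::Queued;
}
```

The spin loop keeps the send in the context of the calling thread, so a short wait for buffer space avoids the hand-off to the sending thread.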

Using spinlock

The OnixS::CME::iLink3::Session::useSpinLock option can decrease sending and receiving latency by using a spinlock instead of a standard mutex.

Note
The spin wait increases the CPU usage. Using this option with the ThreadPool threading model is not recommended.
This parameter is ignored in the TCPDirect mode.

Warmup

If the session sends SBE messages infrequently, the sending path and associated data structures will not be in a cache, and this can increase the message sending latency.

One can periodically (a recommended interval is 500 microseconds or less) call the OnixS::CME::iLink3::Session::warmUp method to avoid cache misses and keep the sending path fast.
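
One possible way to pace such calls from the idle loop of the thread that sends orders is sketched below. WarmUpPacer is a hypothetical helper, not part of the Handler. The sketch only assumes that session.warmUp(message) is a valid call, so the actual signature of OnixS::CME::iLink3::Session::warmUp should be checked in the API reference.

```cpp
#include <chrono>

// Hypothetical helper (not part of the OnixS API): decides when the
// next warm-up call is due.
class WarmUpPacer
{
public:
    using Clock = std::chrono::steady_clock;

    explicit WarmUpPacer(
        std::chrono::microseconds interval = std::chrono::microseconds(500))
        : interval_(interval)
        , next_(Clock::time_point::min())
    {
    }

    // Call from the idle loop. Returns true if a warm-up was performed.
    // Assumption: session.warmUp(message) is a valid call.
    template <typename Session, typename Message>
    bool poll(Session& session, Message& message,
              Clock::time_point now = Clock::now())
    {
        if (now < next_)
            return false;

        session.warmUp(message);
        next_ = now + interval_;
        return true;
    }

    // Call after a real send: the sending path has just been exercised,
    // so the next warm-up can be postponed by one interval.
    void onSend(Clock::time_point now = Clock::now())
    {
        next_ = now + interval_;
    }

private:
    std::chrono::microseconds interval_;
    Clock::time_point next_;
};
```

Passing the time point explicitly keeps the helper easy to test and lets the caller reuse a timestamp it has already taken.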

Using session send batch

The OnixS::CME::iLink3::Session::send method can be used to send messages in a batch. It can decrease the send latency because all batch messages will be sent in one TCP packet. The OnixS::CME::iLink3::Messaging::MessageBatch class represents the message batch. This class instance should be created in advance on the non-critical path, and required messages should be added. After that, one can send the batch on the critical path:

#if defined (ONIXS_ILINK3_CXX11)
// Create the MessageBatch object on the non-critical path.
MessageBatch<NewOrderSingle514> batch;
// Add messages to the batch.
for(unsigned messageCounter = 0; messageCounter < 10; ++messageCounter)
    batch.add(createOrder());
// Update headers to be ready for sending.
batch.updateHeaders();
// Check that the batch fits into a single packet.
assert(MessageBatchChecker::fitSize(batch));
// On the critical path one can update field values in the batch and send messages to the wire.
batch[0]->setOrderQty(10);
session.send(batch);
#else
// The batch send is not supported on this platform.
#endif

There is also the OnixS::CME::iLink3::Messaging::MessageBatchCombiner class, which can be used when one needs to send messages of different types in one batch. It can combine different OnixS::CME::iLink3::Messaging::MessageBatch or OnixS::CME::iLink3::Messaging::MessageHolder instances.

Warning
OnixS::CME::iLink3::Messaging::MessageBatchCombiner does not copy combined messages and does not store them internally. Therefore, the lifetime of combined messages should be greater than or equal to the lifetime of the combiner instance:
#if defined (ONIXS_ILINK3_CXX11)
// Create MessageBatch objects with different message types on the non-critical path.
MessageBatch<NewOrderSingle514> orderBatch;
MessageBatch<OrderCancelRequest516> orderCancelBatch;
// Add messages to batches.
for(unsigned messageCounter = 0; messageCounter < 10; ++messageCounter)
    orderBatch.add(createOrder());
for(unsigned messageCounter = 0; messageCounter < 5; ++messageCounter)
    orderCancelBatch.add(createOrderCancel());
// Update headers to be ready for sending.
orderBatch.updateHeaders();
orderCancelBatch.updateHeaders();
// Combine batches with different message types.
MessageBatchCombiner combiner;
combiner.add(orderBatch);
combiner.add(orderCancelBatch);
// One can also add a single message.
MessageHolder<NewOrderSingle514> order;
combiner.add(order);
// Check that the combined batch fits into a single packet.
assert(MessageBatchChecker::fitSize(combiner));
// On the critical path one can update field values in batches and send all combined messages to the wire.
orderBatch[0]->setOrderQty(10);
orderCancelBatch[0]->setOrderId(123);
session.send(combiner);
#else
// The batch send is not supported on this platform.
#endif
Note
The OnixS::CME::iLink3::Session::messageGrouping setting conflicts with batch sending. For batch sending, the session uses the "gathering" socket send system call, which accepts an array of buffers and sends all of them at once. When OnixS::CME::iLink3::Session::messageGrouping is set, the session instead tries to combine (copy) outgoing messages, when available, into one outgoing buffer until the given grouping value is reached, and then uses the regular socket send system call. Therefore, one should not use the OnixS::CME::iLink3::Session::messageGrouping option and batch sending simultaneously.

Logging After Sending

By default, the logging of an outgoing message to the session storage is performed before sending it to the wire. This approach is more reliable because the outgoing message is stored before going to the counterparty. However, this approach adds the logging latency to the message sending latency, so it increases the tick-to-trade latency.

When latency is more important, one can switch off logging before sending by setting the OnixS::CME::iLink3::Session::logBeforeSending option to false. In this case, the logging of outgoing messages to the session storage will be performed after sending them to the wire.

Reusing Message Instances

A common strategy is to use the same outgoing application-level message instance multiple times.

Sending time

The iLink3 protocol requires the SendingTimeEpoch field to be filled with "the number of nanoseconds since midnight January 1, 1970" for each outgoing message. This is done automatically by the session when OnixS::CME::iLink3::Session::send is invoked.

The current time value is passed as an argument of OnixS::CME::iLink3::Session::send; by default, this argument is the current system time taken at the moment of the invocation.

The latency of taking system time can be avoided if it is requested before the call is made, for instance, at the warmup stage. Also, when the session is used with the OnixS C++ CME Market Data Handler, this value can be taken from the network packet receiving time.
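
As an illustration, the value required by the protocol can be computed ahead of the critical path with the standard library alone. nanosecondsSinceEpoch is a hypothetical helper name. How such a count is converted into the timestamp type accepted by OnixS::CME::iLink3::Session::send is not shown here and should be checked in the API reference.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of the OnixS API): nanoseconds since
// midnight January 1, 1970. Assumption: std::chrono::system_clock uses
// the Unix epoch, which is guaranteed since C++20 and holds in practice
// on the major standard library implementations before that.
inline std::uint64_t nanosecondsSinceEpoch(
    std::chrono::system_clock::time_point time = std::chrono::system_clock::now())
{
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            time.time_since_epoch()).count());
}
```

One would call the helper off the critical path, for instance at the warmup stage, and reuse the stored value for the subsequent send, accepting that the timestamp is then slightly older than the actual sending time.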

See also