Low Latency Best Practices

This section summarizes our findings and recommends best practices for tuning the different layers of the OnixS .NET Framework FIX Engine for latency-sensitive workloads. By latency-sensitive, we mean workloads that target end-to-end latencies of a few microseconds to a few tens of microseconds, not workloads with end-to-end latencies in the range of hundreds of microseconds to tens of milliseconds. In fact, many of the recommendations here that help at the microsecond level can hurt the performance of applications that tolerate higher latency. The exact benefits and effects of each configuration choice depend heavily on the specific application and workload, so we strongly recommend experimenting with the different configuration options under your own workload before deploying them in a production environment.

Setting the Process.PriorityClass property

A process priority class encompasses a range of thread priority levels. We recommend setting the priority class to ProcessPriorityClass.RealTime. For more details, please see Process.PriorityClass Property.

Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
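The operating system may refuse the request or apply a lower class when the process lacks the required privileges, so it can be useful to read the value back. The following is a minimal sketch using only standard .NET APIs; the helper name ProcessPriorityHelper is hypothetical and is not part of the FIX Engine.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper (not part of the FIX Engine API).
public static class ProcessPriorityHelper
{
    // Requests the given priority class and returns the class that is actually in effect.
    public static ProcessPriorityClass TrySet(ProcessPriorityClass requested)
    {
        try
        {
            using (Process current = Process.GetCurrentProcess())
            {
                current.PriorityClass = requested;
            }
        }
        catch (Exception ex) when (ex is Win32Exception
                                   || ex is InvalidOperationException
                                   || ex is PlatformNotSupportedException)
        {
            // The request was rejected, for example because of insufficient privileges.
            Console.Error.WriteLine("Could not set priority class " + requested + ": " + ex.Message);
        }

        // Use a fresh Process object so the value is queried from the OS, not from a cache.
        using (Process check = Process.GetCurrentProcess())
        {
            return check.PriorityClass;
        }
    }
}
```

A caller could compare the returned value with ProcessPriorityClass.RealTime at startup and log a warning when they differ.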

Session tuning-up

Selecting the right storage

Using MemoryBasedStorage instead of FileBasedStorage boosts performance because FIX messages are stored directly in memory. Alternatively, if resend requests do not need to be supported, it is possible to use a custom pluggable storage (via PluggableStorage) that does nothing on FIX message-related operations. If you need to keep the file-based storage functionality together with good performance, you can use the Async File-based Session Storage.

Manipulating Threads Affinity

By default, the threads that a FIX session uses to send and receive FIX messages can be executed on any of the available processors/cores. Specifying CPU affinity for each session thread may give a significant performance boost; see the ReceivingThreadAffinity and SendingThreadAffinity properties.

The typical approach is to set a different CPU affinity for each thread. However, when the number of threads is greater than the number of CPU cores, one can use the following approach:

Each high-loaded session that receives/sends messages at a high rate should be pinned to a separate CPU core.
All other less-loaded sessions can be pinned to the same CPU core(s).

Also, the actual performance benefit of affinity depends on the particular system and application architecture, so it makes sense to try different settings to find the optimal ones.
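The core-assignment approach described above can be sketched as a small planning function. This is only an illustration under stated assumptions: the AffinityPlanner class, the session names, and the choice of the last core as the shared one are all hypothetical. Applying the resulting core index through ReceivingThreadAffinity and SendingThreadAffinity is left out because the exact form of those properties is not shown in this section.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the FIX Engine API).
public static class AffinityPlanner
{
    // Maps each session name to a CPU core index.
    // Every high-load session gets a dedicated core; all other sessions share the last core.
    public static Dictionary<string, int> Plan(
        IList<string> hotSessions, IList<string> otherSessions, IList<int> cores)
    {
        int sharedNeeded = otherSessions.Count > 0 ? 1 : 0;
        if (hotSessions.Count + sharedNeeded > cores.Count)
            throw new ArgumentException("Not enough cores for one dedicated core per high-load session.");

        var plan = new Dictionary<string, int>();
        for (int i = 0; i < hotSessions.Count; i++)
            plan[hotSessions[i]] = cores[i];

        if (otherSessions.Count > 0)
        {
            int sharedCore = cores[cores.Count - 1];
            foreach (string name in otherSessions)
                plan[name] = sharedCore;
        }
        return plan;
    }
}
```

For example, with cores 2 to 5 available, two high-load sessions would get cores 2 and 3, and any number of low-load sessions would share core 5.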

Using receive spinning timeout

The ReceiveSpinningTimeoutUsec property can be used to decrease the latency of receiving data. If the value is zero (the default), the receiving thread waits for a new FIX message in blocking wait mode. If the value is greater than zero, the receiving thread first waits for a new FIX message in a spin loop and only then switches to blocking wait mode. The property specifies the spin loop period in microseconds. Using ReceiveSpinningTimeoutUsec makes sense when your session receives FIX messages frequently; in that case, waiting in the loop is cheaper than the thread context switch involved in blocking wait mode. However, please note that spin waiting increases CPU usage, so the spin wait period should not be too long.
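The general spin-then-block pattern can be illustrated with standard .NET primitives. This is a generic sketch of the technique, not the engine's actual implementation; the SpinThenBlock class and the use of ManualResetEventSlim as a stand-in for "data has arrived" are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Generic illustration of spin-then-block waiting (not the FIX Engine's code).
public static class SpinThenBlock
{
    // Returns true if the signal was observed while spinning,
    // false if the spin budget ran out and the thread fell back to a blocking wait.
    public static bool Wait(ManualResetEventSlim signal, int spinTimeoutUsec)
    {
        long budgetTicks = spinTimeoutUsec * Stopwatch.Frequency / 1000000L;
        long start = Stopwatch.GetTimestamp();

        while (Stopwatch.GetTimestamp() - start < budgetTicks)
        {
            if (signal.IsSet)
                return true;      // No context switch was needed.
            Thread.SpinWait(1);   // Brief busy-wait that keeps the thread on the CPU.
        }

        signal.Wait();            // Blocking wait: the thread may be descheduled here.
        return false;
    }
}
```

With a spin timeout of zero the loop body never runs and the call behaves like a plain blocking wait, which mirrors the default described above.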

Using send spinning timeout

The SendSpinningTimeoutUsec property can be used to decrease the latency of sending data. If the value is zero (the default) and the outgoing message cannot be sent immediately, it is placed into the outgoing queue. If the value is greater than zero, the Send(Message) method waits in a spin loop for the socket send buffer to become available before placing the message into the outgoing queue (to be sent later by the sending thread). The property specifies the spin loop period in microseconds. Using SendSpinningTimeoutUsec makes sense when your session sends FIX messages frequently; in that case, waiting in the loop is cheaper than the thread context switch. However, please note that spin waiting increases CPU usage and blocks the thread from which the Send(Message) method is called, so the spin wait period should not be too long.

Using session warmup

The WarmUp(FlatMessage) method can be used to warm up the sending path. It makes sense when your session sends FIX messages infrequently; in that case, the sending path and its associated data structures may no longer be in the cache, which can increase the send latency. You can call WarmUp(FlatMessage) periodically (the recommended period is 500 microseconds or less) to avoid cache misses and keep the sending path fast. The FlatMessage object is required so that the warm-up covers the whole sending path, including FIX message assembling; the message is not actually sent.
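One possible way to pace the warm-up calls from the sending thread's own idle loop is sketched below. The WarmUpPacer class is hypothetical and uses only standard .NET APIs; the warm-up action is passed in as a delegate created once, so that no delegate or closure is allocated on the critical path. How the delegate invokes WarmUp(FlatMessage) on a session is an assumption shown only in a comment.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the FIX Engine API).
public sealed class WarmUpPacer
{
    private readonly Action warmUp;
    private readonly long periodTicks;
    private long lastActivity;

    // warmUp is created once up front, e.g. (assumed wiring): () => session.WarmUp(flatMessage)
    public WarmUpPacer(Action warmUp, TimeSpan period)
    {
        if (warmUp == null) throw new ArgumentNullException(nameof(warmUp));
        this.warmUp = warmUp;
        periodTicks = (long)(period.TotalSeconds * Stopwatch.Frequency);
        lastActivity = Stopwatch.GetTimestamp();
    }

    // Call after every real send: the sending path is already warm at that point.
    public void OnSent()
    {
        lastActivity = Stopwatch.GetTimestamp();
    }

    // Call from the idle loop; runs the warm-up only if the path has been idle for a full period.
    public bool TryWarmUp()
    {
        long now = Stopwatch.GetTimestamp();
        if (now - lastActivity < periodTicks)
            return false;
        warmUp();
        lastActivity = now;
        return true;
    }
}
```

Driving the pacer from the same thread that sends messages avoids any question of whether warm-up and send calls may run concurrently, which this section does not specify.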

Reusing FIX message instances and event arguments in event handlers

Object creation is an expensive operation in .NET, with an impact on both performance and memory consumption. The cost varies depending on the amount of initialization that needs to be performed when the object is created. The OnixS .NET Framework FIX Engine allows the Session to reuse message instances and event arguments in event handlers. We highly recommend turning on ReuseIncomingMessage, ReuseOutgoingMessage, and ReuseEventArguments to minimize excess object creation and garbage collection overhead.

Note
If ReuseIncomingMessage is turned on, the client's code should not dispose of incoming messages and must copy a message in order to use it outside of inbound callbacks. If ReuseOutgoingMessage is turned on, the client's code should not dispose of outgoing messages and must copy a message in order to use it outside of outbound callbacks. If ReuseEventArguments is turned on, the client's code must copy event arguments in order to use them outside of callbacks.

Sending FIX messages in a batch

The Send(MessageBatch) and Send(FlatMessageBatch) methods can be used to send messages in a batch. This can decrease the send latency because all messages in the batch are sent in one TCP packet. This ability is available for both the Message and FlatMessage classes.

You can decrease the send latency a little more by using the SendAsIs(FlatMessageBatch)/SendAsIs(FlatMessage) methods. These methods send the serialized message(s) without updating any fields. You can prepare the message(s) before sending by setting all necessary fields and using the PreFill(FlatMessageBatch)/PreFill(FlatMessage) methods.

Please note that the MessageGrouping property conflicts with batch sending. For batch sending, the session uses the socket send function that accepts an array of buffers and sends all of them at once. When the MessageGrouping property is set, the session instead tries to combine (copy) outgoing messages (when available) into one outgoing buffer until the given grouping value is reached, and then uses the regular socket send function. Therefore, one should not use the MessageGrouping property and batch sending simultaneously.

Consider switching off logging before sending

By default, an outgoing message is logged to the session storage before it is sent to the wire. This is more reliable because it guarantees that an outgoing message is stored before it goes to the counterparty, so if the application shuts down for some reason after sending, the sent message can be resent afterward. However, this approach adds the logging latency to the FIX Engine sending latency and, as a result, increases the tick-to-trade latency. When latency is more important, one can switch off logging before sending by setting the LogBeforeSending option to false. In this case, outgoing messages are logged to the session storage after they are sent to the wire. This excludes the logging latency from the FIX Engine sending latency and, as a result, decreases the tick-to-trade latency.

Updating Engine configuration

Disabling Resend Requests functionality

If the application does not re-send messages in response to the counterparty's Resend Request messages, the Resending Queue size can be set to zero to increase overall performance.

EngineSettings settings = new EngineSettings();

settings.ResendingQueueSize = 0;

Disabling FIX messages validation

Validation significantly reduces performance because of the various checks that are performed on each FIX message. The code sample below shows how to disable FIX message validation:

EngineSettings settings = new EngineSettings();

settings.ValidateEmptyFieldValues = false;
settings.ValidateFieldValues = false;
settings.ValidateRequiredFields = false;
settings.ValidateUnknownFields = false;
settings.ValidateUnknownMessages = false;

Do not use memory pressure

The UseMemoryPressure option can be set to false. This option is used to inform the runtime about allocated unmanaged memory. When it is set to false, latency improves because the GC does less work, but a large number of large messages can then cause an OutOfMemoryException. It is safe to set this option to false if Dispose() is called for each message. However, please note that Dispose() should not be called for incoming/outgoing messages inside event handlers when the ReuseIncomingMessage/ReuseOutgoingMessage option is true.
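For background, the general .NET mechanism for telling the garbage collector about unmanaged allocations is GC.AddMemoryPressure and GC.RemoveMemoryPressure. The sketch below shows that mechanism on a hypothetical UnmanagedBuffer class; whether the engine uses exactly these calls internally is an assumption, and the class is not part of the FIX Engine.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper illustrating memory pressure reporting (not FIX Engine code).
public sealed class UnmanagedBuffer : IDisposable
{
    private readonly int size;
    private readonly bool reportPressure;
    private IntPtr pointer;

    public UnmanagedBuffer(int size, bool reportPressure)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        this.size = size;
        this.reportPressure = reportPressure;
        pointer = Marshal.AllocHGlobal(size);
        // With pressure reported, the GC accounts for this memory and may collect more often.
        if (reportPressure) GC.AddMemoryPressure(size);
    }

    public bool IsAllocated
    {
        get { return pointer != IntPtr.Zero; }
    }

    // Deterministic release: this is what makes skipping pressure reporting safe.
    public void Dispose()
    {
        if (pointer == IntPtr.Zero) return;
        Marshal.FreeHGlobal(pointer);
        pointer = IntPtr.Zero;
        if (reportPressure) GC.RemoveMemoryPressure(size);
        GC.SuppressFinalize(this);
    }

    // Fallback for callers that forget Dispose(); without reported pressure the GC
    // has no reason to run it soon, so unmanaged memory can pile up.
    ~UnmanagedBuffer()
    {
        Dispose();
    }
}
```

The trade-off mirrors the one described above: without reported pressure the GC works less, but unmanaged memory is reclaimed promptly only if Dispose() is called for every object.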

Optimizing FIX dictionaries

When a FIX message is constructed, space is allocated for all the fields that are defined for that message. The latest FIX specifications define a lot of fields for each message type. Even when most of the fields are not used, the FIX Engine reserves space for them, and this has a negative effect on the FIX Engine's performance.

Editing the dictionary descriptions to exclude messages and fields (especially repeating groups) that are not used by the application directly improves FIX message processing speed and thus decreases the overall latency of FIX message-related processing.

Manipulating FIX messages

Constructing a FIX message each time one has to be sent is inefficient because it involves memory allocation and subsequent destruction. The Message class provides the ability to reuse a message of a particular type. It exposes the Reset() method, which wipes out all the FIX fields assigned to the given instance. This method does not deallocate the occupied memory; it just returns the message to its initial state, as if it had just been constructed. Because the previously allocated memory is reused, setting FIX fields on the second and subsequent uses of the object is faster.

Reusing FIX message instance

A common strategy is to use the same message instance for sending multiple times. As soon as a message of a certain type is constructed, it is populated with the common fields for subsequent reuse. Then, each time the message is about to be sent to the counterparty, only the mutable fields (like price or quantity) are updated.

Using preserialized FIX messages

As a further step beyond reusing FIX messages, the concept of a preserialized message (FlatMessage) has been developed. In general, it follows the same approach of updating only the mutable fields before sending a message to the counterparty.

The only difference is that the message is serialized into its 'tag=value' presentation in advance, so no serialization is required when the message is actually sent.

Using GC-free interface

When you manipulate the string field values of a message object through the general Get/Set interface, temporary string objects are created each time. This increases GC pressure and, as a result, can increase the latency of code execution. Therefore, to minimize GC pressure, one can use the following GC-free interface:

To compare field values without GC pressure, one can use the CompareFieldValue(Int32, String) method. For example, the following code:

if(message.CompareFieldValue(Tag.ClOrdID, "ClOrdIdValue"))

does not allocate managed memory at run time; however, the following one:

if(message.Get(Tag.ClOrdID) == "ClOrdIdValue")

does, because it creates a temporary string object.

To compare the message type value without GC pressure, one can use the CompareType(String) method. For example, the following code:

if(message.CompareType("AE"))

does not allocate managed memory at run time; however, the following one:

if(message.Type == "AE")

does, because it creates a temporary string object.

Note
All string literals in C# code are implicitly converted to string objects. These string objects are stored in the internal pool and are created only once, when a string literal is first used. Therefore, make sure each literal has already been used once in advance (for example, during startup) to avoid a string allocation on the critical path.

To get/set field values without GC pressure, one can use the Get(Int32, StringBuilder)/Set(Int32, StringBuilder) methods. For example, the following code:

// stringValue object of StringBuilder type is created previously in some place.
msg1.Get(Tags.Text, stringValue);
msg2.Set(Tags.Text, stringValue);

does not allocate managed memory at run time; however, the following one:

string stringValue = msg1.Get(Tags.Text);
msg2.Set(Tags.Text, stringValue);

does, because it creates a temporary string object.

Note
The StringBuilder object should be created in advance, on the non-critical path. Also, it is important to create it with the required and sufficient capacity of the internal buffer to avoid further memory reallocation on the critical path.
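The preallocation advice in this note can be sketched with standard .NET APIs only. The FieldBuffers helper and its method names are hypothetical; the idea is to size the StringBuilder once on the non-critical path and to refuse values that would force the internal buffer to grow.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the FIX Engine API).
public static class FieldBuffers
{
    // Call once on the non-critical path with the longest expected field value.
    public static StringBuilder Create(int maxFieldLength)
    {
        return new StringBuilder(maxFieldLength);
    }

    // Overwrites the buffer content in place. Returns false instead of growing
    // the buffer when the value does not fit into the preallocated capacity.
    public static bool TryOverwrite(StringBuilder buffer, char[] source, int count)
    {
        if (count > buffer.Capacity)
            return false;
        buffer.Clear();                   // Resets the length; the capacity is kept.
        buffer.Append(source, 0, count);  // Copies characters without creating a string.
        return true;
    }
}
```

A buffer prepared this way can then be passed to the Get(Int32, StringBuilder)/Set(Int32, StringBuilder) methods mentioned above.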

Other Resources

High Throughput Best Practices