Version 1.0.0
Garbage Collection (GC) is a form of automatic memory management, which brings many benefits to the .NET platform.
But GC also has disadvantages: collections consume CPU time and can pause application threads at unpredictable moments.
So, for performance-critical applications, it is preferable to avoid garbage collection. The most efficient way to prevent GC is to design the application so that it does not allocate memory on the critical path.
The .NET platform offers several ways to monitor and measure GC load and memory allocation. One of them is the built-in .NET API, and this article describes how to use it to test the GC load of the OnixS ultra-low latency .NET Core FIX Engine.
Garbage collection can significantly affect the tick-to-trade latency of trading applications. For example, on high-load channels, CME generates a large volume of market data messages. When GC blocks working threads, the latency can increase from microseconds to seconds (a factor of one million) for a few hundred sequential multicast packets.
Before the GC optimization, the .NET edition of the OnixS CME MDP 3.0 Market Data Handler allocated memory while processing each packet. This allocation caused 162 Generation 0 collections and 2 Generation 1 collections while processing 50,000 packets. The median latency was 12.76 microseconds, and the 99.9th percentile latency was 453.65 microseconds. After optimization, the Handler does not allocate memory on the critical path, and all GC collection counts are zero. The median latency dropped to 4.37 microseconds, and the 99.9th percentile latency dropped to 22.97 microseconds. These results show how much GC optimization matters in ultra-low latency trading applications. The developers of such applications need to pay the same attention to memory performance as they pay to CPU performance.
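To make the idea of an allocation-free critical path concrete, here is a minimal sketch that compares a per-packet allocation with a preallocated, reused buffer. It is not the Handler's actual code: the class name, the packet size, and the checksum routine are hypothetical and chosen only for illustration.

```csharp
using System;

public static class CriticalPathSketch
{
    // Hypothetical packet size, chosen only for illustration.
    public const int PacketSize = 1500;

    // Allocated once, outside the critical path, and reused for every packet.
    private static readonly byte[] ReusableBuffer = new byte[PacketSize];

    // Allocating variant: creates a new array for every packet.
    public static int ProcessAllocating(ReadOnlySpan<byte> packet)
    {
        byte[] copy = packet.ToArray(); // heap allocation on every call
        return Checksum(copy);
    }

    // Allocation-free variant: copies into the preallocated buffer.
    public static int ProcessPreallocated(ReadOnlySpan<byte> packet)
    {
        Span<byte> copy = ReusableBuffer.AsSpan(0, packet.Length);
        packet.CopyTo(copy);
        return Checksum(copy);
    }

    private static int Checksum(ReadOnlySpan<byte> data)
    {
        int sum = 0;
        foreach (byte b in data) sum += b;
        return sum;
    }

    // Returns the bytes allocated on the calling thread by `iterations` calls
    // of the chosen variant.
    public static long MeasureAllocatedBytes(bool preallocated, int iterations)
    {
        var packet = new byte[PacketSize];

        // Warm-up call so that static initialization and JIT compilation
        // happen before the measured interval.
        if (preallocated) ProcessPreallocated(packet); else ProcessAllocating(packet);

        long start = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; ++i)
        {
            if (preallocated) ProcessPreallocated(packet); else ProcessAllocating(packet);
        }
        return GC.GetAllocatedBytesForCurrentThread() - start;
    }
}
```

Calling `CriticalPathSketch.MeasureAllocatedBytes(false, 1000)` should report at least 1000 × 1500 bytes, while the preallocated variant is expected to report far less, ideally zero. The trade-off is that the reused buffer is shared state, so this sketch is only safe when a single thread processes packets.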
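The .NET class library has a built-in API for measuring GC load and memory allocation. The System.GC class has many methods to control GC and collect statistics.
To collect statistics about memory allocation and garbage collections, we can use methods such as GC.GetAllocatedBytesForCurrentThread(), which returns the number of bytes allocated on the current thread since it started, and GC.CollectionCount(int generation), which returns how many times the given generation has been collected.
For example, we can use GC.GetAllocatedBytesForCurrentThread() to measure the difference in memory allocation between time intervals or events in the following way: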
long start = GC.GetAllocatedBytesForCurrentThread();
var x = new byte[10000];
long difference = GC.GetAllocatedBytesForCurrentThread() - start;
Console.WriteLine(".NET allocated {0} bytes during the array construction.", difference);
The console output:
.NET allocated 10024 bytes during the array construction.
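The same before-and-after pattern can be wrapped in a small value type that captures the allocation counter and the collection counts together. The following is a sketch only: `GcSnapshot` and `GcSnapshotDemo` are hypothetical names introduced here, not part of .NET or the OnixS API. Only the `System.GC` calls are real.

```csharp
using System;

// A struct, so taking a snapshot does not itself allocate on the heap.
public readonly struct GcSnapshot
{
    public long AllocatedBytes { get; }
    public int Gen0 { get; }
    public int Gen1 { get; }
    public int Gen2 { get; }

    private GcSnapshot(long allocatedBytes, int gen0, int gen1, int gen2)
    {
        AllocatedBytes = allocatedBytes;
        Gen0 = gen0;
        Gen1 = gen1;
        Gen2 = gen2;
    }

    // Reads the current thread's allocation counter and the process-wide
    // collection counts for generations 0, 1, and 2.
    public static GcSnapshot Capture() => new GcSnapshot(
        GC.GetAllocatedBytesForCurrentThread(),
        GC.CollectionCount(0),
        GC.CollectionCount(1),
        GC.CollectionCount(2));

    // Difference between this (later) snapshot and an earlier one.
    public GcSnapshot Since(GcSnapshot earlier) => new GcSnapshot(
        AllocatedBytes - earlier.AllocatedBytes,
        Gen0 - earlier.Gen0,
        Gen1 - earlier.Gen1,
        Gen2 - earlier.Gen2);

    public override string ToString() =>
        $"{AllocatedBytes} bytes, GC0: {Gen0}, GC1: {Gen1}, GC2: {Gen2}";
}

public static class GcSnapshotDemo
{
    public static GcSnapshot Run()
    {
        GcSnapshot before = GcSnapshot.Capture();
        var x = new byte[10000];
        GC.KeepAlive(x); // make sure the array is really allocated
        return GcSnapshot.Capture().Since(before);
    }
}
```

A caller would print the result with `Console.WriteLine(GcSnapshotDemo.Run());`. Note that the allocation counter is per thread while the collection counts are process-wide, so both snapshots must be taken on the thread being measured.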
We use System.GC methods to measure the GC load of sending and receiving messages with the OnixS .NET Core FIX Engine.
The test project creates two FIX sessions in the same process and sends one million messages.
The test consists of three phases:
In the first phase (warm-up), the acceptor and initiator FIX sessions are created, the connection is established, and several orders are sent to warm up the code.
In the second phase, the initiator sends one million New Order Single FIX messages (150 bytes each):
int NumberOfOrders = 1000000;

// Updated by the initiator's send and receive callbacks on their own threads.
long initiatorReceiverAllocatedStart = 0, initiatorSenderAllocatedStart = 0,
     initiatorReceiverAllocatedEnd = 0, initiatorSenderAllocatedEnd = 0;

// Snapshot the main thread's allocation counter and the collection counts.
long startBytes = GC.GetAllocatedBytesForCurrentThread();
int startGC0 = GC.CollectionCount(0);
int startGC1 = GC.CollectionCount(1);
int startGC2 = GC.CollectionCount(2);

for (int i = 0; i < NumberOfOrders; ++i)
{
    initiator.Send(order);
    // Wait until the echoed order arrives before sending the next one.
    receivedReply.WaitOne(Timeout);
}

// Differences over the send-receive loop.
long allocatedBytes = GC.GetAllocatedBytesForCurrentThread() - startBytes;
int gc0 = GC.CollectionCount(0) - startGC0;
int gc1 = GC.CollectionCount(1) - startGC1;
int gc2 = GC.CollectionCount(2) - startGC2;
long initiatorSenderAllocatedBytes = initiatorSenderAllocatedEnd - initiatorSenderAllocatedStart;
long initiatorReceiverAllocatedBytes = initiatorReceiverAllocatedEnd - initiatorReceiverAllocatedStart;
The acceptor receives one million orders and sends them back:
acceptor.InboundApplicationMessage += (object sender, InboundMessageEventArgs e) =>
{
    ((Session)sender).Send(e.Message);
};
The initiator receives the one million echoed orders and records allocation statistics for its sending and receiving threads:
initiator.InboundApplicationMessage += (object sender, InboundMessageEventArgs e) =>
{
    if (initiatorReceiverAllocatedStart == 0)
    {
        initiatorReceiverAllocatedStart = GC.GetAllocatedBytesForCurrentThread();
    }
    initiatorReceiverAllocatedEnd = GC.GetAllocatedBytesForCurrentThread();
    receivedReply.Set();
};
initiator.OutboundApplicationMessage += (object sender, MessageEventArgs e) =>
{
    if (initiatorSenderAllocatedStart == 0)
    {
        initiatorSenderAllocatedStart = GC.GetAllocatedBytesForCurrentThread();
    }
    initiatorSenderAllocatedEnd = GC.GetAllocatedBytesForCurrentThread();
};
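The callbacks read the counter inside the event handlers because GC.GetAllocatedBytesForCurrentThread() only reports allocations made by the thread that calls it. The following engine-free sketch illustrates this with a plain worker thread. `PerThreadCounterSketch` and the 50,000-byte buffer are hypothetical and stand in for the session's receiving thread.

```csharp
using System;
using System.Threading;

public static class PerThreadCounterSketch
{
    // Returns the bytes seen by the main thread's counter and by the
    // worker thread's counter while the worker allocates a buffer.
    public static (long MainThreadBytes, long WorkerThreadBytes) Run()
    {
        long workerBytes = 0;

        var worker = new Thread(() =>
        {
            // Measured on the worker thread, like the event handlers above.
            long start = GC.GetAllocatedBytesForCurrentThread();
            var buffer = new byte[50_000];
            GC.KeepAlive(buffer);
            workerBytes = GC.GetAllocatedBytesForCurrentThread() - start;
        });

        long mainStart = GC.GetAllocatedBytesForCurrentThread();
        worker.Start();
        worker.Join(); // also makes workerBytes safely visible here
        long mainBytes = GC.GetAllocatedBytesForCurrentThread() - mainStart;

        return (mainBytes, workerBytes);
    }
}
```

The worker's delta should be at least 50,000 bytes, while the main thread's delta does not include the buffer. The main thread may still show a small value from starting the thread, which is one reason the test above measures each thread separately.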
In the third phase, the FIX sessions are disconnected and the GC statistics are logged.
initiator.Logout();
acceptor.Logout();
initiator.Dispose();
acceptor.Dispose();

Console.WriteLine($"{StorageType}:");
Console.WriteLine($"Main: {result.main} bytes were allocated per {NumberOfOrders} {orderType} orders");
Console.WriteLine($"Initiator sender: {result.sender} bytes were allocated per {NumberOfOrders} {orderType} orders");
Console.WriteLine($"Initiator receiver: {result.receiver} bytes were allocated per {NumberOfOrders} {orderType} orders");
Console.WriteLine($"GC0: {result.gc0}, GC1: {result.gc1}, GC2: {result.gc2}");
The test project measures GC statistics before and after the send-receive loop and calculates the difference.
Results measured:
Session Storage | Main thread, allocated bytes | Sending thread, allocated bytes | Receiving thread, allocated bytes | Total allocated bytes | Allocated bytes per one message | GC0 | GC1 | GC2 |
---|---|---|---|---|---|---|---|---|
Memory based | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
File based | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Async file based | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
This load testing helps us verify that the OnixS ultra-low latency .NET Core FIX Engine can send and receive one million FIX messages without producing any GC load.
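The article does not show how the derived columns of the table are computed. A plausible sketch, assuming the total is the sum of the three per-thread values and the per-message figure is that total divided by the number of orders, could look like this. `GcLoadReport` and its methods are hypothetical names.

```csharp
using System;

public static class GcLoadReport
{
    // Assumption: "Total allocated bytes" is the sum of the three threads.
    public static long TotalAllocatedBytes(long main, long sender, long receiver)
        => main + sender + receiver;

    // Assumption: "Allocated bytes per one message" is total / message count.
    public static double BytesPerMessage(long totalBytes, int messageCount)
    {
        if (messageCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(messageCount));
        return (double)totalBytes / messageCount;
    }

    // True when a run produced no allocations and no collections,
    // which is what every row of the table reports.
    public static bool IsGcFree(long totalBytes, int gc0, int gc1, int gc2)
        => totalBytes == 0 && gc0 == 0 && gc1 == 0 && gc2 == 0;
}
```

A check such as `GcLoadReport.IsGcFree(total, gc0, gc1, gc2)` could be used as a pass/fail condition in an automated run, so that a change that reintroduces allocations on the critical path is caught early.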
Automatic memory management significantly simplifies and speeds up development, but it can negatively affect the performance of latency-sensitive applications, so it is vital to measure the garbage collection load. The built-in .NET API makes it straightforward to monitor garbage collection and perform GC load testing.