OnixS C++ CME MDP Conflated UDP Handler  1.1.2
API documentation
Benchmarking with Network/Kernel Layer

Preface

The default benchmarking procedure described in the Benchmarking Results topic measures application-level latency and does not take into account the time the incoming data spends in the network/kernel layer. This approach makes it possible to analyze the critical path inside the processing machinery (the Handler). However, an application may not receive incoming data as soon as it arrives. Therefore, the measured latency does not reflect other important aspects, such as the time the data spends in the network adapter or in the kernel.
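In essence, the application-level measurement boils down to the difference between two software timestamps: one taken when a packet enters the processing machinery and one taken inside the user-level callback. The following standalone sketch models this idea with std::chrono; the packetEntered/resultsDelivered hooks are hypothetical names used for illustration and are not part of the SDK.

#include <chrono>
#include <cstdint>
#include <vector>

// Minimal model of application-level latency measurement. The 'start' point is
// taken right before a packet is handed to the processing machinery, the
// 'finish' point inside the user-level callback once results are delivered.
class AppLevelLatencyProbe
{
public:
    // Hypothetical hook: called just before the packet enters the Handler.
    void packetEntered()
    {
        start_ = std::chrono::steady_clock::now();
    }

    // Hypothetical hook: called from the user callback after processing.
    void resultsDelivered()
    {
        const auto finish = std::chrono::steady_clock::now();
        samples_.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start_).count());
    }

    // Collected per-packet latencies in nanoseconds.
    const std::vector<std::int64_t>& samplesNs() const { return samples_; }

private:
    std::chrono::steady_clock::time_point start_;
    std::vector<std::int64_t> samples_;
};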

In order to improve the analysis, support for hardware timestamps has been added to the SDK. Hardware timestamps are assigned to incoming packets as soon as they arrive at the network adapter. If a hardware timestamp is taken as the starting point of the latency measurement, it becomes possible to obtain the time span from the moment the data arrives at the network card to the moment the results of data processing are delivered to the user.
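For illustration only, the same idea can be reproduced outside the SDK with the standard Linux SO_TIMESTAMPING facility, which delivers the NIC-assigned receive timestamp as ancillary data alongside the datagram. The sketch below is a generic example and does not show how the Handler or the ef_vi-based feed engine obtains hardware timestamps internally; in practice, hardware timestamping must also be enabled on the network interface (for example, via ethtool), and the NIC clock must be synchronized with the host clock (for example, via PTP) before its timestamps can be compared with application-level ones.

#include <linux/net_tstamp.h>   // SOF_TIMESTAMPING_* flags
#include <sys/socket.h>
#include <cstddef>
#include <ctime>

// Requests NIC hardware receive timestamps on an already created UDP socket.
// Driver/NIC support is required; enabling timestamping on the interface
// itself (e.g. via ethtool) is omitted here.
bool enableHardwareRxTimestamps(int udpSocket)
{
    const int flags = SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE;
    return setsockopt(udpSocket, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)) == 0;
}

// Receives a single datagram and extracts the raw hardware timestamp
// (ts[2] of the SCM_TIMESTAMPING control message) if the driver provided one.
bool receiveWithHardwareTimestamp(int udpSocket, char* buffer, size_t size, timespec& hwTime)
{
    char control[256];
    iovec iov = { buffer, size };

    msghdr msg = {};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    if (recvmsg(udpSocket, &msg, 0) < 0)
        return false;

    for (cmsghdr* cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg))
    {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMPING)
        {
            const timespec* stamps = reinterpret_cast<const timespec*>(CMSG_DATA(cmsg));
            hwTime = stamps[2]; // raw hardware timestamp from the NIC clock
            return true;
        }
    }

    return false; // no hardware timestamp attached to this packet
}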

The following tables show the results of benchmarking using the regular (application-only) approach and compare them with the results obtained with the help of hardware timestamps. The measurements were taken for the two major implementations of the Feed Engine machinery exposed by the SDK and encapsulated in the OnixS::CME::ConflatedUDP::SocketFeedEngine and OnixS::CME::ConflatedUDP::SolarflareFeedEngine classes, which use ordinary sockets and the Solarflare ef_vi SDK, respectively. Additionally, the socket-based feed engine was benchmarked in the OpenOnload environment to show the actual benefits of using OpenOnload for ordinary (socket-based) solutions.
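As a side note, the statistics reported in the tables below (minimal, median, mean, 95th and 99th percentiles, maximal) can be derived from the raw per-packet latency samples along the following lines. This is a generic sketch rather than the benchmark harness used to produce the figures.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencyStats
{
    double minimal, median, mean, p95, p99, maximal;
};

// Computes the summary statistics used in the tables from raw latency samples
// (one value per packet, e.g. in microseconds). Assumes at least one sample.
LatencyStats summarize(std::vector<double> samples)
{
    std::sort(samples.begin(), samples.end());

    const auto percentile = [&](double p)
    {
        // Nearest-rank style percentile over the sorted samples.
        const size_t index = static_cast<size_t>(p * (samples.size() - 1) + 0.5);
        return samples[index];
    };

    LatencyStats stats;
    stats.minimal = samples.front();
    stats.median  = percentile(0.5);
    stats.mean    = std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    stats.p95     = percentile(0.95);
    stats.p99     = percentile(0.99);
    stats.maximal = samples.back();
    return stats;
}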

Application-Level Only With 5 Milliseconds Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SolarflareFeedEngine
Minimal       0.993               0.856
Median        1.144               1.017
Mean          1.203               1.071
95%           1.493               1.383
99%           2.047               1.830
Maximal       14.344              14.680

With Network/Kernel-Layer And 5 Milliseconds Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SolarflareFeedEngine
Minimal       14.007              2.190
Median        52.087              2.664
Mean          48.853              2.709
95%           55.894              3.152
99%           60.966              3.798
Maximal       79.933              16.385

Application-Level Only Without Any Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SocketFeedEngine + OpenOnload    SolarflareFeedEngine
Minimal       0.507               0.533                            0.600
Median        0.667               0.693                            0.714
Mean          0.739               0.765                            0.778
95%           1.156               1.113                            1.067
99%           1.564               1.560                            1.490
Maximal       14.019              14.617                           14.756

With Network/Kernel-Layer And Without Any Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SocketFeedEngine + OpenOnload    SolarflareFeedEngine
Minimal       10.786              3.235                            1.925
Median        12.610              4.148                            2.353
Mean          12.676              4.267                            2.377
95%           14.238              5.260                            2.853
99%           15.562              9.407                            3.310
Maximal       64.169              22.261                           16.210

Conclusions and Important Notes

Using hardware timestamps brings kernel/hardware latency into the overall measurements and thus allows a more precise analysis. For example, as the tables above show, the application-only latency is similar for the two major implementations of the Feed Engine exposed by the SDK. However, using hardware timestamps reveals the real benefits of kernel-bypass solutions such as OpenOnload. Moreover, it quantifies the advantage of using specialized solutions like the Solarflare ef_vi SDK when working with multicast data.