OnixS C++ CME MDP Conflated UDP Handler  1.1.2
API documentation
Benchmarking with Network/Kernel Layer

Preface

The default benchmarking procedure described in the Benchmarking Results topic measures application-level latency and does not take into account the time the incoming data spends in the network/kernel layer. This approach makes it possible to analyze the critical path inside the processing machinery (the Handler). However, an application may not receive incoming data as soon as it arrives. Therefore, the measured latency does not reflect other important aspects, such as the time the data spends in the network adapter or in the kernel.
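In essence, the application-level measurement boils down to the difference between two software timestamps: one taken when a packet enters the processing machinery and one taken inside the user-level callback. The following standalone sketch models this idea with std::chrono; the packetEntered/resultsDelivered hooks are hypothetical names used for illustration and are not part of the SDK.

#include <chrono>
#include <cstdint>
#include <vector>

// Minimal model of application-level latency measurement. The 'start' point is
// taken right before a packet is handed to the processing machinery, the
// 'finish' point inside the user-level callback once results are delivered.
class AppLevelLatencyProbe
{
public:
    // Hypothetical hook: called just before the packet enters the Handler.
    void packetEntered()
    {
        start_ = std::chrono::steady_clock::now();
    }

    // Hypothetical hook: called from the user callback after processing.
    void resultsDelivered()
    {
        const auto finish = std::chrono::steady_clock::now();
        samples_.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start_).count());
    }

    // Collected per-packet latencies in nanoseconds.
    const std::vector<std::int64_t>& samplesNs() const { return samples_; }

private:
    std::chrono::steady_clock::time_point start_;
    std::vector<std::int64_t> samples_;
};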

In order to improve the analysis, support for hardware timestamps has been added to the SDK. Hardware timestamps are assigned to incoming packets as soon as they arrive at the network adapter. If a hardware timestamp is taken as the starting point of the latency measurement, it becomes possible to obtain the time span from the moment the data arrives at the network card to the moment the results of data processing are delivered to the user.
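For illustration only, the same idea can be reproduced outside the SDK with the standard Linux SO_TIMESTAMPING facility, which delivers the NIC-assigned receive timestamp as ancillary data alongside the datagram. The sketch below is a generic example and does not show how the Handler or the ef_vi-based feed engine obtains hardware timestamps internally; in practice, hardware timestamping must also be enabled on the network interface (for example, via ethtool), and the NIC clock must be synchronized with the host clock (for example, via PTP) before its timestamps can be compared with application-level ones.

#include <linux/net_tstamp.h>   // SOF_TIMESTAMPING_* flags
#include <sys/socket.h>
#include <cstddef>
#include <ctime>

// Requests NIC hardware receive timestamps on an already created UDP socket.
// Driver/NIC support is required; enabling timestamping on the interface
// itself (e.g. via ethtool) is omitted here.
bool enableHardwareRxTimestamps(int udpSocket)
{
    const int flags = SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE;
    return setsockopt(udpSocket, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)) == 0;
}

// Receives a single datagram and extracts the raw hardware timestamp
// (ts[2] of the SCM_TIMESTAMPING control message) if the driver provided one.
bool receiveWithHardwareTimestamp(int udpSocket, char* buffer, size_t size, timespec& hwTime)
{
    char control[256];
    iovec iov = { buffer, size };

    msghdr msg = {};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    if (recvmsg(udpSocket, &msg, 0) < 0)
        return false;

    for (cmsghdr* cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg))
    {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMPING)
        {
            const timespec* stamps = reinterpret_cast<const timespec*>(CMSG_DATA(cmsg));
            hwTime = stamps[2]; // raw hardware timestamp from the NIC clock
            return true;
        }
    }

    return false; // no hardware timestamp attached to this packet
}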

The following tables show the results of benchmarking using the regular (application-only) approach and compare them with the results obtained with the help of hardware timestamps. The measurements were taken for the two major implementations of the Feed Engine machinery exposed by the SDK and encapsulated in the OnixS::CME::ConflatedUDP::SocketFeedEngine and OnixS::CME::ConflatedUDP::SolarflareFeedEngine classes, which use ordinary sockets and the Solarflare ef_vi SDK, respectively. Additionally, the socket-based feed engine was benchmarked in the OpenOnload environment to show the actual benefits of using OpenOnload for ordinary (socket-based) solutions.
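As a side note, the statistics reported in the tables below (minimal, median, mean, 95th and 99th percentiles, maximal) can be derived from the raw per-packet latency samples along the following lines. This is a generic sketch rather than the benchmark harness used to produce the figures.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencyStats
{
    double minimal, median, mean, p95, p99, maximal;
};

// Computes the summary statistics used in the tables from raw latency samples
// (one value per packet, e.g. in microseconds). Assumes at least one sample.
LatencyStats summarize(std::vector<double> samples)
{
    std::sort(samples.begin(), samples.end());

    const auto percentile = [&](double p)
    {
        // Nearest-rank style percentile over the sorted samples.
        const size_t index = static_cast<size_t>(p * (samples.size() - 1) + 0.5);
        return samples[index];
    };

    LatencyStats stats;
    stats.minimal = samples.front();
    stats.median  = percentile(0.5);
    stats.mean    = std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    stats.p95     = percentile(0.95);
    stats.p99     = percentile(0.99);
    stats.maximal = samples.back();
    return stats;
}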

Application-Level Only With 5 Milliseconds Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SolarflareFeedEngine
Minimal       0.993               0.856
Median        1.144               1.017
Mean          1.203               1.071
95%           1.493               1.383
99%           2.047               1.830
Maximal       14.344              14.680

With Network/Kernel-Layer And 5 Milliseconds Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SolarflareFeedEngine
Minimal       14.007              2.190
Median        52.087              2.664
Mean          48.853              2.709
95%           55.894              3.152
99%           60.966              3.798
Maximal       79.933              16.385

Application-Level Only Without Any Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SocketFeedEngine + OpenOnload    SolarflareFeedEngine
Minimal       0.507               0.533                            0.600
Median        0.667               0.693                            0.714
Mean          0.739               0.765                            0.778
95%           1.156               1.113                            1.067
99%           1.564               1.560                            1.490
Maximal       14.019              14.617                           14.756

With Network/Kernel-Layer And Without Any Delay Between Packets

Statistics    Latency (μs)
              SocketFeedEngine    SocketFeedEngine + OpenOnload    SolarflareFeedEngine
Minimal       10.786              3.235                            1.925
Median        12.610              4.148                            2.353
Mean          12.676              4.267                            2.377
95%           14.238              5.260                            2.853
99%           15.562              9.407                            3.310
Maximal       64.169              22.261                           16.210

Conclusions and Important Notes

Using hardware timestamps brings kernel/hardware latency into the overall measurements and thus allows a more precise analysis. For example, as the tables above show, the application-only latency is similar for the two major implementations of the Feed Engine exposed by the SDK. However, using hardware timestamps reveals the real benefits of kernel-bypass solutions such as OpenOnload. Moreover, it quantifies the advantage of using specialized solutions like the Solarflare ef_vi SDK when working with multicast data.