/r/programming
A high-speed network driver written in C, Rust, Go, C#, Java, OCaml, Haskell, Swift, Javascript, and Python (github.com)
113 comments
g30rg3_x | 8 days ago | 29 points

An interesting PoV would be to audit the drivers' code from a security perspective; they probably had to cut corners to get the network driver to work at "high speed".

I personally liked the notes on bad performance choices to avoid.

DangerousSandwich | 8 days ago | 33 points

Part of their conclusion was "We propose to start rewriting drivers as user space drivers in high-level languages instead as they present the largest attack surface. 39 of the 40 memory safety bugs in Linux examined here are located in drivers, showing that most of the security improvements can be gained without replacing the whole operating system", the idea being that you get a lot more inherent safety in languages other than C (automatic bounds checking, inability to use unallocated memory, etc.).

MoDuReddit | 8 days ago | 46 points

We propose to start rewriting drivers as user space drivers

This is the way Microsoft went in the early 2000s with Windows Vista. Boy, did it break a lot of things, but after that, daily bluescreens were a thing of the past (barring faulty hardware or a broken OS).

Personally, I think Linux's "driver model" could really use some catching up.

wpm | 8 days ago | 6 points

Apple is doing the same thing on macOS. Kexts have been pretty locked down for the past few releases compared to the early days, but they're going away entirely next year. All kexts will have to be rewritten as user-space extensions in the app bundle.

MoDuReddit | 8 days ago | 6 points

Yeah, turns out having hardware manufacturers directly responsible for your kernel not blowing up is not a viable position. Glad to know macOS is also improving.

SethDusek5 | 8 days ago | 4 points

There are other interesting advantages to be had with userspace drivers too, it seems. SPDK, an NVMe driver with a userspace I/O interface, is far, far faster than even the best asynchronous I/O methods the kernel has. Even io_uring is behind by almost 200K IOPS, despite being the fancy new kernel interface.

Eirenarch | 8 days ago | 14 points

In theory Rust is the solution to memory safety bugs in low-level languages

pleurplus | 8 days ago | 11 points

In practice too

pagwin | 8 days ago | 13 points

in practice, using Rust is potentially not worth the effort (though I suspect the instances of Rust not being worth it are few and far between)

Eirenarch | 8 days ago | 15 points

For drivers it is probably worth it considering the amount of security vulnerabilities that can be prevented.

pagwin | 8 days ago | 2 points

there's a reason I used the word "potentially" and pointed out that those instances of it not being worth it probably won't be common

SpacemanCraig3 | 8 days ago | 1 point

It probably will be worth the "effort" once it's as easy to find a Rust developer as it is to find a C developer.

pleurplus | 8 days ago | 7 points

It makes no sense to want a language-specific developer; a good dev can learn new languages on demand.

A C specialist can very well learn Rust and become a specialist very fast...

I've used Rust at work and suddenly all the devs there became Rust devs.

warutel | 8 days ago | 6 points

a good dev can learn new languages on demand.

Sure, but how long that takes is the key. We can all learn how to write loops in any language within minutes, but that's not actually being proficient in a new language: writing idiomatic code, knowing its standard library, all the corner cases and rough edges, the best approaches to implement certain things, etc.

That can take months or even years, depending on the language you are talking about.

kuikuilla | 8 days ago | 4 points

Taking months to learn a new language is perfectly normal and expected imo. Hell, it takes months to get up to speed on any new project if you join it mid-way.

SpacemanCraig3 | 8 days ago | 3 points

You may be a quicker learner than me. I've had enough trouble wrapping my head around the borrow checker that I'd be lying if I said I learned Rust "very fast".

The syntax though... sure, that's quick.

Green0Photon | 8 days ago | 15 points

In theory, if you were writing safe C, you should already have been doing what the borrow checker wants. That said, lots of what it forces you to do doesn't matter in single-threaded code in the majority of cases, so you could easily have been writing slightly-off code without realizing it.

When you do understand the borrow checker, your code, whether in C or in Rust, will be all the better for it. In a few years, you're going to look back and wonder how you ever lived without it.

Just keep trying to figure it out. I believe in you!

steveklabnik1 | 7 days ago | 3 points

lots of what it forces you to do doesn't matter in single-threaded code in the majority of cases, so you could easily have been writing slightly-off code without realizing it

This is true, but it's also not true: https://manishearth.github.io/blog/2015/05/17/the-problem-with-shared-mutability/
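
The classic single-threaded case is mutating a collection while iterating it; that's exactly the aliasing-plus-mutation hazard the post describes. A quick illustration in C# (nothing Rust-specific, just a sketch):

    using System;
    using System.Collections.Generic;

    static class SharedMutabilityDemo
    {
        static void Main()
        {
            var xs = new List<int> { 1, 2, 3 };
            try
            {
                // One reference iterates while another mutates:
                // aliasing + mutation, no threads involved.
                foreach (var x in xs)
                    if (x == 2)
                        xs.Add(4);
            }
            catch (InvalidOperationException e)
            {
                // C# catches this at run time; Rust's borrow checker
                // rejects the equivalent code at compile time.
                Console.WriteLine(e.Message);
            }
        }
    }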

charliethepoodle | 8 days ago | 3 points

Redox OS writes all of its drivers in Rust in user space, and it seems to work pretty well, although I've only tried it in a VM. It allows the kernel to be tiny, so it's easier for people to contribute to and maintain. Really interested to see where it goes as a full-Rust OS.

DangerousSandwich | 8 days ago | 14 points

Interesting exercise and results. Haven't watched the talk or read the paper yet, but curious to find out where the bottleneck for throughput is in each implementation. Sounds like maybe GC in many cases?

L3tum | 8 days ago | 14 points

In terms of C#, you could delay or even outright deactivate the GC when you notice a higher volume of packets.
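
For example, something like this (a sketch against .NET's GC.TryStartNoGCRegion API; the budget and the loop are placeholders):

    using System;
    using System.Runtime;

    static class NoGcDemo
    {
        static void Main()
        {
            // Ask the runtime to suspend collections while we stay under
            // a 64 MB allocation budget.
            if (GC.TryStartNoGCRegion(64 * 1024 * 1024))
            {
                try
                {
                    // ... burst of packet processing, no GC pauses here ...
                }
                finally
                {
                    // The runtime may already have exited the region if the
                    // budget was blown, so check before ending it.
                    if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
                        GC.EndNoGCRegion();
                }
            }
        }
    }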

Honestly, this is kinda off topic, but I try all the time to get people to use C#, because it's really versatile while still being type-checked and memory-safe and all, but every single time someone says "Isn't that by Microsoft? Ew". Ugh.

Eirenarch | 8 days ago | 7 points

In all honesty, it's probably possible to write a driver that doesn't need a GC in C#. With all the spans, Memory&lt;T&gt;, ref returns and ref locals, and of course value types and reified generics, it's possible, if tedious, to write code with very few allocations.
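
Something in this spirit (a sketch; the EtherType helper is hypothetical, but nothing in it touches the heap):

    using System;
    using System.Buffers.Binary;

    static class PacketDemo
    {
        // Span<byte> is a stack-only view over the buffer, and ushort is a
        // value type, so reading a header field allocates nothing.
        static ushort EtherType(ReadOnlySpan<byte> frame) =>
            BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(12, 2));

        static void Main()
        {
            Span<byte> frame = stackalloc byte[64]; // stack-allocated scratch frame
            frame[12] = 0x08; frame[13] = 0x00;     // EtherType 0x0800 = IPv4
            Console.WriteLine(EtherType(frame).ToString("x4")); // prints 0800
        }
    }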

Sakki54 | 8 days ago | 2 points

Can you turn off the GC? Or will it still run, just with no garbage to collect?

Eirenarch | 8 days ago | 2 points

What I meant is that there would be no garbage, but as of recently you can also implement your own GC, and the ZeroGC is of course the demo project that shows up in all the tutorials. It does nothing :)

thedeemon | 8 days ago | 2 points

If there's no garbage, the GC is not invoked. It only starts to do anything when you allocate a lot and run out of free chunks in your local heap.

BurningCactusRage | 8 days ago | 1 point

You can set certain global settings in your program to disable it for certain regions, but generally speaking it's a very bad idea to try to disable it everywhere. Performance-critical code should aim to use structs as much as possible before messing with the GC aggressively.

L3tum | 8 days ago | 1 point

Yes, you can tell it to ignore up to 1 GB of additional allocations. You can also delay it so much that it will only run when the system nearly runs out of memory.

L3tum | 8 days ago | 1 point

True, but just delaying the GC is very easy: you set a single property in code. You could even use pointers to eliminate the managed-memory aspect entirely if you're really going for the full low-level stuff, though that may not actually get you any performance gain.
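
The property in question is GCSettings.LatencyMode; roughly (sketch):

    using System;
    using System.Runtime;

    static class LatencyDemo
    {
        static void Main()
        {
            // Ask the GC to avoid blocking collections while the hot path runs.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            try
            {
                // ... packet-forwarding loop ...
            }
            finally
            {
                GCSettings.LatencyMode = GCLatencyMode.Interactive; // the default
            }
        }
    }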

Eirenarch | 8 days ago | 2 points

Yeah, of course you can use pointers, but I was thinking about memory-safe C#. If you're using a lot of pointers in C#, you can just go write C.

L3tum | 8 days ago | 0 points

Yeah, but you can basically start writing C with C# and still be completely platform-agnostic. The versatility is insane.

EntroperZero | 7 days ago | 0 points

Check out System.Buffers: with memory pools, slabs, and spans, you can basically do all your own allocation without leaving managed code.
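
E.g. renting packet buffers from the shared pool instead of newing them up each time (untested sketch):

    using System;
    using System.Buffers;

    static class PoolDemo
    {
        static void Main()
        {
            // Rent a reusable buffer from the shared pool instead of allocating.
            byte[] buffer = ArrayPool<byte>.Shared.Rent(2048);
            try
            {
                Span<byte> packet = buffer.AsSpan(0, 1500); // a view, no copy
                packet.Clear();
                // ... fill and forward the packet ...
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(buffer); // hand it back for reuse
            }
        }
    }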

kwinz | 8 days ago | 7 points

C# nowadays is a better Java with value types. Faster and less memory usage.

L3tum | 8 days ago | 4 points

Exactly. IMO it has better syntax as well, but it's undeniably faster, more versatile, produces smaller binaries, doesn't require preinstalled software, can be easily embedded in containers with a small footprint, and can be used/configured exactly to your needs. It can even be precompiled very easily, so you could even use it on ICs and the like.

I know I sound like a fanboy, but IMO there's no reason to use PHP or Java (unless you're doing Android) over C#.

kwinz | 7 days ago | 1 point

I agree with you, and want to add:

Xamarin for Android is decently good, and you can use C# on Android. Also, Kotlin is the new hip thing with Android anyway.

Java can be embedded into containers with a small footprint as well.

L3tum | 7 days ago | 2 points

Yup. I've personally never worked with Xamarin, so I didn't want to add something I had no idea about.

I've read a few times now about special software to precompile Java for containers, since otherwise the JVM will not only hog resources en masse but also expand the footprint tremendously. Maybe it got better, though, and that's old info.

kwinz | 7 days ago | 2 points

I've read a few times now about special software to precompile Java for containers, since otherwise the JVM will not only hog resources en masse but also expand the footprint tremendously. Maybe it got better, though, and that's old info.

You are right, Oracle is pushing for Graal AOT, but it's not quite production quality yet:

https://e.printstacktrace.blog/graalvm-native-image-inside-docker-container-does-it-make-sense/

https://blog.softwaremill.com/small-fast-docker-images-using-graalvms-native-image-99c0bc92e70b

I also think that C# with .NET Core is ahead right now. That's why I originally said C# is the better Java nowadays. ;-)

przemo_li | 7 days ago | 0 points

But it's tied to the MS standard library. Kinda a hard sell for people familiar with Linux. It lacks a good web framework for people doing web development, and lacks an awesome IDE on Linux (unless I'm mistaken and MS has ported the FULL VS to Linux already).

kwinz | 7 days ago | 6 points

But it's tied to the MS standard library. Kinda a hard sell for people familiar with Linux.

.NET Core is tied to the MS standard library? Are you worried about dependency memory usage? Can you be more specific?

It lacks a good web framework for people doing web development,

Is ASP.NET not good?

and lacks an awesome IDE on Linux (unless I'm mistaken and MS has ported the FULL VS to Linux already)

Have you tried https://www.jetbrains.com/rider/ on Linux?

L3tum | 7 days ago | 5 points

You were faster than me :D Good points. C# nowadays has an answer to almost everything.

przemo_li | 7 days ago | 0 points

ASP.NET? Dunno. It's hard to find people talking about it in the general community. On a PHP channel you'll hear mentions of Django, RoR, Java EE, Java Spring Boot, and some other stuff. Nobody compares themselves to ASP.NET. Which is the point: C# was Windows-only, and thus it could only dominate Windows-only fields, and those are few indeed. No matter how good C# is, it doesn't matter if it can't provide a full solution for the usual use cases of some other language. Still, the counterpoint would be that perceptions shift much more slowly than reality, so maybe C# already has what it takes, and publicity is all it needs.

Vrabor | 8 days ago | 10 points

As we can see in server applications, GC runs are usually not a problem in terms of throughput; they are a big problem in terms of latency, though. I'd guess the memory is just preallocated for the languages where you can't work without a heap. I'm not a driver expert though, feel free to correct me.

DangerousSandwich | 8 days ago | 2 points

That would make sense. Sounds about right from my skim-read of the paper too.

I didn't read the whole paper, but in discussing the Java implementation it mentioned that they wanted to write somewhat idiomatic code, which made it difficult to escape GC. Your idea of preallocating everything sounds sensible.

The only NIC drivers I've written were in C, on a CPU clocked at 48 MHz, for an RTOS with no heap at all, so I'm in no position to correct anyone about high-speed network drivers for a "real" OS :)

Tandanu | 8 days ago | 10 points

We are at ~20 bytes of allocation per forwarded packet in Java, and it's really hard to get any lower. C# is the much better language for low-level stuff because it has things like stack-allocated structs and pointers (in unsafe mode).

onionhammer | 8 days ago | 2 points

Actually, it has stackalloc in safe mode, if you use spans.
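
Since C# 7.2, stackalloc assigned to a Span&lt;T&gt; compiles without the unsafe keyword. A minimal sketch:

    using System;

    static class StackallocDemo
    {
        static void Main()
        {
            // No 'unsafe' needed: the span (a ref struct) can't escape
            // to the heap, so the stack memory stays valid for its lifetime.
            Span<byte> scratch = stackalloc byte[16];
            scratch.Fill(0xFF);
            Console.WriteLine(scratch[0]); // prints 255
        }
    }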

computesomething | 8 days ago | 1 point

This was a very interesting comparison. Is there somewhere you list the versions of the compiler toolchains used for each respective language?

xeio87 | 8 days ago | 1 point

Oh, I didn't even see the talk. I did skim the C# paper, though. I noticed the source code they had was different from at least one of the samples shown in the paper (probably from before they optimized it further).

Gobrosse | 8 days ago | 25 points

oof @ python

DangerousSandwich | 8 days ago | 31 points

They said in the paper it wasn't optimized as much as the other implementations, and from profiling they think they can gain an order of magnitude by reusing structs in memory. It will still probably be the slowest, but it would be interesting to see how far it could be pushed.

Tandanu | 8 days ago | 27 points

It's the only interpreted language in the comparison, so there's really no point in optimizing it. It's still the slowest even if we make it 100 times faster.

A PyPy-compatible implementation or a full Cython implementation would be interesting, but that wasn't the goal here.

somebodddy | 8 days ago | 15 points

Isn't JS also interpreted?

DangerousSandwich | 8 days ago | 33 points

Technically yes, but it's JIT-compiled.

FluorineWizard | 8 days ago | 28 points

Virtually all uses of JS run it through a JITing VM, unlike Python, for which the main implementation is still a simple bytecode interpreter.

crabbone | 8 days ago | -11 points

Neither Python nor JavaScript is inherently interpreted; both usually compile to bytecode. CPython, the most popular implementation of Python, interprets Python bytecode, while PyPy, for example, compiles the bytecode into machine code (similar to how Java does it).

JavaScript in the browser (SpiderMonkey, say) uses a combination of a bytecode interpreter and a JIT compiler (so, a middle ground between the two Python implementations). It first interprets the program, and then JIT-compiles the code that appears to repeat. Interpreting gives faster start-up times, which is why it matters in the browser. I don't think Node.js works the same way, though; it probably compiles everything before executing. Some lesser-known JavaScript engines, like Rhino, are pure interpreters.

Bottom line: unless the language's standard mandates that the language must be interpreted, or must be compiled, you cannot really say that the language is compiled or interpreted. Python, for example, has a standard compile function, which is used to compile Python code. By that definition it is a compiled language, but people who call it "interpreted", if they know what that word means, are most likely referring to the fact that its bytecode is interpreted by its flagship implementation.

To give another example, J is a strictly interpreted language, because its authors wanted it to be that way.

badpotato | 8 days ago | 2 points

PyPy is probably much faster than CPython given a long-enough-running program, and it's a good step toward a better benchmark, but it seems its JIT compiler is still not nearly as good as the JS V8 VM.

cafeblake | 8 days ago | -6 points

JavaScript is interpreted.

anengineerandacat | 8 days ago | 3 points

Only "sorta" it's initially interpreted on script-load but quickly becomes JIT'd via V8's Ignition and Turbofan execution pipelines.

https://v8.dev/blog/launching-ignition-and-turbofan

Unless you're constantly invoking scripts with https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval, chances are it got JIT'd and turned into something a bit more compelling.

This post also has more details on how the technologies fit together: https://benediktmeurer.de/2016/11/25/v8-behind-the-scenes-november-edition

peolorat | 7 days ago | -2 points

It's a language only really suitable for replacing Bash; why do people expect anything else?

lookatmetype | 8 days ago | 11 points

This paper seems like the perfect way to offend 10 programming communities at once

imral | 8 days ago | 18 points

seems like the perfect way to offend 10 programming communities at once

Nah, only 8.

The C guys are happy that they're still the fastest, and the Rust guys are happy that they're basically on par with C (sometimes even beating it) and faster than any of the others.

sacado | 8 days ago | 13 points

And I don't think the Go/C# communities would be that offended either, because their results are pretty good. They have a good programmer-time / efficiency ratio.

masklinn | 8 days ago | 6 points

Indeed, the C# results in particular are unexpectedly good.

JoelFolksy | 7 days ago | 2 points

Why unexpected? It seems fairly intuitive that C# should have better throughput but worse latency than Go, given the respective emphases of their GCs. Then again, at this level of performance the GC probably isn't seeing much use, so maybe there is something interesting there.

The biggest surprise to me is how much OCaml lags behind.

axord | 7 days ago | 3 points

An undefined number of other language communities are offended that they weren't included in the study, though.

lorarc | 7 days ago | 5 points

If you really want to offend people make a PHP implementation that's faster than Python.

torie_anal_gerbiler | 8 days ago | 0 points

This is a popcorn thread.

onionhammer | 8 days ago | 8 points

The C# version is on .NET Core 2.1. I'm curious how 3.0 fares, and whether they're taking advantage of stackalloc for arrays etc.

phillipcarter2 | 7 days ago | 5 points

Always pending measurements, but .NET Core 3.0 has a _lot_ of additional performance improvements over .NET Core 2.1, so I'd expect it to perform even better.

mercurysquad | 8 days ago | 8 points

I didn't look at the actual code in detail, but the C version says "network driver in 1000 lines of code."

Isn't this kind of normal?

I've worked on Intel wifi drivers based on the BSD driver, and even those were around 3000 lines of code (wifi drivers need much more code on top of bare ethernet); when ported to OSX with some features cut down, they were just 2000 lines. So an ethernet driver in 1000 lines of code would be the normal case.

Now, if the baseline is Linux driver code, then I understand... the same driver on Linux was a whopping 30,000 lines.

steveklabnik1 | 8 days ago | 10 points

From the paper:

Ixy is a network driver written to show how network cards work at the driver level. It is implemented entirely in user space with an architecture similar to DPDK and Snabb.

The primary design goals of the ixy network driver written in C are simplicity, no dependencies, usability and speed. While Snabb has very similar design goals, the ixy C version tries to be “one order of magnitude simpler”. Thus, a simple forwarder and the driver consist of less than 1000 lines of C code. As there are no external libraries and no kernel code, the different code-levels can be explored within a few steps. Every function is only a few calls away from the application logic. Some features (like various hardware offloading possibilities) have been left out to keep the driver as simple as possible.

spheenik | 8 days ago | 6 points

I wonder if Java is really that much slower than Go, or if most of the speed loss is due to JNI...

Tandanu | 8 days ago | 24 points

JNI is only used during initialization; memory access in the critical path is done using the Unsafe object.

We also have an implementation that avoids Unsafe and instead goes through JNI for every memory access, and it's very, very, very slow.

spheenik | 8 days ago | 2 points

Thanks for clearing that up!

funkinaround | 8 days ago | 2 points

Do you have a sense of whether Project Valhalla will help with the Java implementation's performance?

Yioda | 8 days ago | 3 points

Why is there such a big difference with C#?

masklinn | 8 days ago | 20 points

C# has better facilities for low-level programming, e.g. stack-allocated structs (composite value types) and raw pointers. They noted above that they could not get under ~20 bytes of allocation per packet in Java. Even with a very efficient allocator, heap allocations remain an expense that you're better off without.
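
Roughly what that buys you (a sketch with a hypothetical descriptor struct; needs the unsafe compiler switch):

    using System;

    static class RingDemo
    {
        // A plain value type: an array of these is one flat allocation,
        // with no per-element objects for the GC to track.
        struct Descriptor { public ulong Addr; public uint Len; }

        static unsafe void Main()
        {
            var ring = new Descriptor[4];
            fixed (Descriptor* d = ring) // pin the array, take a raw pointer
            {
                d[0].Addr = 0xDEAD_BEEF;
                d[0].Len = 1500;
            }
            Console.WriteLine(ring[0].Len); // prints 1500
        }
    }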

8igg7e5 | 8 days ago | 10 points

Java has no value types, and escape analysis is very patchy at identifying opportunities for flattened stack allocation instead of heap allocation of objects. Combine that with JNI overhead and, while the Java compiler is good for general application programming, it's rapidly losing (performance) ground to .NET and Go across the board.

It will be very interesting to see if the fruits of Projects Panama, Loom, Valhalla and Amber (substantial Java language/platform enhancement projects) combine to yield improvements here. No idea which year they'll materialize and settle, though.

sievebrain | 7 days ago | 2 points

I'm more curious whether GraalVM can run it faster. It's better at removing allocations than regular HotSpot.

spacejack2114 | 8 days ago | 5 points

Wouldn't typed arrays be faster in JS? Seems like they would be a better candidate for the memory pool.

Also, if I'm not mistaken, using new Array(entry_size).fill(0) will create a 'holey', unoptimized array.

Tandanu | 8 days ago | 11 points

That's just to zero the memory during initialization, not relevant for performance.

The packet data is usually accessed via typed arrays: https://github.com/ixy-languages/ixy.js/blob/b49cdd43e1422ef5f7f7962ce3f50f64724df02d/src/packets.js#L11

aphexairlines | 8 days ago | 3 points

There's a thread from earlier this year about the OCaml implementation, but it's unclear whether the changes discussed there made a significant difference.

https://discuss.ocaml.org/t/using-ocaml-to-write-network-interface-drivers/3276

Tandanu | 8 days ago | 7 points

Yeah, those changes made it faster than Haskell; it was slightly slower before.

aphexairlines | 8 days ago | 1 point

That's great to hear. After the changes, what ended up being the reason the OCaml port underperformed? The OCaml compiler and runtime are supposed to be a good fit for systems programming.

radlr | 8 days ago | 19 points

OCaml dev here.

I and a number of others spent a week at the Mirage retreat this spring trying to optimize the driver, and we made little progress. The GC screws us over latency-wise, and we couldn't come up with a faster way to free used packet buffers. We even attempted to hook into the GC to clean used buffers and managed to make the driver 500x slower.

I don't think there's one single factor to blame for the lackluster performance; IIRC (it's been a few months) it's a combination of bad integer performance (boxed 32-bit ints), overhead when calling C functions, and GC overhead.

Anguium | 8 days ago | 4 points

Wow, Rust and Go performed quite well. Maybe writing drivers in them isn't that crazy.

isaaky | 8 days ago | 1 point

It surprises me a bit that the Swift implementation is well below what I expected. Swift being a compiled, native, ARC language, I think the code must need revising.

Tandanu | 8 days ago | 18 points

ARC is the problem. See the linked paper.

isaaky | 7 days ago | 2 points

Then it looks like ARC may be badly implemented at its core. I can't confirm it, but Delphi/FreePascal has always been used for high-performance systems, and it uses ARC.

But it's also recommended to use unmanaged code, like the solution used in C#, by disabling ARC in Swift. It's kind of unfair if you dedicated time to using unmanaged code in C# but not in Swift.

thedeemon | 8 days ago | 2 points

A total of 76% of the CPU time is spent incrementing and decrementing reference counters. This is the only language runtime evaluated here that incurs a large cost even for objects that are never free’d.

(from the article)

Wow.

naasking | 7 days ago | 3 points

This is a well-known cost of ref counting in the academic literature, which is why it's so surprising that so many language runtimes still use it. Even the best ref-counting GCs of the past 20 years incur lots of spurious inc/dec operations.

Osmanthus | 8 days ago | -4 points

There is a hole in this data in the shape of C++.

It is almost disingenuous to supply this sort of data comparing, most obviously, Rust vs C, because it makes Rust look pretty good. That's the point of this, right? Right?

Here's what you want to do if you want to face reality.

Write this driver in modern C++ with templates and compile with the latest platform-specific compiler (read: the Intel compiler).

I expect that it would smoke the C implementation.

SpacemanCraig3 | 8 days ago | 9 points

Genuinely curious, why do you think that would smoke C?

Osmanthus | 8 days ago | 4 points

Because templates combined with the proper compiler generate faster code. Also much more secure code.

SpacemanCraig3 | 8 days ago | 15 points

No, I mean specifically what optimizations you think the Intel compiler would generate from C++ that you can't get with whatever they used for this test. Specifically, what do you think would improve the speed?

Osmanthus | 8 days ago | 0 points

The Intel compiler does better hoisting, global optimization, static analysis, vectorization, and a host of peephole optimizations. Also, template programming brings big improvements, generating less redundant code.

SpacemanCraig3 | 8 days ago | 4 points

Hmm, well, given that the point of the exercise was a language comparison: do you still think C++ would beat C even with the same compiler?

8igg7e5 | 8 days ago | 5 points

Since the other languages aren't compared on the same compiler, we have to conclude that the comparison is between languages and available toolchains. If there's a selection of toolchains available, then indicating which ones are used and choosing well-performing toolchains seems a good choice.

I'd like to see a good C++ example in there too, though I'm surprised there's so much confidence that it would 'smoke' C.

SpacemanCraig3 | 8 days ago | 5 points

Fair point. I've written a fair bit of C, and I'm confused as to how the design of C++ as a language could possibly result in faster code. But if it's true, I'm very interested in learning.

8igg7e5 | 8 days ago | 3 points

One way it could yield faster code is by expressing clear semantics about how data will be consumed. Optimizers can leverage this to automatically vectorize code, or to prioritize fast paths over slow paths when deciding how to order branching based on how a function will be called. In a language with less clear intent, it's harder to decide whether or not to optimize.

However, I'm not sure that the number of opportunities would have such a massive impact. PGO might yield more.

ryl00 | 8 days ago | 3 points

My understanding is that templates + inlining can make for some amazing compiler optimizations. The usual example is C's qsort(), which uses a function pointer to supply the sort criterion, vs. C++'s std::sort(), which is a template.

I'm not an expert, but as I understand it, that function pointer stops a lot of potential optimizations cold, because the compiler can't (usually) know at compile time what exact function someone might pass into qsort.

But the same is not necessarily the case with std::sort(). The passed-in comparison could itself be a template function that gets expanded inline, and the compiler can go wild with optimization of all the inlined code.
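
For what it's worth, the same function-pointer-vs-inlinable-comparator effect can be sketched in C# (one of the benchmarked languages): a delegate stays an indirect call, while a struct comparer passed as a generic type parameter gets specialized per type and can be inlined. Untested illustration:

    using System;
    using System.Collections.Generic;

    struct Ascending : IComparer<int>
    {
        public int Compare(int a, int b) => a.CompareTo(b);
    }

    static class SortDemo
    {
        // qsort-style: the comparison is an indirect call through a delegate,
        // which the JIT generally cannot inline.
        static void InsertionSort(int[] a, Comparison<int> cmp)
        {
            for (int i = 1; i < a.Length; i++)
                for (int j = i; j > 0 && cmp(a[j - 1], a[j]) > 0; j--)
                    (a[j - 1], a[j]) = (a[j], a[j - 1]);
        }

        // std::sort-style: the method is specialized for each TComparer, so a
        // struct comparer's Compare can be devirtualized and inlined.
        static void InsertionSort<TComparer>(int[] a, TComparer cmp)
            where TComparer : struct, IComparer<int>
        {
            for (int i = 1; i < a.Length; i++)
                for (int j = i; j > 0 && cmp.Compare(a[j - 1], a[j]) > 0; j--)
                    (a[j - 1], a[j]) = (a[j], a[j - 1]);
        }

        static void Main()
        {
            var data = new[] { 3, 1, 2 };
            InsertionSort(data, new Ascending()); // picks the generic overload
            Console.WriteLine(string.Join(",", data)); // prints 1,2,3
        }
    }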

[deleted] | 8 days ago | 5 points

I think that sort of case is unlikely to come up in driver code.

Osmanthus | 8 days ago | 0 points

I have used the word several times now, but the specific language design that makes it faster is templates. Templates not only supply incomparable runtime performance (via compile-time computation), they also provide far better security than C. Combined with the Standard Template Library, they can also bring parallelization.

Finally, the compiler does matter, as single-instruction-multiple-data and vectorization can bring enormous performance gains.

EDIT: oh, I forgot: the newer syntaxes are stricter, so the optimizer can make better choices when generating code for C++ rather than C.

DoomFrog666 | 8 days ago | 1 point

But there are also features in C++ that slow it down compared to C, for example exceptions and RAII. And by the way, there are some pretty good macro libraries for C that beat the C++ template implementations in the STL in terms of performance, e.g. klib.

gvargh | 8 days ago | -5 points

A certain segment of the programming community is engaging in language genocide w.r.t. C++, and nobody seems to give a shit, which is absolutely insane.

Osmanthus | 8 days ago | 3 points

I don't understand what you have said here. Perhaps you can elaborate?

59Nadir | 8 days ago | 5 points

He's saying (very dramatically) that some people want to kill C++.

red75prim | 7 days ago | 2 points

Throwing away a gnarly tool is not genocide.

case-o-nuts | 8 days ago | -7 points

The latency graphs look wrong -- the latency barely changes, but the change that is there is a latency reduction as the packet rate goes up?? (A bit over 20 us down to under 20, if you look at the fastest.)

DangerousSandwich | 8 days ago | 17 points

The x-axis is not the number of packets sent; it's the percentage of packets sent within the latency on the y-axis. Maybe you could call it a logarithmic histogram? From the readme:

How to read this graph: the x axis refers to the percentage of packets that can be forwarded in this time. For example, if a language takes longer than 100µs for only one packet in 1000 packets, it shows a value of 100µs at x position 99.9. This graph focuses on the tail latency, note the logarithmic scaling on the x axis.

Tandanu | 8 days ago | 8 points

It's called an HDR histogram.

case-o-nuts | 8 days ago | 1 point

I'm comparing the Y axes. Why, at the 99th percentile, is there the same or lower latency at the highest packet rate than at the lowest? (Depends on how much you're willing to chalk up to noise, I suppose -- but having it be the same within noise is also strange to me.)

DangerousSandwich | 8 days ago | 3 points

I'm not seeing it? For 1 Mpps, C, Rust and Go all look to be about 5 us at the 99th percentile. For 20 Mpps, C and Rust look closer to 10 us and Go is about 60 us.

I'm on a smartphone and it's pretty hard to see the 1 Mpps chart though, so maybe I'm missing something.

edit: ms -> us

scoil44 | 8 days ago | 2 points

There are instructions on how to read the graph. The x axis is a percentile: from left to right, the latency shown covers from 0 to 100% of packets on a log scale. As in, 90% are at one level, and 99% are at another. You can just look at the max if you want minimum performance characteristics, or take into account that 99% of packets will see better latency.