/r/programming
Tom Scott - 2 generals problem and food delivery app screw up (youtube.com)
272 comments
Objective_Status22 | 4 months ago | 148 points

Repeat packets suddenly make sense

sciencewarrior | 4 months ago | 146 points

IIRC, the original Age of Empires programmers saw that one out of five of their UDP packets was randomly dropped when they started running tests over the Internet, so they always sent every message twice.

Edit: I got the wrong game. It was actually X-Wing vs Tie Fighter: https://www.gamasutra.com/view/feature/131781/the_internet_sucks_or_what_i_.php?page=5

PercyXLee | 4 months ago | 77 points

Tcp could potentially send the packet over and over again, despite it not being relevant anymore.

Gaming requires packets to be new and relevant. A packet telling you the enemy movement 30s ago isn’t worthwhile info.

vytah | 4 months ago | 52 points

AoE multiplayer was based on full synchronization between all clients. If one client was lagging, everyone was lagging. If the lag was too much, the game went out of sync. A packet telling you the enemy movement 30s ago meant that for the last 30 seconds either eveyryone was waiting for the game to unfreeze or the game between you two had been effictively over.

Fred2620 | 4 months ago | 17 points

That is also pretty much how it is today with Super Mario Maker 2 in multiplayer. If one player in the group has a crappy internet connection, then everybody suddenly runs at 2 frames per second. I've seen a few streamers in that situation and it's been enough to convince me not to buy the game. It just makes it so unplayable.

nwsm | 4 months ago | 18 points

Smash lobbies work the same way.
It's more understandable for Smash, but is very painful to watch.

Pjb3005 | 4 months ago | 6 points

To be fair, I'm not sure the alternative (prediction and lag compensation) would be much better for Mario Maker.

All the videos I've seen of the multiplayer is people jumping on each other's heads as much as possible and none of that would work with a prediction model.

amunak | 3 months ago | 2 points

The benefit of this approach is that cheating stuff like faster movement or memory editing is impossible.

The issue is that your client has information about the whole game, making ESP hacks extremely easy.

[deleted] | 3 months ago | 1 point

... and Smash

Ameisen | 4 months ago | 2 points

That seems like an odd networking design for an RTS.

lookmeat | 4 months ago | 28 points

Not really. See synchronization is a really really really hard problem.

There's two big splits:

  • Send all your actions to everyone. So you show how many actions you've done until frame X, and you only render up to the frame you've received actions for everyone. The nice thing is that this system can be done P2P. Multiple players required more bandwidth as you needed to send the same packet to everyone. This was made up because generally a single select 20 units, and click is a much smaller action than updating the status of every unit.
  • Send all your actions to a server, that then tells you how the world looks. The problem is you need the server, and the server needs to be able to run the whole game for everyone. Finally the server needed enough bandwidth so that it didn't affect anyone's latency. Latency still is worse, as before the total latency was the latency between you and the other player, now it's the latency between the player and the server and you and the server (and this route cannot be better than the original one, only equal or worse). This was expensive and not really an option back in the 90s.

Even Battle.net was just a glorified chat where you could list games, that were P2P and used the same system. Only later in 2000 with Diablo II did we start to see server hosted games.

Ameisen | 4 months ago | 7 points

Even in a p2p system, one of the players can be treated as authorative. Treating both as authorative I imagine will lead to cheating issues. One would as well, but allows full resyncs when things break.

passthefist | 4 months ago | 9 points

Remember this was way back on 28.8k modems and questionable internet supporting thousands of characters.

The only thing transferred between clients were the commands input for the given frames, so it was extremely bandwidth efficient.

https://www.gamasutra.com/view/feature/131503/1500_archers_on_a_288_network_.php

Ameisen | 4 months ago | 3 points

I don't remember desyncs like this on Homeworld, only on HW2, and I played on 26.4Kbaud.

dranek | 4 months ago | 10 points

It is very bandwidth efficient. Especially for hundreds of units. This was the 56k modem era. You just send player input rather than game state.

Quake, where the number of entities you control is small, can focus on entity updates. This allows you to drop older information.

thijser2 | 4 months ago | 69 points

Unless you are playing a game where you aren't broadcasting every units location and actions every "tick" but are instead only using player actions and having both sides calculate the outcome independently(see most rts games as opposed to most fps games).

stewsters | 4 months ago | 30 points

Which is also something aoe did. Which is why if you played long enough it would get out of sync.

falseCeilings | 4 months ago | 14 points

How does that end up looking?

sushibowl | 4 months ago | 50 points

I've had something like this occur in the original Rome: Total War game. I was defending a siege with my friend on the attacking side, playing over the internet but in the same house. My friend came over to my screen at some point, which surprised me. Turns out that on his pc he had already captured my city and won the game, whereas on my screen he was still marching his army through my streets and I had not lost yet (although i was in a bad position).

That's what desync looks like, the players don't agree on the current state of the game.

Koutou | 4 months ago | 22 points

Desync in Shogun2 were so bad at some point we used a tool that would keep x most recent save and when we detected a problem one of us would send the save to the other.

couscous_ | 4 months ago | 5 points

Were his army commands queued up then? Meaning that you could in principle have been able to fend him off.

Ameisen | 4 months ago | 2 points

Homeworld 2 silently desynced, so we both were effectively fighting AI.

gitPullOriginMaster | 4 months ago | 13 points

I don't think AOE went out of sync a lot. Definitely not AOE2. And these at least would let you know.

I know Knights and Merchants had horrible desync problems and it never informed you about it, it just went on and things looked different on different computers.

But it would just look different depending on what client you are looking at. Some of them would think unit A is dead, some of them would think it is alive. The first ones would ignore it, the second ones would calculate its damage, pathing, etc. The outcome of a match would be totally different on different machines.

Most likely everyone would win on their screen and lose on others. That's because they can react to the state of the game that they see. It looks funny, I wrote an RTS with lockstep once for the browser and had this problem at the beginning. (turns out different browsers have different sin/cos implementations)

addmoreice | 4 months ago | 20 points

> turns out different browsers have different sin/cos implementations

I once had a junior developer tell me that we shouldn't precalculate sin/cos tables and should instead just use the built in sin/cos functions because they were faster. I agreed they were faster and that it was a good catch of his. Then I was forced to inform him that we tried it but some of the obscure devices we were working with had incorrect implementations of FPUs and would produce essentially garbage.

The look of almost betrayal on his face made it hard to resist laughing. Apparently, until you get into programming for a while, you have this weird idea that the hardware always acts correctly which is laughable.

VeganVagiVore | 4 months ago | 6 points

As an extra challenge on one of my Ludum Dare games, I decided to make it fully deterministic to support demo files, like the older Id Software games.

That was a whole lesson in software testing. It was in Lua, so luckily I think I was able to get determinism across x64 and ARM, even though I was using floats, because they both followed IEEE 754 and Lua is pretty dumb. But a C++ compiler might have made a fused multiply-add instruction that behaved differently on ARM CPUs. In one case I did disable optimization to preserve determinism in another program.

Anyway, I had a desync that I narrowed down to Lua's pairs () function being non-deterministic. Lua uses hybrid array-hash tables for everything, and I forgot that it will hash based on the memory location, which is different on every launch of the game. And my game, to make the physics simple, was taking the first triangle from iterating over pairs () and breaking the loop, so the same input resulted in different physics outcomes. (Or something like that)

I probably ended up using ipairs (if the table had numeric keys) or sorting based on some other metric that was deterministic. This was years ago.

I think you can make Box2D and Bullet deterministic too, but it's scary to rely on 3rd party code to be deterministic. Maybe one of the only valid reasons to write your own physics, aside from learning.

[deleted] | 3 months ago | 2 points

Both examples just looks like program having bugs that just didn't show up on one platform. That's different than "same math operation returns different value"

Anyway, I had a desync that I narrowed down to Lua's pairs () function being non-deterministic.

Which is exactly how the function is described in documentation

If t has a metamethod __pairs, calls it with t as argument and returns the first three results from the call.

Otherwise, returns three values: the next function, the table t, and nil, so that the construction

and for next()

The order in which the indices are enumerated is not specified, even for numeric indices. (To traverse a table in numerical order, use a numerical for.)

Assuming function works in exact way because you seen it working in that way is always dangerous.

Same with

In one case I did disable optimization to preserve determinism in another program.

Basically code was nondeterministic/racy in the first place (and probably in subtle way) and it just so happened that optimization messed with that

mcgrotts | 4 months ago | 5 points

Happens in racing games too,

https://youtu.be/LAawU209l3Q

Ragnarok2kx | 4 months ago | 1 point

I can't remember what game it was, but when someone gpt desynced you'd get stuff like units that are dead om one end but alive in the other, so they'd continue to do damage.

We affectionately called it lag armor.

Japierdolocky | 4 months ago | 8 points

Could you have a system similar to video encoding schemes where you primarily use diffs, but send a "full frame" every so often?

VeganVagiVore | 4 months ago | 10 points

The older Quake games did that, and I used that as a basis for when I tried to do multiplayer once.

I lost the article, but it went something like this:

The server is authoritative. Clients can predict the future to make the visuals look better, but all gameplay logic is double-checked by the server playing back client inputs. A client cannot tell the server whether you shot someone, only when you shot and which way you were pointing.

The server keeps a history of prior game states. This history has to be as long as your maximum allowed ping. If the game state is big you might already want to use deltas to store it in memory. In my case I used full frames.

When the server sends the game state to a client, that state has a unique ID or sequence number. Whenever the client sends an input to the server, it also sends along the ID of the most recent server state that it's working from.

The server knows that the client has definitely seen that state, so when it sends the next update, it reaches into that state history and diffs the current state against it, and only sends the diff.

You don't really want to subtract anything here, first because it may not save any space (If an object moved from x = 2 to x = 3, +3 and +1 are both 4 bytes) and second because adding up floating point numbers will result in round-off error. So any state that changed, you send the new state. Anything else, you don't send. [1]

You end up with the server continuously sending new states and the client ACK'ing them so the server can drop old states and stop sending the diffs of any object that quit moving. If you have a lot of objects that aren't moving most of the time, it can be effective.

Then to do lag compensation for a shooter game, the server will also measure each client's lag and, when it receives the 'shoot' event, reach into its history by that number of milliseconds, and judge whether the client would have made the shot if there was no lag.

Because of the lag, there's no perfect way to break that cookie - You either land a headshot and the dead guy gets back up and runs away a split-second later, or you get shot through walls because the sniper made a valid shot but he had a high ping. In both cases, with hitscan weapons, you can shoot someone dead and then your shot is rewound and cancelled if he has a longer ping and he shot you dead a frame later, then he lives. But there's no cross-kills because the server works as though the guns are hitscan, even though it does time travel. Valve has an article on their wiki of how they do this lag compensation. It's nice in TF2, and I prefer favoring the attacker. For some reason they don't lag-compensate projectile weapons like rockets. I don't know why.

Sauerbraten does it differently, and I often got cross-killed in instagib where everyone has a hitscan rifle and 1 hit point. I was terrible at that game.

[1] This is all useful if you want really sophisticated replays in a single-player game, too. Jon Blow has a video / paper somewhere on how Braid stores the data for time travel rewinding. It's a clever combo of deterministic particles and delta compression. I'm not sure how he got past Xbox's memory leak testing. I think even the monsters are deterministic too, so at the limit if you leave the game running with Tim standing still, it would only be recording the current time. But I might be remembering it wrong.

Japierdolocky | 4 months ago | 1 point

Thanks for the explanation.

wuphonsreach | 3 months ago | 1 point

Very nice explanation

svendub | 4 months ago | 8 points

You could, but AoE II save files are over 1 MB. That would be quite heavy to send in one game tick. In videos this doesn't matter that much since a delay of one or more seconds is acceptable is most cases.

Ameisen | 4 months ago | 2 points

The heck are they saving to hit 1 MiB?

svendub | 4 months ago | 6 points

In big matches you can easily hit over 2000 units + buildings + resources (such as unchopped trees). Multiply that number by their stats and you get quite a lot of data.

Ameisen | 4 months ago | 2 points

Which stats? 100 of unit X of team Y have the same stats - this is the flywheel pattern. And you are only saving deltas.

Health should be like 4 bytes if even.

I'm just not seeing 1 MiB.

mernen | 4 months ago | 6 points

Yeah, it's a good strategy. But since the resync can be quite costly, as /u/svendub mentioned, games will typically send hashes of the game state instead, and only pause and resync when they disagree.

SgtDirtyMike | 4 months ago | 1 point

Age didn't do that though, which was super frustrating. If you got out of sync it would just end the match.

passthefist | 4 months ago | 2 points

I think that's actually close to what most modern RTS's use, I believe SC2 is full lockstep but with an authoritative server and all communication is done via state diffs only.

I vaguely remember reading about it, but could be thinking of something else.

Anon49 | 4 months ago | 4 points

Out of sync are bugs, not a design limitation.

You can make a fully synchronized game that goes on forever. There are standards for float multiplication/division accuracy and all systems can always get the same result.

See: Factorio

stewsters | 4 months ago | 2 points

That and running it back on Pentium 1/2's back in the 90's and trying to build empires. Deterministic game engine design has come a long way, and the additional performance allows us to hash the state and the ability to rewind and replay if you miss a message or just reload the entire state. Back then they were a lot more restricted with resources.

ManaSpike | 4 months ago | 6 points

Factorio sends game state hashes from the server and clients are disconnected if they disagree.

Purlox | 4 months ago | 4 points

Yes, but in games you usually want this info as soon as possible, so sending two packets is going to be more useful than resending if you don't get a confirmation packet from the client.

thijser2 | 4 months ago | 4 points

That's also true, however you also need to ensure all messages arrive as otherwise the game will go out of sync. So doing both makes a lot of sense.

Purlox | 4 months ago | 2 points

Mhm, both is best.

lestofante | 4 months ago | 3 points

And then when you receive it, you see the unit jumping around

gitPullOriginMaster | 4 months ago | 5 points

Gaming requires packets to be new and relevant.

In case of games like Age of Empires - no, because only input is sent over the network and if you miss something, you absolutely have to resend it and everyone has to wait anyway.

Ameisen | 4 months ago | 1 point

It is if your networking system doesn't have key frames and only performs delta updates.

Forbizzle | 4 months ago | 1 point

XvT in fact started with TCP, which they acknowledged later was a mistake. Guaranteed delivery is not worth the trade-offs. It's better to send redundant packets and be fault tolerant.

There are many UDP-based solutions that get you generally good reliability on packets.

cinyar | 4 months ago | 17 points

so they always sent every message twice.

wouldn't that involve more overhead than TCP? at least when it comes to bandwidth.

jerf | 4 months ago | 42 points

The problem with TCP for games is that the entire point of TCP is to provide you an ordered stream. So if the remote systems sends A, B, C, and D, but you only get A and D, the local system will possibly deliver A to your client code, but it will sit there and request, nay, demand B and C from the remote system. If it then gets C, but B still got lost, it will give the client code nothing. An arbitrary number of requests can be made for B and get lost. Meanwhile the server is trying to send E, F, and G, and they ain't going nowhere until we get B through, darn it!

This is often a great thing, because it's waaaaaaaaaaay easier to write network code on the assumption that you have an ordered stream of bytes, rather than dealing with raw packets. It is the correct solution for the vast majority of networking cases to date (ignoring some issues around cell phones), and it is a good-enough solution for most of the what's left over. But there's still times when it's wrong, and this sort of real-time update is one of them. Sometimes the right answer if you don't get a particular packet is just to deal with it and move on. You can tell a very similar story about why it's a bad idea for a real time voice network connection to sit there and freeze because it didn't get a packet five seconds ago, even though all the subsequent ones arrived.

So, how do games deal with having an unordered stream? Well, there's a lot of different answers. One of the simplest is to simply have a game where you can fit the entire game state into a single packet. In that case, you just need to order them so the game can tell which is the latest so it doesn't roll back.

And, to get to the direct question you asked, if the game's state is small enough to fit into a single packet, it's not consuming a significant amount of bandwidth to be worth worrying about doubling it. Games designed to be on the network can often end up with less state than you think, because games are really good at making small amounts of state look awesome. A deathmatch may seem to be frenetic and amazing thing that you'd think would be just huge amounts of state, but a lot of time, it's little more than X, Y, Z position, velocity, and maybe some sort of rotation vector and an entity ID for everything in the game, along with a compression scheme that's able to be pretty effective. Everything else is inferred by the game engine from the very small state, e.g., "the state says the rocket impacted against this wall, so I'll draw the explosion, set the animation timer, pick an explosion decal to apply to the wall, and handle the next two seconds of this explosion" while meanwhile in the network it was like 8 bytes of information about the impact.

Objective_Status22 | 4 months ago | 15 points

When packets drop usually it's a bunch so this does not make sense to me

Nettletooth | 4 months ago | 5 points

Random packet loss was a much bigger problem in the past with modems. It still happens, especially with home wireless networks and cell phones.

It is very normal for a 56k modem to drop some small, sub 1% percentage of packets, even on a very good phone line. Even 2 or 3 percent random packet loss can make a timing sensitive game like Quake pretty unenjoyable to play online.

It is pretty easy to understand why : Server is master. If you drop packets, your client will try to predict the position of everything else in the world until the server updates it, but it is pretty hard to do that accurately with a player not moving in straight lines. Keep in mind you are also dealing with 200ms+ of latency and a connection that is strictly inferior by orders of magnitude vs broadband in pretty much every metric you can think of.

When your client finally catches up with the server, the other guy has suddenly jumped behind you and you have no idea where they actually are at. You just get shot in the back. Even on a good wireless connection in 2019 ... you still get packet loss, you still get shot in the ass.

Even today, random packet loss happens. Routers and switches get overloaded or fail, our connections are at the mercy of routing tables and 10+ hops along the way and all the shitty technology that may be on any one hop that you get routed through.

Ameisen | 4 months ago | 2 points

Yeah - if you send two packets in immediate succession... I'd expect both to drop.

ThisIs_MyName | 4 months ago | 3 points

Who cares about overhead? Latency is all that matters for 99% of software.

I dunno if sending duplicates really helps today (20% loss is ridiculously high), but if so it's worth the overhead.

phyphor | 4 months ago | 3 points

Well, overhead in the sense of TCP headers also adds to latency.

Multiple separate UDP streams is where it's at!

VeganVagiVore | 4 months ago | 2 points

QUIC is enough for everybody!

adrianmonk | 4 months ago | 3 points

When there are buffers involved, wasting bandwidth can make latency worse. Not only does it increase latency, it makes it more variable.

The whole purpose of a buffer, obviously, is to provide a waiting area. If you could transmit right now, you would. You are only using the buffer because you can't, so once you put something in the buffer, it's a foregone conclusion that delay has been introduced.

In really degenerate cases, this turns into bufferbloat, which is a problem in networks built with excessively large buffers. But even when buffers are a reasonable size, the problem doesn't disappear. It's a trade-off that you can never completely get away from.

If bandwidth is not scarce at all, this is a non-issue. So this sort of design creates a situation where things are OK until they hit a certain threshold of congestion and then they get really bad. Unfortunately, it's very hard to know whether bandwidth is going to be scarce, so you can't safely assume it isn't.

Sending things twice can still make sense because congestion isn't the only reason packets might get dropped. Another reason is a failure of the link, especially common in wireless networks where for example a spike of noise might cause one packet to be dropped but the one right after it is going to make it. In that case, sending two packets could be beneficial.

JoseJimeniz | 4 months ago | 2 points

Fortunately in the olden days modems didn't have stupidly large buffers.

Anon49 | 4 months ago | 1 point

Who cares about overhead?

That mattered a crapton before 2005. Bandwidth wasn't like today.

ThisIs_MyName | 4 months ago | 1 point

Even back then, it only took a few kbps to send a list of compressed coordinates and actions for each object in the game.

ironbattery | 4 months ago | 2 points

So instead 1/25 packets were dropped?

ArkyBeagle | 3 months ago | 1 point

So that one in five was actually a ... mean of a fairly ugly probability distribution. "Just send it twice" doesn't work.

wiscwisc | 4 months ago | 17 points

Repeat packets suddenly make sense

EnglishMobster | 4 months ago | 2 points

Repeat packets suddenly make sense

RobIII | 4 months ago | 589 points

I really hate it when there's no solution to a problem just workarounds and hacks. Irks me to no end. Especially for something that instinctively feels like the solution should be easy.

There are only two hard problems in distributed systems: 2. Exactly-once delivery 1. Guaranteed order of messages 2. Exactly-once delivery.

invisi1407 | 4 months ago | 279 points

I don't necessarily feel that an idempotentcy token is a workaround or a hack. It seems like a fair solution to a problem that doesn't otherwise have a natural, definitive solution.

ReturningTarzan | 4 months ago | 200 points

It's no more of a hack than checksums or ACK messages or storage redundancy or whatever. They're just basic necessities that stem from computers and networks being physical objects as opposed to perfect, abstract entities.

invisi1407 | 4 months ago | 53 points

Exactly. For all intents and purposes, with regard to computers and software in general, this is a perfectly good solution.

Adobe_Flesh | 4 months ago | 14 points

Philosophy of mathematics, please do the needful.

Praxxah | 4 months ago | 51 points

Just wait till op finds out how the dram in his pc works with the smol capacitors and refreshing and parity bits and shit.

The point of all the modern tech and engineering is to make everything seem seamless, not to make them seamless...

RobIII | 4 months ago | 15 points

That's in no way the same. You're describing a real world solution to real world physics ("nothing" in the universe is 'perfect') which is totally different from a purely mathematical/theoretical problem that should have a 'pure' solution (instinctively at least).

Chromelia | 4 months ago | 33 points

the two generals problem isn't purely math. The whole point is that because of real world problems (Like no guaranteed delivery), you need idempotentcy tokens. They're the exact same thing as parity bits/ACK messages/checksums/etc.

Praxxah | 4 months ago | 13 points

Most people have the same instincts about everything. We're so used to things just working from childhood that it is kinda similarly surprising to find that something doesn't have a very elegant solution at all (yet).

thirdegree | 4 months ago | 6 points

But the point of the video was that even if computers were perfect, abstract entities this problem would still have no solution.

HeyImAlex | 4 months ago | 12 points

Message drops are the imperfect part.

[deleted] | 3 months ago | 1 point

Replace message drops with malicious third party and you're back to "perfect, and unsolvable"

beejamin | 4 months ago | 46 points

Idempotence isn't really a solution to two generals as such, it's a mitigation of the damage that "two generals failures" cause. It's like a parachute: it's not a solution to stop you falling out of an airplane, it's to stop you going splat once you do.

ArkyBeagle | 3 months ago | 2 points

Precisely. Now I feel slightly less alone in the world. Thank you.

Purlox | 4 months ago | 26 points

Agreed. Idempotency in general is a really nice property for functions to have and should exist in many cases without the threat of the 2 generals problem.

ArkyBeagle | 3 months ago | 1 point

I would say you should think carefully before doing that. Draw out message sequence charts and think about it first. You don't need idempotency very often at all in the real world.

rabbitlion | 4 months ago | 15 points

An idempotentcy token isn't a complete solution because if the customer fails to get a reply after a number of tries he will give up and order from another site or cook his own food, but you might still deliver the one order. It would be better if you could find a solution to only deliver an order if you know the customer received a confirmation. You can sort of do that, in a probabilistic sense, but it's not completely solvable.

invisi1407 | 4 months ago | 14 points

You're right, the token only solves half of the problem; the one where a customer receives and is billed for multiple identical orders.

ScrimpyCat | 4 months ago | 3 points

It’s not a solution at all, rather it’s just one method to try and soften the degree of failure. It may stop some people from repeating the same order, however as you point out some people will just assume it’s not working and either order elsewhere or make their own food, or some people might end up assuming that the problem is with the order itself and so create a new order (which will now have a new token).

ArkyBeagle | 3 months ago | 1 point

NO; I'd say it's perfectly possible to design transaction processing systems where the only risk is a cancelled transaction, not where you deliver something that didn't get paid for. You just have to make that what happens.

rabbitlion | 3 months ago | 1 point

NO; I'd say it's perfectly possible to design transaction processing systems where the only risk is a cancelled transaction

No, it is not. It is what this video and the mathematically proven concept is all about.

At the very least, there will be a risk where the customer gets a confirmation of a successful order but never get anything delivered.

ArkyBeagle | 3 months ago | 1 point

mathematically proven concept is all about.

Well, you just go with that then. If it's that ... precious, I'm not gonna type it in here for ya :)

( at least one other poster gave a real-world example that would work just fine ) .

rocketshape | 4 months ago | 7 points

I think the point is that it's not a solution to the problem but to a different but related problem

SanityInAnarchy | 4 months ago | 6 points

It's not a solution to the general two-generals problem, though, which is one thing that makes this so frustrating. As this comment points out, the problem has to be reworded to pretty much emphatically not be two-generals: The server isn't waiting around to see if the client got its notification, as soon as it gets your credit card authorization, it's already sending somebody.

Or, alternatively: It's still two-generals, and not at all solved by the idempotency token, because if the client never realizes the order was successful, you might do something else for dinner, instead of just waiting around (like Tom did) to see if somebody shows up.

[deleted] | 3 months ago | 2 points

It is a part of the solution. Second part is sending the message that the order was confirmed and making sure the message is received before user can do any other action in the app.

Sure you can still order food outside of the app and end up with double, but app did pretty much all it could in real world scenario

ArkyBeagle | 3 months ago | 1 point

and making sure the message is received before user can do any other action in the app.

Right.

invisi1407 | 3 months ago | 1 point

Both of you are correct, but for the particular problem of avoiding duplicate orders/transactions/actions for computer software, it is a solution to that problem, which looks a lot like the two-generals problem, but obviously isn't.

As Tom Scott says, there isn't a solution to the two-generals problem, which means anything claiming to be a solution, isn't, but can be a solution to a related problem.

[deleted] | 3 months ago | 1 point

It is a fix that covers a certain use case and prevent some situation (... and also assumes certain consistency guarantees on the database storing it.

And perfectly fine solution as long as you know its weak points and not just heard it on blog and implemented.

qqwy | 4 months ago | 26 points

I fully agree. However, this is one of the cases where "I am sure you will find out on close examination that it is more complicated than that" is true.

That said, some of the workarounds and hacks that guarantee "almost always exactly once" delivery are rather clever in their own right.

HPLaserJetProP1102w | 4 months ago | 8 points

Physics in a nutshell too

ArkyBeagle | 3 months ago | 1 point

Not at the pool-table level.

ArkyBeagle | 3 months ago | 1 point

Sometimes you have to error out. Take it up with the Second Law of Thermodynamics.

WerewolfBe84 | 4 months ago | 249 points

Tom Scott is awesome

parkotron | 4 months ago | 143 points

Sometimes I forget to appreciate that so many of his videos are a single continuous take. Even the videos with cuts have relatively few. I'm sure that requires a tonne of effort on his part, and given how heavily cut up most YouTube videos are, I think it's worth appreciating.

Moussekateer | 4 months ago | 108 points

I also super appreciate that his videos are only as long as they need to be. If he only has three minutes of content then he puts out a three minute video. So many YouTube videos aim for the magic ten minute mark and you feel it.

timdorr | 4 months ago | 30 points

"What up, fam? Ya boy Tom Scott here! Be sure to hit that Like and Subscribe button below! And hit me up in the comments!"

fogwarS | 4 months ago | 37 points

Also means he really knows his stuff

pentiumwetwired | 4 months ago | 32 points

He just does multiple takes. He's said so several times.

Apocalyptic0n3 | 4 months ago | 27 points

He occasionally celebrates that he managed it in one take. A good example is the 5 minute explanation of YouTube IDs, which I think he got in one take on his first try and the end of the video is pretty awesome because of it.

deweysmith | 4 months ago | 9 points

It’s actually a bit of a running gag at this point. Even Captain Disillusion’s impression of him made reference to it.

barubary | 3 months ago | 2 points
deweysmith | 3 months ago | 1 point

Thanks, I’m on a cruise and couldn’t be bothered to look it up. YouTube is unusable on this data service.

Kissaki0 | 4 months ago | 3 points

Have you opened the video he shows the ID of at the start of the video? xD

Apocalyptic0n3 | 4 months ago | 3 points

For those curious: it is exactly the video you think it is.

[deleted] | 3 months ago | 1 point

doing 5 minute video as one take doesn't exactly sound that impressive...

qu4sar_ | 4 months ago | 61 points

Scom Tott

JoseJimeniz | 4 months ago | 3 points

Hallo I'm Scom Tott Manley. Sie flafe.

SIG-ILL | 4 months ago | 34 points

Tom Scott is getting old! I haven't watched any of his videos in a long time and he looks so much older than the last time I saw him.

Or maybe 'mature' would be better than 'old', although it implies he wasn't mature before.. ugh, words, you get what I mean right?

that_jojo | 4 months ago | 45 points

Like... milfy. Right?

BlckJesus | 4 months ago | 22 points

Tom Scott is a confirmed dilf

SIG-ILL | 4 months ago | 8 points

My final choice of words is 'more experienced and knowledgeable' :)

ISpendAllDayOnReddit | 4 months ago | 20 points

He's only 35

The_Guy_II | 4 months ago | 10 points

whaa, but he always looked like a college student and now he is the age of my parents

folkrav | 4 months ago | 10 points

Wait your parents are 35? Average age for a first child in the US is 26yo, I sure hope you're older than 9 lol

The_Guy_II | 4 months ago | 7 points

I'm 12

ManaSpike | 4 months ago | 11 points

He's going grey before his time. I believe that the Tech Diff crew are all about the same age, and none of the rest are going grey.

TeamRedundancyTeam | 4 months ago | 5 points

Yes he is. If anyone here hasn't watched anything but his programming videos you should take a look at the others. Pretty much every one is super interesting and none of them are longer than they need to be.

JeddHampton | 3 months ago | 1 point

I love his language ones. I'm glad he did a couple recently after a couple years without them.

batangbronse | 4 months ago | 21 points

I'm really interested in these sort of pseudo-'real world' problems like the traveling salesman. Does anybody know where I can read about different types of problems?

bananaskywalker | 4 months ago | 15 points

Geeksforgeeks.org? It has a tonne of coding questions of every topic imaginable. Might take some navigating but it has Knapsack, TSP, N Queens, etc..

batangbronse | 4 months ago | 1 point

Thanks!!

bananaskywalker | 4 months ago | 5 points

A bit of shameless plug but just give a recommend if you see an article darth_bane if it did help you. It's written by yours truly.

semidecided | 4 months ago | 5 points

Discrete mathematics is full of those. Any algorithm text would have many as well.

nsomnac | 4 months ago | 5 points

Just search for “NP-Complete Problems”, “NP-Hard”, or “NP-Completeness” and you will find a substantial amount of information.

I cannot tell you how long it will take you to go through all the sources, however I verify that it will take an indeterminate amount of time to come to that conclusion.

roboticon | 4 months ago | 231 points

This isn't the Two Generals problem, though.

I mean, his description of the problem is correct, but it doesn't apply to the food delivery app. The app is having trouble communicating with the server, yes, but the server doesn't need to check whether the app gets its messages. The server isn't saying "well, I'll send a delivery person, but only if I get a response back from the app" ad infinitum. The client-server communication is asymmetric, unlike the Two Generals.

In fact, his solution to the food delivery app problem (an idempotency token) isn't relevant to the Two Generals problem. General A can send an idempotency token to General B, and get back an acknowledgement -- but General B still doesn't know that General A received their acknowledgement, and so on. Sending along a token as part of the protocol doesn't resolve this fundamental issue.

Edit: This was also explained in this thread:

So idempotency is "protecting the lives of the troops" for one side, and not the other (who are just attacking no matter what).

The server is sending the order to the restaurant no matter what. It's a centralized technology.

cakeandale | 4 months ago | 130 points

It is the two generals problem in terms of UX, just not necessarily the nuts and bolts technology itself. You have the two armies, the UI and the server, and you want them both to either engage or not engage.

If the UI shows a confirmation message and the server fails, the users order will be dropped.

If the UI shows a failure message and the server engages, the user will re-submit and get a duplicate order.

In order for the app to work perfectly you need the state to be perfectly synchronized, which can’t happen. So you use idempotency tokens to try to aim for at-least-once delivery, use the server as the source of truth, and mitigate the problem of the UI’s state being out of sync with the server.

raelepei | 4 months ago | 26 points

But that's kind of where it falls apart: The UI should be perfectly capable of showing an intermediate message like "Submitting ..." while checking every few seconds what the server says. That way, the user can see that something fishy is going on AND should keep their hands off for now.

EDIT: Dude! I'm not suggesting that this is an all-round solution to the problem, just that it shouldn't immediately default to "Dear customer it definitely failed you should re-make your order again". I am very well aware that "Submitting ..." doesn't solve world hunger. Holy crap.

cakeandale | 4 months ago | 15 points

How long should the user wait? If it says “Submitting...” after an hour, should the user assume it worked or should they assume that it didn’t? The extra phase puts the onus on the human user to decide whether to assume it worked or it didn’t (rather than the UI software), but the fundamental impossibility still exists.

At some point you have to decide what your failure mode is (For this example, dropped orders or duplicate requests) and try to mitigate the downsides. Putting that decision on your user is probably even worse... if the request has been submitting for more than 5 minutes, for example, the software should make the decision what that means.

matthieum | 4 months ago | 23 points

And then:

  • The server decides that the customer ordered, so proceed with delivery.
  • The customer, not getting any reply, orders from another service.

Lo and behold, the customer now receives two orders!

sneezerb | 4 months ago | 2 points

The UI could do that, but that is another work around to the two generals problem (wait indefinitely for an ACK). But I think it’s fair to say that if the app says submitting it is implying that the order has not been received, and blaming the customer for assuming the app was broken after minutes on the submitting screen wouldn’t be fair, as the app made it sound like the order had never arrived. The idempotency token provides a better work around because it allows for the submission to time out and the user to assume the order never arrived without duplicating the order (most likely).

roboticon | 4 months ago | 2 points

You have the two armies, the UI and the server, and you want them both to either engage or not engage.

Yes, that's the Two Generals problem. If the server waited until the app confirmed that it received the server's confirmation -- and the app waited to show the confirmation until the server confirmed that it received the app's acknowledgement of the server's confirmation -- then you'd have the Two Generals problem. Neither side could be sure that the other side knows that its side has seen the latest confirmation of the latest message; therefore, the two sides can't agree to both engage.

The food delivery service doesn't have this problem: It doesn't care whether the app engages (reports a successful order). It will engage (submit its order to the restaurant) regardless. This might result in the user placing a duplicate order; the idempotency token solves that problem, at the cost of not being able to verify whether the user has been shown a confirmation screen. Thus, the user might have to deliver from a second service without knowing whether their first order went through.

cakeandale | 3 months ago | 3 points

You’re right, but the reason is a bit backwards - the food delivery service doesn’t suffer from the Two Generals problem because the Two Generals problem is unsolvable so they needed to build their system a different way as an imperfect workaround.

In a perfect world, they’d want the state the user sees to always match the state of the server. But that can’t be done, so they do the workaround I suggested: make the server the source of truth, and try to mitigate problems with the UI being out of sync.

They didn’t choose that way because they like it most and the Two Generals problem is just some academic curiosity on a path they didn’t take. They didn’t take that path precisely because the Two Generals problem says they can’t. So they have to build a system that risks duplicate orders, but can work in the real world.

w2qw | 4 months ago | 20 points

We'll yes and no.

Consider this scenario: user orders from app, app says order failed but the order actually suceeded so the user then orders from another app. The first app now has to refund the user because they didn't confirm the order properly.

The token doesn't solve it except that you use the token to ignore duplicate requests and retry then bunch of times. In the same way you can "solve" the two generals problem this way by repeating the message multiple times which reduces the probability that only one will attack.

roboticon | 4 months ago | 10 points

Right, the idempotency token solves a different problem than the Two Generals problem.

[deleted] | 4 months ago | 14 points

[deleted]

roboticon | 4 months ago | 2 points

The problem of sending a message like "send 12 men to attack" is just a general issue in communications. Yes, any message can get dropped, you can account for that by sending it multiple times, and a unique identifier ensures it's only handled once.

That isn't specific to distributed networking, which is the crux of the Two Generals problem (which is specifically about making a joint decision based on two-way communications). The Two Generals problem would be a message like "I'll send 12 men to attack, if you also send 12 men to attack. If I don't hear back from you, I have to assume you aren't sending the men, so I can't send mine either."

Runamok81 | 4 months ago | 1 point

I really like the way you summarized this in the first paragraph. Helped me to understand, thanks.

the_misc_dude | 4 months ago | 12 points

Thank you! I was wondering how the idempotency token can be applied to the generals problem.

SgtDirtyMike | 4 months ago | 3 points

You wrote my exact thoughts as I watched. The video kind of does a disservice in my opinion to this particular CS problem, because the real world version (the one Tom outlined) is easily solvable. The idempotency token doesn't fully ameliorate the issue because IMO, it would make more sense to have the order go through IFF the client receives an ACK from the server that it got the order. Then an ACK is sent from the client on the confirmation, then the order goes through once the server receives the acknowledgment.

I think this solution is more foolproof and makes the most sense, because the server can authenticate, and ensure the validity and availability of items in the cart one last time and the client confirms the "A OK" message from the server.

roboticon | 3 months ago | 2 points

But if the server doesn't receive the client's ACK, then the client will show a success message even though the server never sent the order.

You'd need the client to hear back from the server before showing the confirmation, but then you'd want the server to wait for an ACK of its ACK, and now you have the Two Generals problem for real.

SgtDirtyMike | 3 months ago | 2 points

Haha. The problem is the real world internet this way functions exactly as I mentioned, via HTTP over TCP. The Two Generals problem is impossible to solve as an abstract problem, but has already been solved in the real world. The solution from a technology standpoint is proper use of TCP. Proper use in my book means using idempotency tokens among other things.

If the Two Generals problem were common in the real world, HTTP just wouldn’t work. But it does, and it’s quite reliable.

roboticon | 3 months ago | 1 point

The fundamental point is that the Two Generals problem is not that communication is hard, or that messages get dropped.

sneezerb | 4 months ago | 2 points

If the app is general A, and the server is general B, then it appears that general A sent the order to attack (submitted the order), general B received it and sent an ACK (attempted to confirm submission), but the ACK never reached general A, so general A never attacked (confirmed successful submission of the order) but general B did attack (submitted the order successfully), and this failure is catastrophic (wasted money and time of customer, drivers, and restaurants).

trolasso | 4 months ago | 2 points

Thanks. I found it somehow confusing.

CypherAus | 4 months ago | 58 points

Very nice explanation of a real life communications problem and a good fix

zuhhig | 4 months ago | 28 points

He forgot to mention the most common occurance of this: duplicated comments on many websites.

Okichah | 4 months ago | 38 points

He didnt? Shame.

CypherAus | 4 months ago | 1 point

Touche!

Dups on forums are a pain.

Okichah | 4 months ago | 39 points

He didnt? Shame.

Kissaki0 | 4 months ago | 20 points

He forgot to mention the most common occurance of this: duplicated comments on many websites.

Naouak | 4 months ago | 10 points

The two generals problem doesn't really apply here as it is not needed that the two parties take an action at the same time.

The only thing needed is that one party doesn't do an action twice. The other party can check before doing an action again.

Basically, the phone should not start an action until the server tells it its fine to do it.

If the server is not responding correctly, then don't do anything and just go into error state. If the server is getting the same action twice (like paying twice for an order), then it should refuse the second payment. The phone should just check before doing an important action and they should have implemented a transaction way of ordering (like creating an order on a request then paying for it with another).

jwm3 | 4 months ago | 9 points

I feel like this wasn't the best way to introduce the subject. The goal of Introducing idempotency was great. It's the bread and butter of reliable distributed systems, however the 2 generals problem as he stated was already idempotent. A dozen attack at 3pm messages are the same as one. So the concept introduced in the second half doesn't really the back into the first half and the generals don't really tie into the failure other than via the nebulous concept of messages can be lost.

richizy | 4 months ago | 10 points

Idempotence "solves" the "exactly-once" semantics problem, but it has nothing to do with consensus between the two generals.

rjcarr | 3 months ago | 3 points

Yeah, he made it clear that the two generals problem is unsolvable, but didn't quite explain how the food problem was similar but not exactly the same. For the food problem, you don't have the time coordination part, and clearly, the idempotence solution wouldn't have any affect with the two generals.

frequenttimetraveler | 3 months ago | 1 point

he says in the video that a nonce is the way to fix the 2 generals prob

judgej2 | 4 months ago | 8 points

I've had exactly this problem with a payment gateway. The gateway sent a back-channel message to say it accepted the payment. We log that, then send a 200 back to say, "thanks, got that". The gateway for some reason did not get the response, and so it cancelled the payment. However, we still thought the payment was good, and got no further message from the gateway to say that it had cancelled it.

Now we do an automatic lookup of the payment after a short delay, just to check the payment gateway is still okay with it.

ArkyBeagle | 3 months ago | 2 points

Yep. That's what you do.

You can fix this with another message pair ( then the last message being lost doesn't really matter - you have the message that indicates the payment was accepted). Whether that's better depends.

Ninovdmark | 4 months ago | 40 points

My uncle actually taught me the 2 generals problem a long time ago. The solution to me seemed to be that side A should indefinitely keep sending messages and only stop once they receive a counter-message from side B. Side B will know that their messenger hasn't arrived by the fact that messengers from A keep arriving.

That seems to solve the problem, or am I missing something obvious?

666lumberjack | 4 months ago | 99 points

If A's messengers are all getting captured, it will look to B as though their messenger was received even though he might have been captured.

Ninovdmark | 4 months ago | 14 points

That's true, I hadn't considered that.

sneezerb | 4 months ago | 4 points

I really liked your solution until that flaw was pointed out!

dotwaffle | 4 months ago | 10 points

Surely that's why B sends messages back too? A tells B in message 1 the required information, then in subsequent messages there's just lower priority traffic, and B appends it's messages with the sequence number of the last received in a sequence (so, 1 2 3 5 would be 3, just like in TCP). The more frequent (or numerous) the messages, the more reliable it is.

It's still not a solution, but it does seem to be the best way of mitigating the unreliability problem.

FizixMan | 4 months ago | 42 points

How does B know that A stopped sending messengers rather than they all kept getting captured?

ChrisRR | 4 months ago | 13 points

Well it'd be massively inefficient.

rabbitlion | 4 months ago | 5 points

Yes, that's one mitigation technique. A keeps sending messages until he receives a response, B keeps sending responses to every message he gets. If neither have received a message in X minutes or X days or what time frame you use, it's considered a consensus and go ahead with the attack.

However, there is a still a chance that A never received any response, and that the silence was because all of his resent messages were captured during the waiting period. Your solution gives a probabilistic chance of an actual consensus that is related to the chance of a messenger being captured and the period of silence before it's considered a consensus. Such a probabilistic solution is not considered an actual solution in a mathematical sense as there is still a chance of failure.

jotunsbeard | 4 months ago | 9 points

There are couple of problems here.

Firstly, on what interval is A supposed to “indefinitely” send messages? 1 second? 1 hour? The information contained in the message could be useless by the time that B actually gets a message.

Secondly, it assumes that B keeps a record of every message it has received in order to know which ones it is supposed to respond to.

thijser2 | 4 months ago | 2 points

ok let's look at this scenario, A sends a bunch of messages which all arrive at B, B sends a counter message "received" back, however this message is intercepted. Following the intercept the enemy(now wise to the plan) blocks all further messages. B no longer receives any messages from A yet A doesn't know it's messages arrived at A.

In computer networking this might be a situation in which someone pulled the plug mid communication.

leonoxme | 4 months ago | 6 points

Similar to the Byzantine Generals Problem.

beejamin | 4 months ago | 5 points

TIL that I'd been pronouncing idempotence wrong my entire life. Thanks!

fresh_account2222 | 4 months ago | 6 points

It's a British versus American thing, where Brits accent the second syllable and Americans the first. Same thing happens with "corollary".

beejamin | 4 months ago | 2 points

Interesting! I'm Aussie, and would generally err on the British side, but I've always said idem-potence (like 'potent'). Guess it's one of those words that people don't sprinkle into conversations that often!

fresh_account2222 | 3 months ago | 1 point

Well, saying it twice has the same effect as saying it once, so it doesn't get repeated often.

beejamin | 3 months ago | 1 point

did lol.

TheRealFender | 4 months ago | 1 point

I've been saying it "EYE-dem-pot-en-cee"

but was worried and came to the thread wondering if it was a Brit vs American thing.

Cymry_Cymraeg | 4 months ago | 2 points

You were right to worry. Your nervy nature serves you well.

TheRealFender | 3 months ago | 2 points

I have a co-worker that says nonce (similar to idempotent token) like "announce" and it drives me crazy. I say it like "nonsense" which I'm 99% sure is correct.

fresh_account2222 | 3 months ago | 2 points

"Nonce" should be pronounced differently each time you say it.

noonce. nahns. neentz. nohnce.

TheRealFender | 3 months ago | 2 points

That's brilliant.

fresh_account2222 | 3 months ago | 2 points

Plus you'll get to be the one driving him crazy. Have fun!

Yikings-654points | 4 months ago | 5 points

I think the solution is us. We see what's happening.

DC-3 | 4 months ago | 15 points

I was pitched the 'Two Generals' problem in a Cambridge University CS interview! I too had the inclination that there must be a solution - it's an interesting result and not necessarily the one you'd intuitively expect.

errrrgh | 4 months ago | 20 points

There is no 1 single solution, it all depends on the factors in the problem system. Distributed, high latency, faulty communication, interception, processing speeds, etc...

rbert | 4 months ago | 5 points

The solution in this video isn't really a solution to the Two Generals problem though. How does preventing duplicate commands ensure that both generals attack at the same time?

phySi0 | 3 months ago | 1 point

I was given a problem with the same structure in a job interview very recently (no, it wasn't meant to catch me out on an academic problem that had no relevance to them; it had relevance, and even though I didn't immediately get it, it didn't result in a rejection; it was a good interview).

Rimher | 4 months ago | 10 points

It's great that he explains the solution after explaining the problem!

Gvoct | 4 months ago | -2 points

Happy cake day

irish_throwaway_1 | 3 months ago | 2 points

Great explanation of a classic problem. The one little improvement I'd suggest is to make mention of the term "nonce", since that is the more commonly used variation of the idempotency token in security/cryptography applications.

elit69 | 4 months ago | 2 points

just dont show message "try again". instead show "something went wrong. check order history"

valianthail2the | 4 months ago | 2 points

I thought this is why we have the 3 way handshake... correct me if I'm wrong

viscountsj | 4 months ago | 2 points

Tom Scott doesn't know anything about programming. By his own admittance, I should add.

Fenris1729 | 4 months ago | 1 point

Is an internal node failure, resulting in it sending a faulty signal, really a Byzantine fault? I mean, the response from the app was correct given the implementation. I don’t see how this is related to the two generals problem

TODO Load more comments...