A few years ago I designed a way to detect bit-flips in Firefox crash reports and last year we deployed an actual memory tester that runs on user machines after the browser crashes. Today I was looking at the data that comes out of these tests and now I'm 100% positive that the heuristic is sound and a lot of the crashes we see are from users with bad memory or similarly flaky hardware. Here's a few numbers to give you an idea of how large the problem is. 1/5
In the last week we received ~470000 crash reports, these do not represent all crashes because it's an opt-in system, the real number of crashes will be several times larger. Still, out of these ~25000 crashes have been detected as having a potential bit-flip. That's one crash every twenty potentially caused by bad/flaky memory, it's huge! And because it's a conservative heuristic we're underestimating the real number, it's probably going to be at least twice as much. 2/5
In other words up to 10% of all the crashes Firefox users see are not software bugs, they're caused by hardware defects! If I subtract crashes that are caused by resource exhaustion (such as out-of-memory crashes) this number goes up to around 15%. This is a bit skewed because users with flaky hardware will crash more often than users with functioning machines, but even then this dwarfs all the previous estimates I saw regarding this problem. 3/5
And to reinforce this estimate I've looked at the numbers we got from the users who run the memory tester after having experienced a crash: for every two crashes we think are caused by a bit-flip the memory tester found one genuine hardware issue. Keep in mind that this is not doing an extensive test of all the machine's RAM, it only checks up to 1 GiB of memory and runs for no longer than 3 seconds... and it has found lots of real issues! 4/5
And for the record I'm looking at this mostly on computers and phones, but this affects every device. Routers, printers, etc... you name it. That fancy ARM-based MacBook with RAM soldered on the CPU package? We've got plenty of crashes from those, good luck replacing that RAM without super-specialized equipment and an extraordinarily talented technician doing the job. 5/5