Comment by scottlamb
2 days ago
> When Jeff Dean goes on vacation, production services across Google mysteriously stop working within a few days. This is actually true. ... It's not clear whether this fact is really true, or whether this line is simply part of the joke, so I've omitted the usual (TRUE) identifier here. Interpret this as you see fit :)
I think this one's true-ish. Back in the day when Google didn't have good cron services for the corp and production domains [1], Jeff Dean's workstation ran a job that made something called (iirc) the "protocol buffer debug database". Basically, a big file (probably an sstable) with compiled .proto introspection data for a huge number of checked-in protobufs. You could use it to produce human-readable debug output from what was otherwise a fairly indecipherable blob. I don't think it was ever intended for production use, but some things that shouldn't have used it ended up using it anyway. I think after Jeff had been on vacation for a while, his `prodaccess` credentials expired, the job stopped working, maybe the output became unavailable, and some things broke.
Here's a related story I know is true: when I was running Google Reader, I got paged frequently for Bigtable replication delay, and I eventually traced it to trouble accessing files that shared GFS chunkservers with this database. I mentioned it on some mailing list, and almost immediately afterward Jeff Dean CCed me on a code review changing the file's replication from r=3 to r=12. The problem went away.
[1] this lasted longer than you would expect
Ha, I also recall this fact about the protobuf DB after all these years
Another Jeff Dean fact should be "Russ Cox was Jeff Dean's intern"
This was either 2006 or 2007, whenever Russ started. I remember when Jeff and Sanjay wrote "gsearch", a distributed grep over google3 that ran on 40-80 machines [1].
There was a series of talks called "Nooglers and the PDB" I think, and I remember Jeff explained gsearch to maybe 20-40 of us in a small conference room in building 43.
It was a tiny and elegant piece of code -- something like ~2000 total lines of C++, with "indexer" (I think it just catted all the files, which were later mapped into memory), replicated server, client, and Borg config.
The auth for the indexer lived in Jeff's home dir, perhaps similar to the protobuf DB.
That was some of the first "real Google C++ distributed system" code I read, and it was eye opening.
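For anyone curious what "cat all the files, then map them into memory" looks like in practice, here is a rough single-machine sketch in Python. To be clear, this is not gsearch; the function names (`build_corpus`, `grep_corpus`) and the corpus layout are my own assumptions, and the replication, client/server, and Borg parts are left out entirely.

```python
import bisect
import mmap
import os
import re


def build_corpus(paths, corpus_path):
    """'Indexer' step: concatenate files into one corpus file.

    Returns a list of (start_offset, end_offset, path) spans so a byte
    offset in the corpus can be mapped back to its source file.
    """
    spans = []
    offset = 0
    with open(corpus_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            # Make sure every file ends with a newline so that lines
            # never run across a file boundary.
            if not data.endswith(b"\n"):
                data += b"\n"
            out.write(data)
            spans.append((offset, offset + len(data), path))
            offset += len(data)
    return spans


def grep_corpus(corpus_path, spans, pattern):
    """'Server' step: mmap the corpus and scan it with a regex.

    Returns one (path, line) pair per regex match.
    """
    if os.path.getsize(corpus_path) == 0:
        return []  # mmap cannot map an empty file
    rx = re.compile(pattern.encode())
    starts = [start for start, _, _ in spans]
    hits = []
    with open(corpus_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            for m in rx.finditer(buf):
                # Expand the match to the full line containing its start.
                line_start = buf.rfind(b"\n", 0, m.start()) + 1
                line_end = buf.find(b"\n", m.end())
                if line_end == -1:
                    line_end = len(buf)
                # Find which file this offset belongs to.
                idx = bisect.bisect_right(starts, m.start()) - 1
                line = buf[line_start:line_end].decode(errors="replace")
                hits.append((spans[idx][2], line))
    return hits
```

You would call `build_corpus` once over a list of file paths, then call `grep_corpus` with the returned spans for each query. A distributed version would presumably shard the corpus across machines and merge the hits, but that part is a guess on my side.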
---
After that talk, I submitted a small CL to that directory (which I think Sanjay balked at slightly, but Jeff accepted). And then I put a Perforce watch on it to see what other changes were being submitted.
I think the code was dormant for a while, but later I saw that someone named Russ Cox had started submitting a ton of changes to it. That became the public Google Code Search product [2]. My memory is that Russ wrote something like 30K lines of google3 C++ in a single summer, and then went on to write RE2 (which I later used in Bigtable, etc.)
Much of that work is described here: https://swtch.com/~rsc/regexp/
I remember someone telling him on a mailing list something like "you can't just write your own regex engine; there are too many corner cases in PCRE"
And many people know that Russ Cox went on to be one of the main contributors to the Go language. After the Code Search internship, he worked on Go, which was open sourced in 2009.
---
[1] Actually, I wonder whether today this could perform well enough on a single machine with 64 or 128 cores. Back then I think the prod machines were something like 2, 4, or 8 cores.
[2] This was the trigram regex search over open source code on the web. Later, there was also the structured search with compiler front ends, led by Steve Yegge.
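To make "trigram regex search" a bit more concrete, here is a toy sketch in Python. It is not how Code Search was implemented. In particular, the real system derives the trigram query from the regex itself, whereas here the caller passes a literal substring that every match must contain, which is a simplifying assumption of mine. The class and method names are made up for illustration.

```python
import re
from collections import defaultdict


def trigrams(s):
    """Return the set of all 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Toy inverted index from trigram to the documents containing it."""

    def __init__(self):
        self.docs = {}                    # name -> full text
        self.postings = defaultdict(set)  # trigram -> set of doc names

    def add(self, name, text):
        self.docs[name] = text
        for t in trigrams(text):
            self.postings[t].add(name)

    def candidates(self, literal):
        """Docs containing every trigram of `literal`.

        This is a superset of the docs that contain `literal` itself,
        so false positives are possible and must be filtered later.
        """
        tris = trigrams(literal)
        if not tris:
            # Literal shorter than 3 chars: the index cannot narrow it.
            return set(self.docs)
        return set.intersection(*(self.postings.get(t, set()) for t in tris))

    def search(self, regex, literal):
        """Run `regex` only over the candidate docs for `literal`.

        Assumes the caller guarantees that any match of `regex`
        contains `literal` as a substring.
        """
        rx = re.compile(regex)
        return sorted(
            name for name in self.candidates(literal)
            if rx.search(self.docs[name])
        )
```

The point is that the index cheaply rules out most documents, and the full regex only runs over the few candidates that are left.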
Side note: I used this query to test LLM recall: Do jeff dean and russ cox know each other?
Interesting results:
1. Gemini pointed me back at MY OWN comment, above, an hour after I wrote it. So Google is crawling the web FAST. It also pointed to: https://learning.acm.org/bytecast/ep78-russ-cox
This matches my recent experience -- Gemini is enhanced for many use cases by superior recall
2. Claude also knows this, pointing to pages like: https://usesthis.com/interviews/jeff.dean/ - https://goodlisten.co/clip/the-unlikely-friendship-that-shap... (never seen this)
3. ChatGPT did the worst. It said
... they have likely crossed paths professionally given their roles at Google and other tech circles. ...
While I can't confirm if they know each other personally or have worked directly together on projects, they both would have had substantial overlap in their careers at Google.
(edit: I should add I pay for Claude but not Gemini or ChatGPT; this was not a very scientific test)
Not just Google. I had ChatGPT regurgitate my HN comment (without linking to it) about 15 minutes after posting it. That was a year ago. https://news.ycombinator.com/item?id=42649774
I submitted this "fact" and it is indeed a true story, exactly as you said.
The "global protobuf db" had comments all over it saying it's not intended for production-critical tasks, and it had a lot of caveats and gotchas even aside from being built by Jeff's desktop, but it was so convenient that people naturally ended up using it anyway.
There was a variant of this that occurred later. By that time there might not have been a dependency on Jeff's workstation anymore, but the DB, or at least one of its replicas, was getting copied to... /gfs/cg/home/sanjay/ — I don't believe it was Jeff this time. At some point, there was a very long PCR in the Oregon datacenter, perhaps even the same one that happened a few weeks after the 2011 Fukushima disaster. With the CG cluster powered off for multiple days, a bunch of stuff broke, but in this case the issue might have been solved by dumping the data and/or reading it from elsewhere.
In 2010, due to the China hacking thing, Google locked down its network a lot.
At least one production service went down because it relied on a job running on Jeff Dean's personal computer that no longer had access. Unfortunately I forget what job it was.
The other thing that ran under Jeff's desk for a long time was Code Search, the old one.
I remember this. He went on vacation and since he wasn't available to log in, code search indexing went down for a bit.