Comment by jesse__
13 hours ago
I've done a handful of interviews recently where the 'scaling' problem involves something that comfortably fits on one machine. The funniest one was ingesting something like 1 GB of JSON per day. I explained, from first principles, how it fits, and received feedback along the lines of "our engineers agreed with your technical assessment, but that's not the answer we wanted, so we're going to pass". I've had this experience a good handful of times.
I think a lot of people don't realize machines come with TBs of RAM and hundreds of physical cores. One machine is fucking huge these days.
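For a sense of scale, here is a rough back-of-envelope sketch of that 1 GB/day figure as an average ingest rate. It assumes 1 GB means 10**9 bytes and that the load is spread evenly over the day; both are assumptions, and real traffic is usually burstier.

```python
# Back-of-envelope: average ingest rate for 1 GB of JSON per day.
# Assumes 1 GB = 10**9 bytes and an evenly spread load (both assumptions).
BYTES_PER_DAY = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_bytes_per_second(bytes_per_day: int) -> float:
    """Return the mean number of bytes arriving per second."""
    return bytes_per_day / SECONDS_PER_DAY


rate = average_rate_bytes_per_second(BYTES_PER_DAY)
print(f"{rate / 1000:.1f} kB/s")  # roughly 11.6 kB/s
```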
I've actually worked on distributed systems that were so broken, I created a script to connect to prod and just create the report from my laptop. My manager offered to buy me a second laptop for running the report since it was easier than getting approval from the architects to get rid of the distributed report system (it only created that one report).
The wildest part is they’ll take those massive machines, shard them into tiny Kubernetes pods, and then engineer something that “scales horizontally” with the number of pods.
Yeah man, you're running on a multitasking OS. Just let the scheduler do the thing.
Yeah this. As I explain many times to people, processes are the only virtualisation you need if you aren’t running a fucked up pile of shit.
The problem we have is fucked up piles of shit, not a lack of Kubernetes or containers.
It's all fun and games, until the control plane gets killed by the OOM killer.
Naturally, that detaches all your containers. And there's no seamless reattach for control plane restart.
Until you need to schedule GPUs or other heterogeneous compute...
I had to re-read this a few times. I am sad now.
To be fair, each of those pods can have dedicated, separate external storage volumes, which may actually help, and it's def easier than maintaining 200 or more iSCSI (or whatever) targets yourself.
I think my brain hurts
I mean, a large part of the point is that you can run on separate physical machines, too.
I recently had to parse 500MB to 2GB daily log files into analytical information for sales. The quick and dirty version would have needed 64GB of RAM, and my work laptop only has 48GB. After taking time to clean it up, it used under 1GB of RAM and ran faster, because it only retained records in RAM between days when it needed to.
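A minimal sketch of that kind of cleanup: stream the file and keep only running totals in memory instead of every record. The one-JSON-object-per-line format and the `day` and `amount` field names are hypothetical, picked only for illustration; the actual log format isn't stated.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line (assumes JSON Lines input)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def sales_per_day(lines: Iterable[str]) -> Counter:
    """Aggregate while streaming; only the per-day totals stay in memory."""
    totals: Counter = Counter()
    for record in iter_records(lines):
        # "day" and "amount" are hypothetical field names.
        totals[record["day"]] += record["amount"]
    return totals


sample = [
    '{"day": "2024-01-01", "amount": 5}',
    '{"day": "2024-01-01", "amount": 7}',
    '{"day": "2024-01-02", "amount": 3}',
]
totals = sales_per_day(sample)
```

An open file handle can be passed in place of `sample`, since iterating over a file yields one line at a time without loading the whole thing.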
It is not about what you are doing, it is always about how you do it.
This was the same with doing OCR analysis of assembly and production manuals. Quick and dirty, it would have taken over 24 hours of processing time; after moving to semaphores with parallelization, it took less than two hours to process all the information.
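One way the semaphore-plus-parallelization idea could look, as a rough sketch: a thread pool does the fan-out and a bounded semaphore caps how many expensive calls run at once. `process_page` is a hypothetical stand-in for the OCR step and the limits are made-up numbers. For CPU-bound work in Python, a process pool would probably be the better fit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # hypothetical cap on concurrent expensive calls
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def process_page(page: str) -> str:
    """Hypothetical stand-in for an OCR call; holds a slot while it runs."""
    with slots:
        return page.upper()


def process_all(pages: list) -> list:
    """Fan work out over a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(process_page, pages))


results = process_all(["page one", "page two", "page three"])
```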
> It is not about what you are doing, it is always about how you do it.
It saddens me to see how the LinkedIn slop style is expanding to other platforms
In interviews just give them what they are looking for. Don't overthink it. Interviews have gotten so stupidly standardized as the industry at large copied the same Big Tech DSA/System Design/Behavioral process. And therefore interview processes have long been decoupled from the business reality most companies face. Just shard the database and don't forget the API Gateway
> In interviews just give them what they are looking for
Unless, of course, you have multiple options and you don’t want to work for a company that’s looking for dumb stuff in interviews.
100%. Interviews should be a two-way filter. I’m sympathetic to unemployed-and-just-need-something, but also: boy are there a lot of companies hiring data engineers.
Meh .. I've played that game; it doesn't work out well for anyone involved.
I optimize my answers for the companies I want to work for, and get rejected by the ones I don't. The hardest part of that strategy is coming to terms with the idea that I constantly get rejected by people that I think are mostly <derogatory_words_here>, but I've developed thick skin over the years.
I'd much rather spend a year unemployed (and do a ton of painful interviews) and find a company whose values align with mine, than work for a year on a team I disagree with constantly and quit out of frustration.
The company's values may align to yours, even though they reject you. It's because the interview process doesn't need to have anything to do with their real-world process. Their engineers probe you for the same "best practices" that they themselves were constantly probed for in their own interviews. Interviewing is its very own skill that doesn't necessarily translate into real-life performance.
This. Most interviewers don't want to do interviews, they have a more important job to do (at least, that's what they claim). So they learn questions and approaches from the same materials and guides that are used by candidates. Well, I'm guilty of doing exactly this a few times.
Meh. As an interviewer I would always make it clear if we wanted to switch to “let’s pretend it doesn’t fit on a machine now”.
Demonstrating competency is always good.
> but that's not the answer we wanted
You could have learned this if you were better about collecting requirements. You can tell the interviewer "I'd do it like this for this size data, but I'd do it like this for 100x data. Which size should I design this for?" If they're looking for one direction and you ask which one, interviewers will tell you.
I've done that too and, in my experience, people that ask a scaling question that fits on a single machine don't have the capacity to have that nuanced conversation. I usually try to help the interviewer adjust the scale to something that actually requires many machines, but they usually don't get it.
Said another way, how do you have a meaningful conversation about scaling with a person who thinks their application is huge, but in reality only requires a tiny fraction of a single machine? Sometimes, there's such a massive gulf between perception and reality that the only thing to do is chuckle and move on.
The burden of wisdom.
Yes, but then how are these people going to justify the money they're spending on cloud systems? They need to find reasons to maintain their "investment", otherwise they could be seen as incompetent when their solution is proven to be ineffective. So, they have to show that it was a unanimous technical decision to do whatever they wanted in the first place.
> I explained, from first principles, how it fits, and received feedback along the lines of "our engineers agreed with your technical assessment, but that's not the answer we wanted, so we're going to pass". I've had this experience a good handful of times.
Probably a better outcome than being hired onto a team where everyone knows you're technically correct but they ignore your suggestions for some mysterious (to you) reason.
Oh, absolutely.
I have a funny story I need to tell some day about how I could get a 4GB JSON loaded purely in the browser at some insane speed, by reading the bytes, identifying the "\n" then making a lookup table. It started low stakes but ended up becoming a multi-million internal project (in man-hours) that virtually everyone in the company used. It's the kind of project that, if it had started "big" from the beginning, I'd bet anything it wouldn't have gotten so far.
Edit: I did try JSON.parse() first, which I expected to fail and it did fail BUT it's important that you try anyway.
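The newline lookup table idea can be sketched roughly like this. It's written in Python rather than browser JavaScript, and it assumes the file is newline-delimited JSON (one object per line), which the comment implies but doesn't state outright. The function names are made up for illustration.

```python
import json
from typing import List


def build_line_index(data: bytes) -> List[int]:
    """Return the byte offset at which each line starts."""
    offsets = [0]
    pos = data.find(b"\n")
    while pos != -1:
        offsets.append(pos + 1)
        pos = data.find(b"\n", pos + 1)
    # A trailing newline (or empty input) leaves an offset with no line behind it.
    if offsets[-1] == len(data):
        offsets.pop()
    return offsets


def read_record(data: bytes, offsets: List[int], i: int) -> dict:
    """Parse only the i-th line, located through the lookup table."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
    return json.loads(data[start:end])


sample = b'{"id": 1}\n{"id": 2}\n{"id": 3}\n'
offsets = build_line_index(sample)
record = read_record(sample, offsets, 1)
```

The point of the table is that nothing gets parsed until a specific record is asked for, so the up-front cost is a single scan for newline bytes.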
Curious about which browser and hardware. In my experience browsers often choke on 0.5GB strings, or decide to kill the tab/process.
Every one of these cores is really fast, too!
yeah man, computers are completely bananacakes
Yeah, I had this problem a couple of times in startup interviews, where the interviewer asked a question I happened to have expertise in, then disagreed with my answer even though they clearly didn't know all that much about it. It's ok, they did me a favor.
It may or may not be related that the places that this happened were always very ethnically monotone with narrow age ranges (nothing against any particular ethnic group, they were all different ethnic monotones)
Hah yeah, that's a funny one, being able to run circles around the interviewer.
This kind of bad interview is rife. It’s often more a case of guess what the interviewer thinks than come up with a good solution.
Yes, yes but how are we going to get HA with one machine..
Fuck off ..your 10-person startup with an MVP and no revenue stream needs customers first..
1gb of json u can do in one parse ¯\_(ツ)_/¯ big batches are fast
“there’s no wrong answer, we just want to see how you think” gaslighting in tech needs to be studied by the EEOC, Department of Labor, FTC, SEC, and Delaware Chancery Court to name a few
let’s see how they think and turn this into a paid interview