
Comment by MichaelNolan

2 days ago

> 95% of Uber engineers now use AI tools monthly with 70% of committed code originating from AI.

Well, that’s to be expected when using AI tools becomes relevant in your performance evaluation.

It's actually incredible the extent to which non-devs imposing KPIs on devs underestimate how badly this will get gamed, whether it's AI usage, PR/line counting, or whatever.

  • Gaming is one thing; fundamentally not understanding how engineering works will lead to shittier outcomes and cost the company in ways management will never understand.

    Management in the age of AI is falling for the doorman fallacy wrt engineering. If lines of code were the most valuable aspect of software engineering, my front end JavaScript intern would’ve been the most valuable person in the company. https://www.jaakkoj.com/concepts/doorman-fallacy

    • > Gaming is one thing; fundamentally not understanding how engineering works will lead to shittier outcomes and cost the company in ways management will never understand.

      That means nothing to them: they jump ship and find another job just like devs do. The whole industry has been musical chairs for a while.

  • Exactly. At Cerebras I know of several people who burn tokens on completely USELESS tasks (randomly changing pixels in an image) just to keep themselves high up on the token leaderboard.

    I suspect the other leaderboard toppers are doing the same. They made the metric "token usage" (which is just a proxy for LOC), so that's what they're gonna get.

  • Someone at my job uses AI tools to reformat his code...

    • I actually do this, but that's mostly because our team reviewed all the existing autoformatters for the relatively obscure language we use, and either really hated the formatting or found that they actually introduced errors!

  • I think PRs are a pretty good metric, IF

    1. you sample a few to see that they are actually meaningful,

    2. they go to prod and are validated without having to roll back.

    It still needs to be managed. But it should be much easier for a manager to catch an engineer gaming PRs than something like AI use or lines of code.

When managers and VPs all say "you must use AI or else you will not work here," then yes, people will use it.

yeah and once the KPI is "how much AI did you use" instead of "what did you ship," the budget blowout writes itself. people will game the number.

I don't understand this critique. (1) Did you previously think you weren't getting paid for doing what a company wants you to do, aka what THEY thought was productive? (2) Do you think all this AI generated code is useless?

Edit: y'all are some whiny folk, ain't ya?

  • I think the point was that, when you make a metric goal of "you must use AI this much," people will use AI even in ways that don't add to productivity.

  • To answer your second question: Yes, much of it is worse than useless. The tools need guidance to produce useful output. If you use it poorly, you will get garbage output that may do more harm than good.

    And your response does not address the point being made in the comment you replied to: Many people are being evaluated by how many tokens they burn, which is about as good a metric as lines of code written.

  • 1) I think if the company I work for spends too much effort on things that aren't going to make money, it won't be able to pay me anymore, no matter what it "thinks" is productive. That's not how executives at companies like this make decisions, though.

    2) Mostly, yes.

  • > (1) ...getting paid for doing what a company wants you to do...?

    At my previous company, when the thing they thought they wanted me to do (which was not the thing they actually wanted... but whatever) diverged from my values I quit. You can just do things.

    > (2) Do you think all this AI generated code is useless?

    Almost universally, yes. Especially in organizations that historically haven't been particularly careful about hiring and have a huge number of young, inexperienced people. There are exceptions but they're rare enough that throwing that particular baby out with the bathwater isn't a big loss.

  • I think the parent is saying "% of code being generated by AI" is not a generally good, direct metric for business value. It's akin to the "we are pushing SO MUCH CODE" phase of early AI marketing.

    If we're trying to measure the value of adopting a tool, it's probably better to measure the ROI of that tool than its usage %, especially when usage is basically mandated.

    To directly answer your questions:

    1. You're being paid to create value for the business, which "doing what they think is productive" is a proxy for. You're not being paid to use a tool a high % of the time.

    2. It doesn't seem like the parent even commented on the quality of the generated code. I think anyone who uses it regularly can agree that: a) the code is not useless, b) not all generated code is immediately production-ready, and c) AI code generation is an accelerant for software development.

  • Goodhart's Law isn't a problem immediately. If the goal is for more code to be written, and the only feasible way to hit that goal is to heavily use AI, then you might run into the problems of AI-generated code: an infrastructure that's poorly architected and much less understood than it would've been ten years ago.

  • Not OP, but:

    1. At my level, the company is not just paying me to do a task the way they want it done, they are paying for my experience to orchestrate the best way to do it. They want an outcome, and I'm responsible for figuring out how to get to that outcome with the right balance of cost, correctness, etc. But yes, the most dystopian reality is what you said.

    2. It's not useless, but the AI-generated code is absolutely lower quality than what I would have written myself, and there is no desire to clean it up. Companies have always had a disastrously bad understanding of technical debt, and they finally have a tool they can shove down developers' throats that trades even more velocity for even less quality. They're going to take that trade every single time.

  • GP is just saying that any metric will be gamed, and any cost associated with it will grow. Say you set a metric where the most productive devs are the ones with the most files changed; you can soon expect every function and structure to get its own file. Same if sales commissions are based on how much time you spend calling: expect the phone bills to grow a lot.

  • You're missing their point: LLM use is often part of your evaluation at some of these larger companies, and they expect you to use the tools heavily or you will get a lashing.