Comment by onlyrealcuzzo

8 days ago

I work at a FAANG.

Professionally, I have had almost no luck with it, outside of summarizing design docs or literally just finding something in the code that a simple search might not find, such as: where is this team's code that does X?

I am yet to successfully prompt it and get a working commit.

Further, I will add that I also don't know any ICs personally who have successfully used it. Though there are endless posts of people talking about how they're now 10x more productive and how everyone needs to do x, y, and z now. I just don't know any of these people.

Non-professionally, it's amazing how well it does on a small greenfield task, and I have seen that 10x improvement in velocity. But, at work, close to 0 so far.

Of the posts I've seen at work, they tend to be from teams doing something new / greenfield-ish or a refactor. So I'm not surprised by their results.

This is wild. I’m on the other end.

I’ve probably prompted 10,000 lines of working code in the last two months. I started with Terraform, which I know backwards and forwards. It works perfectly 95% of the time, and I know where it will go wrong, so I watch for that. (Working greenfield, in other existing repos, and with other collaborators.)

Moved on to a big data processing project; works great. It needed a senior engineer to diagnose one small index problem, which he identified in 30s. (But one I’d bonked on for a week, because in some cases I just don’t know what I don’t know.)

Meanwhile a colleague wanted a sample of the data. Vibe coded that. (Extract from zip without decompressing.) He wanted it randomized. One shot. Done. Then he wanted it randomized across 5 categories. Then he wanted 10x the sample size. Data request completed before the conversation was over. Previously I would have worked on that for three hours and bonked if I hit the limit of my technical knowledge.
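
For the curious, here's a sketch of roughly what that kind of script looks like (a reconstruction, not the actual code; the one-file-per-record layout and the names sample_zip / category_of / data.zip are my assumptions):

```python
import random
import zipfile
from collections import defaultdict

def sample_zip(path, n, category_of=None, seed=None):
    """Randomly sample members of a zip archive; only the sampled
    members are ever decompressed, not the whole archive."""
    rng = random.Random(seed)
    with zipfile.ZipFile(path) as zf:
        names = [i.filename for i in zf.infolist() if not i.is_dir()]
        if not names:
            return {}
        if category_of is None:
            picks = rng.sample(names, min(n, len(names)))
        else:
            # Stratified: spread the sample across categories.
            groups = defaultdict(list)
            for name in names:
                groups[category_of(name)].append(name)
            per_group = max(1, n // len(groups))
            picks = [name
                     for members in groups.values()
                     for name in rng.sample(members, min(per_group, len(members)))]
        return {name: zf.read(name) for name in picks}

# "Randomized across 5 categories" could key on the top-level folder,
# and "10x the sample size" is just a bigger n:
# sample = sample_zip("data.zip", 500, category_of=lambda p: p.split("/")[0])
```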

Built a monitoring stack. Configured servers, used it to troubleshoot dozens of problems.

For stuff I can’t do, now I can do. For stuff I could do with difficulty now I can do with ease. For stuff I could do easily now I can do fast and easy.

Your vastly different experience is baffling and alien to me. (So thank you for opening my eyes)

  • I don’t find it baffling at all and both your experiences perfectly match mine.

    Asking AI to solve a problem for you is hugely non-linear. Sometimes I win the AI lottery and its output is a reasonable representation of what I want. But mostly I lose the AI lottery and get something that is hopeless. Now I have a conundrum.

    Do I continue to futz with the prompt and hope that if I wiggle the input, maybe I get a better output? Or have I hit a limit, and AI will never solve this problem? Because of the non-linear nature, I just never know. So these days I basically throw one dart. If it hits, great. If I miss, I give up and do it the old-fashioned way.

    My work is primarily in C++, on what is basically fancy algorithms on graphs, if it matters.

  • What I've found Claude really helpful for is filling in the gaps. When you know vaguely how to do something, like interpret data, but don't know what other packages exist in xyz random technical domain? That is how I found, for example, https://cran.r-project.org/web/packages/gggenes/vignettes/in... and Orthofinder when trying to teach myself computational biology.

    But sometimes even Claude gets stuck, e.g. when I was trying to set up MicroPython via PlatformIO running inside WSL2 on Windows 11, it got stuck setting up my ESP32 board.

Also at FAANG. I think I am using the tools more than my peers based on my conversations. The first few times I tried our AI tooling, it was extremely hit and miss. But right around December the tooling improved a lot, and is a lot more effective. I am able to make prototypes very quickly. They are seldom check-in ready, but I can validate assumptions and ideas. I also had a very positive experience where the LLM pointed out a key flaw in an API I had been designing, and I was able to adjust it before going further into the process.

Once the plan is set, using the agentic coder to create smaller CLs has been the best avenue for me. You don't want to generate code faster than you and your reviewers can comprehend it. It'll feel slow, but check-ins actually move faster.

I will say it's not all magic and success. I have had the AI lead me down some dark corners, assuring me one design would work when actually it is a bit outdated or not quite the right fit for the system we are building for because of reasons. So, I wouldn't really say that it's a 10x multiplier or anything, but I'm definitely getting things done faster than I could on my own. Expertise on the part of the user is still crucial.

One classic issue I used to run into is doing a small refactor and then having to manually fix a bunch of tests. It is so much simpler to ask the LLM to move X from A to B and fix any test failures. Then I circle back in a few minutes to review what was done and fix any issues.

The other thing is, it has visibility into the wider codebase, including some of the infrastructure we depend on. There have been a couple of times in the past quarter where our build was busted by an external team, and I was able to ask the LLM, given the timeframe and a description of the issue, for the exact external failure that caused it. I don't really know how long it would have taken to resolve the issue otherwise, since the issues were missed by their testing. That said, I gotta wonder if those breakages were introduced by LLM use.

My job hasn't been this fun in a long, long time and I am a little uneasy about what these tools are going to mean for my personal job security, but I don't know how we can put the genie back into the bottle at this point.

I can second this. I’ve never had a problem writing short scripts and glue code in stuff I’ve mastered. In places where I actually need help, I’m finding it slows me down.

Wow, that's such a drastically different experience from mine. May I ask what toolset you are using? Are you limited to using your homegrown "AcmeCode", or do you have full access to Claude Code / Cursor with the latest and greatest models, 1M context size, and full repo access?

I see it generating between 50% and 90% accuracy on both small and large tasks, as in: the PRs it generates range from being 50% usable code that a human can tweak, to a 90% solution (with the occasional 100%: wow, it actually did it, no comments, let's merge).

I also found it to be a skillset: some engineers seem to find it easier to articulate what they want, and some have an easier time thinking while writing code.

  • I used to think that the people who keep saying (in March 2026) that AI does not generate good code are just not smart and ask stupid prompts.

    I think I've amended that thought. They are not necessarily lacking in intelligence. I hypothesize that LLMs pick up on optimism and pessimism, among other sentiments, in the incoming prompt: someone prompting with no hope that the result will be useful ends up with useless garbage output, and vice versa.

    • This is kinda like that thing about how psychic mediums supposedly can't medium if there's a skeptic in the room. Goes to show that AI really is a modern-day ouija board.

    • That sounds a lot more like confirmation bias than any real effect on the AI's output.

      Gung-ho AI advocates overlook problems and seem to focus more on the potential they see for the future, giving everything a nice rose tint.

      Pessimists will focus on the problems they encounter and likely not put in as much effort to get the results they want, so they likely see worse results than they might have otherwise achieved and worse than what the optimist saw.

    • It's probably more to do with the intelligence required to know when a specific type of code will yield poor future coding integrations and large-scale implementation.

      It's pretty clear that people think greenfield projects can constantly be slopified and that AI will always be able to dig them another logical connection, so it doesn't matter which abstraction the AI chose this time; it can always be better.

      This is akin to people who think we can just keep using oil to fuel technological growth because it'll somehow improve the ability of technology to solve climate problems.

      It's akin to the techno-capitalist cult of "effective altruism" that assumes there's no way you could f'up the world that you can't fix with "good deeds".

      There's a lot of hidden context in evaluating the output of LLMs, and if you're just looking at today's successes, you'll come away with a much different view than if you're looking at next year's.

      Optimism, in this case, is just the belief that the AI will keep getting more powerful, so that it'll always clean up today's mess.

      I call this techno-magic, indistinguishable from religious 'optimism'.

    • Don’t know why you’re getting downvoted; this is a fascinating hypothesis and honestly super believable. It makes way more sense than the intuitive belief that there’s actually something under the human skin suit understanding any of this code.

This checks out, logically speaking.

FAANG codebases are very large, date back years, and might not be using open-source frameworks but rather in-house libraries and frameworks, none of which are available to Anthropic or OpenAI; hence these models have zero visibility into them.

Combine that with the fact that these are not reasoning or thinking machines but rather probabilistic (image/text) generators, and they can't generate what they haven't seen.

  • No, it doesn't check out. I think it's becoming abundantly clear that LLMs learn in real time as they speak to you. There's a lot of denial, with people claiming they don't learn and that their knowledge is fixed at training time, and this is not even remotely true.

    LLMs learn dynamically through their context window, and this learning happens at a rate much faster than humans', often with capabilities greater than humans' and often much worse.

    For a codebase as complex and as closed-source as Google's, the problem an LLM faces is largely the same as the one a human faces: how much can it fit into the context window?

    • You're observing this "paradox" because what you call learning here is not learning in the ML sense; it's deriving better conclusions from more data. That's true of many ML methods, but it doesn't mean any actual learning happens.

    • It checks out if you take into account that most developers are actually rather mediocre, outside of places that spend an insane amount of time and money to get good devs (including but not limited to FAANG).

  • That's why coding agents usually scan various files to figure out how to work in a particular codebase. I work with a very large and old project, and Codex most of the time manages to work with our frameworks.

  • Huh? I have over a hundred services/repos checked out locally, ranging from 10+ years old to new. I have no problem leveraging AI to work in this large distributed codebase.

    Even internal stuff is usable by the model, because it’s a pattern-matching machine and there should be documentation available, or it can just study the code like a human would.

Not a FAANG engineer, but also working at a pretty large company, and I want to say you're spot on 1000%. It's insane how many "commenters" come out of the woodwork to tell you you're doing x or y wrong. They may not even frame it that way, but use a veneer of questions ("What is your process like? Have you tried this product?", etc.) as a subtle way of completely dismissing your shared experience.

Same here. My take is that the codebase is too large and complex for it to find the right patterns.

It does work sometimes. The smaller the task, the better.

  • Isn’t that fixed by having it create a plan, then reviewing it and saying “x should do y instead”, letting it update the plan, iterating, and then saying “build the plan”?

Can you elaborate on the shortcomings you find in a professional setting that aren't coming up on personal projects? With it handling greenfield tasks, are you perhaps referring to the usual sort of boilerplate code/file-structure setup that is step 0 when using a lot of libraries?

>I am yet to successfully prompt it and get a working commit.

May I ask what you're working on?

Experience depends on which FAANG it is. Amazon, for example, doesn't allow Claude Code or Codex, so you are stuck with whatever internal tool they have.

Meta, despite competing with these, is open to letting its devs use better off-the-shelf tools.

  • I work at AWS and generally use Claude Opus 4.6 1M with Kiro (AWS’s public competitor to Claude Code). My experience is positive. Kiro writes most of my code. My complaints:

    1. Degraded quality over longer context window usage. I have to think about managing context and agents instead of focusing solely on the task.

    2. It’s slow (when it’s “thinking”), especially when it’s tasked with something simple (e.g., I could ask Claude Opus to commit code and submit it for review, but it’s just faster if I run the commands myself, and I don’t want to have to think about conditionally switching to Haiku / faster models mid-task).

    3. It often requires a lot of upfront planning and feedback-loop setup, to the extent that sometimes I wonder if it would’ve been faster if I’d done it myself.

    A smarter model would be great, but there are bigger productivity gains to be had with a good setup, a faster model, and abstracting away the need to think about agents or context usage. I’m still figuring out a good setup. Something with the speed of Haiku and the reasoning of Opus, without the overhead of having to think about the management of agents or context, would be sweet.

    • > A smarter model would be great, but there are bigger productivity gains to be had with a good setup, a faster model, and abstracting away the need to think about agents or context usage. I’m still figuring out a good setup. Something with the speed of Haiku and the reasoning of Opus, without the overhead of having to think about the management of agents or context, would be sweet.

      I was thinking about this recently. This kind of setup is the Holy Grail everyone is searching for: make the damn tool produce the right output more of the time. And yet, despite testing the methods provided by the people who claim they get excellent results, I still come to the point where it goes off the rails. Nevertheless, since practically everybody is working on resolving this particular issue, and huge amounts of money have been poured into getting it right, I hope in the next year or so we will finally have something we can reliably use.

  • Meta is doing something healthy: signalling that it is behind with its LLM efforts. Nothing wrong with that.

Could you say more on how the tasks where it works vs. doesn't work differ? Just the fact that it's both small and greenfield in the one case and presumably neither in the other?

Can you provide an example of how you actually prompt AI models? I get the feeling the difference among everyone's experiences has to do with prompting and expectation.

  • [dead]

    • I find that the default Claude Code harness deals with the ambiguity best right now with the questionnaire system. So you can pose the core of the problem first and then specify only those implementation details that matter.

    • I wasn't implying that clever prompting needed to be used. I'm just trying to confirm that the person I was replying to isn't just saying what essentially amounts to "build me X".

      When I write my prompts, I literally write an essay. I lay out constraints, design choices, examples, etc. If I already have a ticket that lays out the introduction, design considerations, acceptance criteria, and other important information, then I'll include that as well. I then take the prompt I've written and ask the model to improve it. I'll also try to include the most important bits at the end, since right now models seem to focus more on things referenced at the end of a prompt than at the beginning.

      Once I do get output, I then review each piece of generated code as if I'm doing an in-depth code review.
