How AI Impacts Skill Formation

7 hours ago (arxiv.org)

Among the six patterns identified, it's interesting that "Iterative AI Debugging" takes more time (and possibly tokens) but results in worse scores than letting AI do everything. So this part really should be handed over to agent loops.

The three high-score patterns are interesting as well. "Conceptual Inquiry" actually takes less time yet doesn't score any better than the other two, which is quite surprising to me.

One of the nice things about the "dumber" models (like GPT-4) was that they were good enough to get you really far, but never enough to complete the loop. They gave you maybe 90%, 20% of which you had to retrace anyway -- so between the missing 10% and the retraced 20%, you did about 30% of the tough work yourself, which meant manually learning things from scratch.

The models are too good now. One thing I've noticed recently is that I've stopped dreaming about tough problems, be it code or math. The greatest feeling in the world is pounding your head against a problem for a couple of days and waking up the next morning with the solution sketched out in your mind.

I don't think the solution is to go full natty with things, but to work more alongside the code in an editor, rather than doing everything in the CLI.

  • The big issue I see coming is that leadership will care less and less about people, and more about shipping features faster and faster. In other words, those who are still learning their craft are fucked.

    The amount of context switching in my day-to-day work has become insane. There's this culture of “everyone should be able to do everything” (within reason, sure), but in practice it means a data scientist is expected to touch infra code if needed.

    Underneath it all is an unspoken assumption that people will just lean on LLMs to make this work.

    • I think this is sadly going to be the case.

      I also used to get great pleasure from banging my head against a problem and then the sudden revelation.

      But that takes time. I was valuable when there was no other option. Now? Why would someone wait when an answer is just a prompt away?

  • You still have the system design skills, and so far, LLMs are not that good in this field.

    They can give a plausible architecture, but most of the time it's not usable if you're starting from scratch.

    When you design the system, you're an architect, not a coder, so I see no difference between handing the design to agents or to other developers; you've done the heavy lifting.

    From that perspective, I find LLMs quite useful for learning. But instead of coding, I find myself in long back-and-forth sessions asking questions and requesting examples, sequence diagrams, etc. to visualise the final product.

    • I see this argument all the time, and while it sounds great on paper (you're an architect now, not a developer), people forget (or omit?) that a product needs far fewer architects than developers, meaning the workforce does in fact get trimmed down thanks to AI advancements.

  • This is what I am thinking about this morning. I just woke up, made a cup of coffee, read the financial news, and started exploring the code I wrote yesterday.

    My first thought was that I can abstract what I wrote yesterday, which was a variation of what I built over the previous week. My second thought was a physiological response of fear that today is going to be a hard hyper-focus day full of frustration, and that the coding agents that built this will not be able to build a modular, clean abstraction. That was followed by weighing whether it is better to have multiple one-off solutions, or to manually create the abstraction myself.

    I agree with you 100 percent that the poor performance of models like GPT-4 introduced some kind of regularization in the human-in-the-loop coding process.

    Nonetheless, we live in a world of competition, and the people who develop techniques that give them an edge will succeed. There is a video about the evolution of technique in the high jump: the Western Roll, the Straddle Technique, and finally the Fosbury Flop. Using coding agents will be like this too.

    I am working with 150 GB of time series data. There are certain pain points that need to be mitigated. For example, a different LLM has to be coerced into analyzing or working with the data from a completely different approach in order to validate the results. That means that instead of the whole process being 4x faster, each iteration is 4x faster but needs to be done twice, so the net speedup is still only 2x. I burned $400 in tokens in January. This cannot be good for the environment.

    Timezone handling always has to be validated manually. Every exploration of the data is a train and test split. Here is the thing that hurts the most: the AI coding agents always show the top test results, not the test results of the top train results. Rather than tell me a model has no significant results, they will hide that and only present the winning outliers, which is misleading and, as the OP research suggests, very dangerous. (A minimal sketch of the difference is at the end of this comment.)

    A lot of people are going to get burned before the techniques to mitigate this are developed.

    Overfitting has always been a problem when working with data. Just because the barrier to entry for time series work is much lower does not mean that people developing the skill, whether using old-school tools like ARIMA manually or having AI do the work, escape the problem of overfitting. The models will always show the happy, successful-looking results.

    Just like calculators are used when teaching higher math at the secondary level so that basic arithmetic does not slow the process of learning math skills, AI will be used in teaching too. What we are doing is confusing techniques that have not been developed yet with not being able to acquire skills. I rack and challenge my brain every day solving these problems, as millions of other software engineers do as well, and the patterns will emerge and later become the skills taught in schools.
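
    To make the train/test point concrete, here is that sketch, with made-up scores (hypothetical candidates, not my actual pipeline), showing the difference between cherry-picking the best test score and reporting the test score of the model chosen on the validation side:

        import random

        # Hypothetical candidate models, each with a (validation_score, test_score) pair.
        random.seed(0)
        candidates = [(random.gauss(0.5, 0.1), random.gauss(0.5, 0.1)) for _ in range(20)]

        # Misleading report: cherry-pick the single best test score across all candidates.
        best_test_only = max(test for _, test in candidates)

        # Honest report: choose the model by its validation score, then report that model's test score.
        _, chosen_test = max(candidates, key=lambda pair: pair[0])

        print(f"cherry-picked test score:                  {best_test_only:.3f}")
        print(f"test score of the validation-chosen model: {chosen_test:.3f}")

    With noisy data the first number is systematically higher; that gap is exactly the selection bias the agents keep handing me.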

  • Idk, I very much feel like Claude Code only ever gets me really far, but never there. I do use it a fair bit, but I still write a lot myself, and almost never use its output unedited.

    For hobby projects though, it's awesome. It just really struggles to do things right in the big codebase at work.

  • > The greatest feeling in the world is pounding your head against a problem for a couple of days and waking up the next morning with the solution sketched out in your mind.

    And then you find out someone else had already solved it. So you might as well use Google 2.0, aka ChatGPT.

    • Well, this is exactly the problem. This tactic works until you hit a problem that nobody has solved before, even a relatively minor one that no one has solved simply because it's so specific that no one has tried. If you haven't built up the skills and knowledge to solve problems, then you're stuck.

    • But to understand the solution from someone else, you would have to apply your mind to understand the problem yourself. Transferring the hard work of thinking to GPT will rob you of the attention you will need to understand the subject matter fully. You will be missing insights that would be applicable to your problem. This is the biggest danger of brain rot.

The title of this submission is misleading; that's not what they're saying. They said it doesn't show productivity gains for inexperienced developers still gaining knowledge.

  • The study measures whether participants learn the library, but what they should study is whether they learn effective coding-agent patterns for using the library well. Learning the library is not going to be what we need in the future.

    > "We collect self-reported familiarity with AI coding tools, but we do not actually measure differences in prompting techniques."

    Many people drive cars without being able to explain how cars work. Or they use devices like that. Or they interact with people whose thinking they can't explain. Society works like that: it is functional, and it does not run on full understanding. We need to develop the functional part, not the full-understanding part. We can write C without knowing the machine code.

    You can often recognize a wrong note without being able to play the piece, spot a logical fallacy without being able to construct the valid argument yourself, catch a translation error with much less fluency than producing the translation would require. We need discriminative competence, not generative.

    For years I maintained a library for formatting dates and numbers (prices, ints, ids, phones). It was a pile of regexes, but I maintained hundreds of test cases for each type of parsing. As new edge cases appeared, I added them to my tests and iterated to keep the score high. I don't fully understand my own library; it emerged by scar accumulation. I mean, yes, I can explain any line, but why these regexes in this order is a data-dependent explanation I don't have anymore. All my edits run in a loop with the tests, and my PRs are sent only when the score is good.

    Correctness was never grounded in understanding the implementation. Correctness was grounded in the test suite.
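
    A minimal sketch of that working style (hypothetical rules and cases, nothing like the real library) looks something like this:

        import re

        # Ordered formatting rules accumulated over time; the order matters and is itself a scar.
        PHONE_RULES = [
            (re.compile(r"\D"), ""),                                    # strip every non-digit
            (re.compile(r"^1?(\d{3})(\d{3})(\d{4})$"), r"(\1) \2-\3"),  # US number, optional leading 1
        ]

        def format_phone(raw: str) -> str:
            out = raw
            for pattern, repl in PHONE_RULES:
                out = pattern.sub(repl, out)
            return out

        # The real asset: every production bug becomes another entry here.
        CASES = {
            "555-867-5309": "(555) 867-5309",
            "(555) 867 5309": "(555) 867-5309",
            "1 555 867 5309": "(555) 867-5309",
        }

        failures = [(raw, want, format_phone(raw)) for raw, want in CASES.items() if format_phone(raw) != want]
        print("all passing" if not failures else failures)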

    • > Many people drive cars without being able to explain how cars work.

      But fundamentally, all cars behave the same way all the time. Imagine running a courier company where the vehicles sometimes take a random left turn.

      > Or interact with people whose thinking they can't explain

      Sure, but they trust those service providers because they are reliable. And the reason they are reliable is that the service providers can explain their own thinking to themselves. Otherwise their business would be chaos and nobody would trust them.

      How you approached your library was practical given the use case. But can you imagine writing a compiler like this? Or an industrial automation system? Not only would it be unreliable, it would be extremely slow. It's much faster to deal with something that has a consistent model that attempts to distill the essence of the problem, rather than patching hack upon hack in response to failed test after failed test.

    • You can, most certainly, drive a car without understanding how it works. A pilot of an aircraft on the other hand needs a fairly detailed understanding of the subsystems in order to effectively fly it.

      I think being a programmer is closer to being an aircraft pilot than a car driver.

    • Interesting argument.

      But isn't it the correction of those errors that is valuable to society and gets us a job?

      People can tell they found a bug or describe what they want from a piece of software, yet it requires skill to fix the bugs and to build the software. Though LLMs can speed up the process, expert human judgment is still required.

  • I agree. It's very misleading. Here's what the authors actually say:

    > AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.

    • That itself sounds contradictory to me.

      > AI assistance produces significant productivity gains across professional domains, particularly for novice workers.

      > We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.

      Are the two sentences talking about non-overlapping domains? Is there an important distinction between productivity and efficiency gains? Does one focus on novice users and the other on experienced ones? Admittedly I have not read the paper yet; it might be clearer than the abstract.

  • I agree the title should be changed, but as I commented on the dupe of this submission, learning is not something that happens as a beginner, student, or "junior" programmer and then stops. The job is learning, and after 25 years of doing it I learn more per day than ever.

  • > They said it doesn't show productivity gains for inexperienced developers still gaining knowledge.

    But that's what "impairs learning" means.

Key snippet from the abstract:

> Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.

The library in question was Python trio and the model they used was GPT-4o.
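
For anyone unfamiliar with it, trio structures concurrency around nurseries. A minimal sketch of the style participants had to pick up (my own toy example, not a task from the paper):

    import trio

    async def worker(name: str, delay: float) -> None:
        # Stand-in for real work; trio.sleep is a checkpoint where other tasks can run.
        await trio.sleep(delay)
        print(f"{name} finished after {delay}s")

    async def main() -> None:
        # A nursery scopes child tasks: the async with block only exits
        # once every task started inside it has completed (or raised).
        async with trio.open_nursery() as nursery:
            nursery.start_soon(worker, "a", 1.0)
            nursery.start_soon(worker, "b", 2.0)

    trio.run(main)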

When I use AI to write code, if I go back to the code after a week or two I have a hard time catching up. When I write the code myself, I can just look at it and understand what I did.

  • A program is a function of the programmer; how you code is how you think. That is why it is really difficult, even after 60 years, for multiple people to work on the same codebase. Over the years we have made all kinds of rules and processes so that code written by one person can be understood and changed by another.

    You can also read human code and empathise with what the author was thinking while writing it.

    AI code is not for humans; it is just a stream of tokens that does something. You need to build skills to empirically verify that it does what you think it does, but it is pointless to "reason" about it.

Edit: Changed title

Previous title: "Anthropic: AI Coding shows no productivity gains; impairs skill development"

The previous title oversimplified the claim to "all" developers. I found the previous title meaningful while submitting this post because most of the false AI claims that "software engineering is finished" have mostly affected junior `inexperienced` engineers. But I think `junior inexperienced` was implicit, which many people didn't pick up on.

The paper makes a more nuanced claim that AI Coding speeds up work for inexperienced developers, leading to some productivity gains at the cost of actual skill development.

Many say generative AI is like a vending machine. But if your vending machine has not one button but a keyboard, and you can type in anything you want and it makes it (a Star Trek replicator), and you use it 10,000 times to refine your recipes, did you learn something or not? How about a 3D printer: do you learn something by making designs and printing them?

  • Instead of "vending machine", I see many people calling generative AI a "slot machine", which more aptly describes current genAI tools.

    Yes, we can use it 10,000 times to refine our recipes, but did we learn from it? I am doubtful about that, given that even running the same prompt 10 times will give different answers in 8 of the 10 responses.

    But I am very confident that I can learn by iterating and printing designs on a 3D printer.

  • Star Trek replicators were deterministic. They had a library of things they could replicate that you programmed in, and that's the extent of what they could do. They replicated to the molecular level; no matter how many times you asked for something, you got the exact same thing. You'd never ask for a raktajino and get a raw steak. In the rare instances where they misbehaved as a plot point, they were treated as being broken and needing fixing; no one ever suggested "try changing your prompt, or ask it seventeen times until you get what you want".

  • 3D printer: you learn something if you make the CAD designs yourself and print them, yes. It is a skill.

Often when I use it, I know that there is a way to do something and that I could figure it out by going through some API docs and maybe finding some examples on the web... IOW, I already have something in mind.

For example, I wanted to add a rate limiter to an API call with proper HTTP codes, etc. (a sketch of the kind of thing I mean is below). I asked the AI (in IntelliJ it used to be Claude by default, but they've since switched to Gemini as the default) to generate one for me. The first version was not good, so I asked it to do it again with some changes.

What would take me a couple of hours or more took less than 10 minutes.
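
To be clear about what I mean, here is a minimal sketch of a rate limiter with proper HTTP codes (a hypothetical fixed-window limiter, not the code the assistant actually produced, and simpler than a production token bucket):

    import time
    from http import HTTPStatus

    class FixedWindowRateLimiter:
        """Allow at most `limit` calls per `window` seconds for each client key."""

        def __init__(self, limit: int, window: float):
            self.limit = limit
            self.window = window
            self._state: dict[str, tuple[float, int]] = {}  # key -> (window start, call count)

        def check(self, key: str) -> HTTPStatus:
            now = time.monotonic()
            start, count = self._state.get(key, (now, 0))
            if now - start >= self.window:             # the window expired, start a new one
                start, count = now, 0
            if count >= self.limit:
                return HTTPStatus.TOO_MANY_REQUESTS    # 429: reject, do not count the call
            self._state[key] = (start, count + 1)
            return HTTPStatus.OK                       # 200: allow the call

    limiter = FixedWindowRateLimiter(limit=5, window=60.0)
    print([int(limiter.check("client-1")) for _ in range(7)])  # [200, 200, 200, 200, 200, 429, 429]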

  • Exactly this.

    I’m starting to believe that people who think AI-generated code is garbage actually don’t know how to code.

    I hit about 10 years of coding experience right before AI hit the scene, which I guess makes me lucky. I know, with high confidence, what I want my code to look like, and I make the AI do it. And it does it damn well and damn fast.

    I think I sit at a unique point for leveraging AI best. Too junior and you create "working monsters." Meanwhile, Engineering Managers and Directors treat it like a human, but it's not AGI yet.

I don't understand how so many people can be OK with inflicting brain rot on themselves and basically engineering themselves out of a career.

I use a web UI to chat with AI and do research, and even then I sometimes have to give up and accept that it won't provide the best solution, one that I know exists and am just too lazy to flesh out on my own. And then to the official docs I go.

But the coding tools, I'm sorry, but they constantly disappoint me. Especially the agents. In fact the agents fucking scare me. Thank god Copilot prompts me before running a terminal command. The other day I asked it about a Cypress test function and the agent asked if it could run some completely unrelated gibberish Python code in my terminal. That's just one of many weird things it's done.

My colleagues vibe code things because they don't have experience in the tech we use on our project, and it gets passed to me to review with "I hope you understand this". Our manager doesn't care because he's all in on AI and just wants the project to meet deadlines because he's scared for his job, and each level up the org chart from him it's the same. If this is what software development is now, then I need to find another career, because it's pathetic, boring, and stressful for anyone with integrity.

I’ve been making the case (e.g. https://youtu.be/uL8LiUu9M64?si=-XBHFMrz99VZsaAa [1]) that we have to be intentional about using AI to augment our skills, rather than outsourcing understanding: great to see Anthropic confirming that.

[1] plug: this is a video about the Patreon community I founded to do exactly that. Just want to make sure you're aware that's the pitch before you go ahead and watch.

They lost me in the abstract when they said "AI increases productivity, especially with novice workers". From my experience, it was the most experienced and fluent in the engineering world who gained the most value from AI.

I've noticed this as well. I delegate to agentic coders the tasks I need done efficiently, which I could do myself but lack the time for. Or tasks in areas I simply don't care much for, or languages I don't like very much, etc.

Can we ban Anthropic research papers from being submitted on HN?

This study is so bad: the sample size is n = 52, and in some conclusions it goes down to n = 2.

  • This seems to be a totally normal sample size for such kinds of studies where you look at quantitative and qualitative aspects. Is this the only reason why you find the study to be bad?

  • It is sad to see how far Anthropic and OpenAI have strayed from their research roots, that a pitiful manuscript like this can pass muster.

Nice to see an AI coding company allow such studies to come out, and it looks decently designed.

  • Don't give them kudos; they are just trying to seem like a "research" company while submitting bogus papers on arXiv (not peer-reviewed).

This is a fancy way of saying that if you invent the calculator, people get worse at sums. I'm not an AI doomer or a boomer - but it's clear to me that some skills will be permanently relegated to AI.

  • Yes, except the calculator is right 100% of the time. LLMs are right ??% of the time, where ?? constantly changes, changes with prompts, etc.