Comment by kragen

7 days ago

I've found this to be one of the most useful ways to use (at least) GPT-4 for programming. Instead of telling it how an API works, I make it guess, maybe starting with some example code to which a feature needs to be added. Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.
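
For instance (the library and all the names here are entirely made up, just to illustrate), I might paste in something like this and ask for a new feature, then watch what it guesses:

    # A made-up audio library of mine, stubbed out here for illustration.
    def load_track(path):
        """Read an audio file and return its raw bytes."""
        with open(path, "rb") as f:
            return f.read()

    track = load_track("riff.wav")
    # Prompt: "Extend this script to play the track at half speed."
    # The interesting part is which function names and signatures it
    # guesses for the missing piece (say, play(track, speed=0.5));
    # when its guess is nicer than what I had planned, I change my
    # API to match.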

Conversely, I sometimes present it with some existing code and ask it what it does. If it gets it wrong, that's a good sign my API is confusing, and its misreading shows me how.

These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.

(The best thing about this is that I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code, which often takes longer than just writing the code the usual way.)

There are multiple ways that an interface can be bad, and being unintuitive is the only one that this will fix. It could also be inherently inefficient or unreliable, for example, or lack composability. The AI won't help with those. But it can make sure your API is guessable and understandable, and that's very valuable.

Unfortunately, this only works with APIs that aren't already super popular.

> Sometimes it comes up with a better approach than I had thought of.

IMO this has always been the killer use case for AI—from Google Maps to Grammarly.

I discovered Grammarly at the very last phase of writing my book. I accepted maybe 1/3 of its suggestions, which is pretty damn good considering my book had already been edited by me dozens of times AND professionally copy-edited.

But if I'd accepted all of Grammarly's changes, the book would have been much worse. Grammarly is great for sniffing out extra words and passive voice. But it doesn't get writing for humorous effect, context, deliberate repetition, etc.

The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.

  • > The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results

    Thanks for your words of wisdom, which touch on a very important other point I want to raise: often, we (i.e., developers, researchers) construct a technology that would be helpful and "net benign" if deployed as a tool for humans to use, instead of deploying it in order to replace humans. But then along comes a greedy business manager who reckons recklessly that using said technology not as a tool, but in full automation mode, will make results 5% worse but save 15% of staff costs; and they decide that that is a fantastic trade-off for the company - yet employees may lose and customers may lose.

    The big problem is that developers/researchers lose control of what they develop, usually once the project is completed if they ever had control in the first place. What can we do? Perhaps write open source licenses that are less liberal?

    • The problem here is societal, not technological. An end state where people do less work than they do today but society is more productive is desirable, and we shouldn't be trying to force companies/governments/etc to employ people to do an unnecessary job.

      The problem is that people who are laid off often experience significant life disruption. And people who work in a field that is largely or entirely replaced by technology often experience permanent disruption.

      However, there's no reason it has to be this way - the fact that people whose jobs are replaced by technology are completely screwed over is a result of the society we have all created together; it's not a rule of nature.

      18 replies →

    • > Grammarly is great for sniffing out extra words and passive voice. But it doesn't get writing for humorous effect, context, deliberate repetition, etc.

      > But then along comes a greedy business manager who reckons recklessly

      Thanks for this. :)

      1 reply →

    • I think you’re describing the principal-agent problem that people have wrestled with forever. Oppenheimer comes to mind.

      You make something, but because you don’t own it—others caused and directed the effort—you don’t control it. But the people who control things can’t make things.

      Should only the people who can make things decide how they are used though? I think that’s also folly. What about the rest of society affected by those things?

      It’s ultimately a societal decision-making problem: who has power, and why, and how does the use of power affect who has power (accountability).

      1 reply →

    • > The big problem is that developers/researchers lose control

      If these developers/researchers are being paid by someone else, why should that same someone else be giving up the control that they paid for?

      If these developers/researchers are paying for the research themselves (e.g., a startup of their own founding), then why would they ever lose control, unless they sell it?

      1 reply →

    • The problem with those greedy business managers you speak of is that they don't care how the company does 10 years down the line, and I almost feel as if everybody is just doing things which work short term, ignoring the long-term consequences.

      As the comment above said, we need a human in the loop for better results. But it also depends on which human is in the loop.

      A senior can be way more productive in the loop than a junior.

      So everybody has just stopped hiring juniors because they cost money, figuring they'll deal with the AI almost-slop later, or someone else will.

      Now the current seniors will one day retire, but we won't have a new generation of seniors because nobody is giving juniors a chance - or at least that's what I've heard about the job market being brutal.

    • You're trying to put out a forest fire with an eyedropper.

      Stock your underground bunkers with enough food and water for the rest of your life and work hard to persuade the AI that you're not a threat. If possible, upload your consciousness to a starwisp and accelerate it out of the Solar System as close to lightspeed as you can possibly get it.

      Those measures might work. (Or they might be impossible, or insufficient.) Changing your license won't.

      6 replies →

  • I will never use Grammarly, no matter how good they get. They've interrupted too many videos for me to let it pass.

  • > The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.

    That's how you get economies of scale.

    Google couldn't have a human in the loop to review every page of search results before handing them out in response to queries.

    • Sure they could. We just want it to be otherwise.

      What benefit might human review have? Maybe they could make sure the SERP list entries actually have the keywords you're looking for. Even better, they could make sure the prices in the shopping section are correct! Maybe even make sure they relate to the product you actually searched for... I might actually pay money for that.

    • In the case of a search engine, the human in the loop is the user selecting which result to click.

    • Only some things scale like that. Google's insistence on using the same model everywhere has earned them a deserved reputation for atrocious support.

  • Yes, we have the context - our unique lived experience - and are ultimately accountable for our actions. LLMs have no skin in the game. They have no desires, and cannot be punished in any way. No matter how smart they get, we are providing their opportunities to generate value, guidance and iteration, and in the end we have to live with the outcomes.

  • And that’s how everything gets flattened to the same style/voice/etc.

    That’s like getting rid of all languages and accents and switching to a single language.

  • What's wrong with passive?

    • Passive voice often adds length, impedes flow, and subtracts the useful info of who is doing something.

      Examples:

      * Active - concise, complete info: The manager approved the proposal.

      * Passive - wordy, awkward: The proposal was approved by the manager.

      * Passive - missing info: The proposal was approved. [by who?]

      Most experienced writers will use active unless they have a specific reason not to, e.g., to emphasize another element of the sentence, as the third bullet's sentence emphasizes approval.

      -

      edited for clarity, detail

      45 replies →

    • There's nothing wrong with the passive voice.

      The problem is that many people have only a poor ability to recognize the passive voice in the first place. This results in the examples being clunky, wordy messes that are bad because they're, well, clunky and wordy, and not because they're passive--indeed, you've often got only a fifty-fifty chance of the example passive voice actually being passive in the first place.

      I'll point out that the commenter you're replying to used the passive voice, as did the one they responded to, and I suspect that such uses went unnoticed. Hell, I just rewrote the previous sentence to use the passive voice, and I wonder how many people even recognized that in the first place, let alone thought it worse for being so written.

      4 replies →

    • There was a time when Microsoft Word would treat the passive voice in your writing with the same level of severity as spelling errors or major grammatical mistakes. Drove me absolutely nuts in high school.

      2 replies →

    • Passive can be disastrous when used in contractual situations if the agent who should be responsible for an action isn’t identified, e.g. “X will be done”. I was once burnt by a contract that in some places left it unclear whether the customer or the contractor was responsible for particular tasks. Active voice that identifies the agent is less ambiguous.

      1 reply →

    • Sometimes it's used without thinking, and often the writing is made shorter and clearer when the passive voice is removed. But not always; rewriting my previous sentence to name the agents in each case, as the active voice requires in English, would not improve it. (You could remove "made", though.)

      2 replies →

    • Here is a simple summary of the common voices/moods in technical writing:

      - Active: The user presses the Enter key.

      - Passive: The Enter key is to be pressed.

      - Imperative (aka command): Press the Enter key.

      The imperative mood is concise and doesn't dance around questions about who's doing what. The reader is expected to do it.

      1 reply →

    • Passive is too human. We need robot-style communications; the next step is to send JSON.

    • In addition to the points already made, passive voice is painfully boring to read. And it's literally everywhere in technical documentation, unfortunately.

      27 replies →

I used this to great success just this morning. I told the AI to write me some unit tests. It flailed and failed badly at that task. But how it failed was instructive, and uncovered a bug in the code I wanted to test.

  • In a way, AI’s failure can be its own kind of debugger. By watching where it stumbles, you sometimes spot flaws you’d have missed otherwise.

  • Haha, that's awesome! Are you going to change the interface? What was the bug?

    • It used nonsensical parameters to the API in a way that I didn't realize was possible (though it's obvious in hindsight). The AI got confused; it didn't think the parameters were nonsensical. It also didn't quite use them in the way that triggered the error. However, it was close enough for me to realize, "hey, I never thought of that possibility". I needed to fix the function to return a proper error response for the nonsense.

      It also taught me to be more careful about checkpointing my work in git before letting an agent go wild on my codebase. It left a mess trying to fix its problems.
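
      If it helps to picture it, here's a minimal sketch of the kind of fix I mean - the function and parameter names are made up, not the real API:

          def fetch_records(records, limit=None, offset=None):
              # The "impossible" combination the agent stumbled onto:
              # an offset with no limit. Fail loudly with a clear error
              # instead of misbehaving further down the call chain.
              if offset is not None and limit is None:
                  raise ValueError("offset requires limit to be set")
              start = offset or 0
              end = None if limit is None else start + limit
              return records[start:end]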

      1 reply →

I've played with a similar idea for writing technical papers. I'll give an LLM my draft and ask it to explain back to me what a section means, or otherwise quiz it about things in the draft.

I've found that LLMs can be kind of dumb about understanding things, and are particularly bad at reading between the lines for anything subtle. In this aspect, I find they make good proxies for inattentive anonymous reviewers, and so will try to revise my text until even the LLM can grasp the key points that I'm trying to make.

  • That's fantastic! I agree that it's very similar.

    In both cases, you might get extra bonus usability if the reviewers or the API users actually give your output to the same LLM you used to improve the draft. Or maybe a more harshly quantized version of the same model, so it makes more mistakes.

A light-weight anecdote:

Many, many Python image-processing libraries have an `imread()` function. I didn't know about this when designing our own bespoke image-lib at work, and went with an esoteric `image_get()` that I never bothered to refactor.

When I ask ChatGPT for help writing one-off scripts using the internal library I often forget to give it more context than just `import mylib` at the top, and it almost always defaults to `mylib.imread()`.

  • I don't know if there's an earlier source, but I'm guessing Matlab originally popularized the `imread` name, and that OpenCV (along with its python wrapper) took it from there, same for scipy. Scikit-image then followed along, presumably.

  • As someone not familiar with these libraries, image_get or image_read seems much clearer to me than imread. I'm wondering if the convention is worse than your instinct in this case. Maybe these AI tools will push us towards conventions that aren't always the best design.

    • image_get is clearer—unless you've used Matlab, Octave, matplotlib, SciPy, OpenCV, scikit-image, or other things that have copied Matlab's interface. In that case, using the established name is clearer.

      (Unless, on the gripping hand, your image_get function is subtly different from Matlab's imread, for example by not returning an array, in which case a different name might be better.)

      2 replies →

  • That's a perfect example! I wonder if changing it would be an improvement? If you can just replace image_get with imread in all the callers, maybe it would save your team mental effort and/or onboarding time in the future.

    • I strongly prefer `image_get/image_read` for clarity, but I would just stub in a method called `imread` which is functionally the same and hide it from the documentation.
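
      Something like this sketch (made-up names, assuming the library is a plain module):

          def image_read(path):
              """Load an image from disk; the real decoding logic would live here."""
              with open(path, "rb") as f:
                  return f.read()

          # Undocumented alias so that LLM-written scripts which assume the
          # conventional name still work.
          imread = image_read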

That's not creativity.

That's closer to simply observing the mean. For an analogy, it's like waiting to pave a path until people tread the grass in a specific pattern. (Some courtyard designers used to do just that. Wait to see where people were walking first.)

Making things easy for Chat GPT means making things close to ordinary, average, or mainstream. Not creative, but can still be valuable.

  • Best way to put it. It's very hard to discuss even slightly unique concepts with GPT. It just keeps strawmanning ideas back to a common consensus without actually understanding the deep idea.

    On the bright side, a lot of work is just finding the mean solution anyway.

> and being unintuitive is the only one that this will fix

That's also how I'm approaching it. If all the condensed common wisdom poured into the model's parameters says that this is how my API is supposed to work to be intuitive, why on earth do I think it should work differently? There needs to be a good reason (like composability, for example); otherwise I'm just breaking expectations.

In a similar vein, some of my colleagues have been feeding their scientific paper methods sections to LLMs and asking them to implement the method in code, using the LLM's degree of success/failure as a vague indicator of the clarity of the method description.

> Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.

“Sometimes” being a very important qualifier to that statement.

Claude 4 naturally doesn’t write code with any kind of long-term maintenance in mind, especially if it’s trying to make things look like what the less experienced developers wrote in the same repo.

Please don’t assume just because it looks smart that it is. That will bite you hard.

Even with well-intentioned rules, terrible things happen. It took me weeks to see some of it.

  > I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code

If anyone is stuck in this situation, give me a holler. My Gmail username is the same as my HN username. I've always been the one to hunt down my coworkers' bugs, and I think I'm the only person on the planet who finds it enjoyable to find ChatGPT's oversights and sometimes seemingly malicious intent.

I'll charge you, don't get me wrong, but I'll save you time, money, and frustration. And future bug reports and security issues.

In essence, an LLM is a crystallisation of a large corpus of human opinion, and you are using that to focus-group your API, as it is representative of a reasonable third-party perspective?

  • Yeah, basically. For example, it's really good at generating critical HN comments. Whenever I have a design or an idea I formulate it to GPT and ask it to generate a bunch of critical HN comments. It usually points out stuff I hadn't considered, or at least prepares me to think about and answer the tough questions.

This was a big problem starting out writing MCP servers for me.

Having an LLM demo your tool, then taking what it does wrong or uses incorrectly and adjusting the API works very very well. Updating the docs to instruct the LLM on how to use your tool does not work well.

Great point. Also, it may not be the best possible API designer in the world, but it sure sounds like a good way to forecast what an _average_ developer would expect this API to look like.

> These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.

This is also similar to the areas in which TD-Gammon excelled at Backgammon.

Which is all pretty amusing, if you compare it to how people usually tended to characterise computers and AI, especially in fiction.

This works for UX. I give it vague requirements, and it implements something I didn't ask for, but it's better than what I would have thought of.

How do you prompt it to make it guess about the API for a library? I'm confused about how you would structure that in a useful way.

  • Often I've started with some example code that invokes part of the API, but not all of it. Or in C I can give it the .h file, maybe without comments.

    Sometimes I can just say, "How do I use the <made-up name> API in Python to do <task>?" Unfortunately the safeguards against hallucinations in more recent models can make this more difficult, because it's more likely to tell me it's never heard of it. You can usually coax it into suspension of disbelief, but I think the results aren't as good.

When I see comments like yours I can't help but decry how bad the "stochastic parrots" framing was. A parrot does not hallucinate a better API.

From my perspective that’s fascinatingly upside-down thinking that leads to you asking to lose your job.

AI is going to get the hang of coding to fill in the spaces (i.e. the part you’re doing) long before it’s able to intelligently design an API. Correct API design requires a lot of contextual information and forward planning for things that don’t exist today.

Right now it’s throwing spaghetti at the wall and you’re drawing around it.

  • I find it's often way better at API design than I expect. It's seen so many examples of existing APIs in its training data that it tends to have surprisingly good "judgement" when it comes to designing a new one.

    Even if your API is for something that's never been done before, it can usually still take advantage of its training data to suggest a sensible shape once you describe the new nouns and verbs to it.

  • Maybe. So far it seems to be a lot better at creative idea generation than at writing correct code, though apparently these "agentic" modes can often get close enough after enough iteration. (I haven't tried things like Cursor yet.)

    I agree that it's also not currently capable of judging those creative ideas, so I have to do that.

    • This sort of discourse really grinds my gears. The framing of it, the conceptualization.

      It's not creative at all, any more than taking the sum of text on a topic, and throwing a dart at it. It's a mild, short step beyond a weighted random, and certainly not capable of any real creativity.

      Myriads of HN enthusiasts often chime in here with "Are humans any more creative?" and other blather. Well, that's a whataboutism, and doesn't detract from the fact that creativity does not exist in the AI sphere.

      I agree that you have to judge its output.

      Also, sorry for hanging my comment here. Might seem over the top, but anytime I see 'creative' and 'AI', I have all sorts of dark thoughts. Dark, brooding thoughts with a sense of deep foreboding.

      8 replies →

Complete insanity - it might change constantly, even before a whole new version/retrain.

Insanity-driven development: altering your API to accept 7 levels of "broken and different" structures so as to bend to the will of the LLMs.

  • I think you’re missing the OP’s point. They weren’t saying that the goal is to modify their APIs just to appease an LLM. It’s that they ask LLMs to guess what the API is and use that as part of their design process.

    If you automatically assume that what the LLM spits out is what the API ought to be then I agree that that’s bad engineering. But if you’re using it to brainstorm what an intuitive interface would look like, that seems pretty reasonable.

  • Yes, that's a bonus. In fact, I've found it worthwhile to prompt it a few times to get several different guesses at how things are supposed to work. The super lazy way is to just say, "No, that's wrong," if necessary adding, "Frotzl2000 doesn't have an enqueueCallback function or even a queue."

    Of course when it suggests a bad interface you shouldn't implement it.