Comment by steego

6 days ago

Approximately speaking, what do you want to see put up?

I ask this because it reads like you have a specific challenge in mind when it comes to generative AI and it sounds like anything short of "proof of the unlimited powers" will fall short of being deemed "useful".

Here's the deal: Reasonable people aren't claiming this stuff is a silver bullet or a panacea. They're not even suggesting it should be used without supervision. It's useful when used by people who understand its limitations and leverage its strengths.

If you want to see how it's been used by someone who was happy with the results and is willing to share them, you can scroll down a few stories on the front page and check the commit history of this project:

https://github.com/cloudflare/workers-oauth-provider/commits...

Now here's the deal: These people aren't trying to prove anything to you. They're just sharing the results of an experiment where a very talented developer used these tools to build something useful.

So let me ask you this: Can we at least agree that these tools can be of some use to talented developers?

Yes, sure, I’ve checked in code generated by AI myself. I haven’t experienced the excitement this article exudes, though, and its usefulness seems very limited given the by-now well-documented downsides. Frankly, I haven’t bothered using it much recently; it’s just not there yet IME, and I’m not sure LLMs ever will be.

What I’m interested in really is just case studies with prompts and code - that’s a lot more interesting for hackers IMO than hype.

It's useful, but the promise of every AI company is very explicitly that they will burn the seed corn and choke off the pipeline that created those "very talented" developers who reviewed it!

  • I’m less worried about this as the best way to learn to code is to read as well as write it IMO.

    If capabilities don’t improve it’s not replacing anyone, if they do improve and it can write good code, people can learn from reading that.

    I don’t see a pathway to improvement though given how these models work.

> Here's the deal: Reasonable people aren't claiming this stuff is a silver bullet or a panacea

This article and its vocal supporters are not being reasonable at all. They draw a not-so-between-the-lines distinction between skeptics (who are nuts) and supporters ("My smartest friends are blowing it off," said with a smug "I'm smarter than my smarter friends" air).

I mean, come on.

  • You are absolutely correct. This article and vocal supporters are often not reasonable and I should have made that point.

    I honestly found the article to be an insufferably glib and swaggering piece that was written to maximize engagement rather than to engage the subject seriously.

    The author clearly values maximizing perceived value with the least amount of effort.

    Frankly, I’m tired of reading articles by people who can’t be bothered to honestly present the arguments of the people they’re disagreeing with, and I gave up halfway through because it was so grating.

> Reasonable people aren't claiming this stuff is a silver bullet or a panacea.

Are you saying the CEO of Anthropic isn't reasonable? Or Klarna's?

  • The CEO of Anthropic is the least reasonable person in this discussion.

    Surely you can see how insanely biased all of their statements would be. They are literally selling the shovels in this gold rush.

    Anything they say will be in service of promoting AI, even the bad/cautionary stuff because they know there's an audience who will take it the other way (or will choose to jump in to not be left behind), and also news is news, it keeps people talking about AI.

    • For the record, as the author of this piece, I do not think anybody should factor what the CEO of Anthropic thinks into their decisions. There is in fact a section on this argument in the post. It's short, so it's easy to miss.

  • Of course not. CEOs are Chief Narrative Officers: one of their main functions is to craft and push a message (which is different than collating and reporting facts). Reason doesn’t factor in.

I think that experiment was very cool, but I will say that the OAuth2.0/OIDC protocol is very well documented and there are tons of tools already built around it in multiple languages.

I implemented the OAuth 2.0 protocol in 3 different languages without a 3rd party library - entire spec implemented by hand. This was around 2015, when many of the libraries that exist today didn't exist yet. I did this as a junior developer for multiple enterprise applications. At the end of the day it's not really that impressive.

  • Three weeks ago I did basically the same thing as the author of the Cloudflare story, but I did it with my own open source tool. I went into the experiment treating Claude Code as a junior engineer and guiding it on a feature I wanted implemented.

    In a single Saturday the LLM delivered the feature to my spec, passing my initial test cases, adding more tests, etc…

    I went to bed that night feeling viscerally, in my bones, that I had been pairing with and guiding a senior engineer, not a junior. The feature was delivered in one day and would have taken me a week to do myself.

    I think stories like the Cloudflare story are happening all over right now. Staff level engineers are testing hypotheses and being surprised at the results.

    OAuth 2.0 doesn’t really matter. If you can guide the model and clearly express requirements, boundaries, and context, then it’s likely to be very useful and valuable in its current form.

  • This is a great example of how no example provided is ever good enough. There’s always an argument that it doesn’t really count. Yet you just said the computer is doing what you did as a junior developer.

  • It's not supposed to be impressive. It's a faster way to do the unimpressive stuff. Which is the bulk of real-world software work at most companies.

    Maybe you just have that dream job where you only have to think hard thoughts. But that's just not the norm, even at a bleeding edge startup.

  • Exactly how long did it take you? And, by comparison, how much actual time did Cloudflare spend on prompting and code review?