Comment by ianbutler

5 months ago

3.5 Sonnet Yes IC SWE (Diamond) N/A 26.2% $58k / $236k 24.5%

But sonnet solved over 25% of them and made 60 grand.

That's a substantial amount of work. I don't entirely disagree with you about it being premature, but these things are clearly providing substantial value.

>But sonnet solved over 25% of them and made 60 grand.

Technically it didn’t, since all these tasks were done some time ago. On that note, I feel like putting a dollar amount on the tasks it was able to complete is misleading.

In the real world, if a model masquerading as a human were only right 25% of the time, its reviews on Upwork would reflect that and it would never find work again. It might make a couple thousand dollars before it lost everyone's trust.

Of course, things would be different if they were open and upfront about this being an LLM, in which case it would presumably never run out of trust.

And again, Expensify is an anomaly among companies in that it gives freelancers well-articulated tasks to work on. The real world is much messier.

  • That's a lot of qualifying you have to do to discount this, which is fine, but my take is that you do so at your own peril as we look to the future of this tech.

    The real world is messy, but it also adapts to the most cost-effective solution, even if that solution is just alright.

    People will spend more time specifying their task for an LLM-based tool if it gets the job done at a fraction of what a freelancer costs.