Comment by realitysballs
5 months ago
I believe the outcome of this type of article is actually positive. The ‘SWE-Lancer’ benchmark offers a more pragmatic assessment of LLM capabilities.
Ironically, it actually refutes Altman’s claims mentioned in the same article. It’s hard to replace engineers when you create a benchmark your own models can’t score decently on.
Or it could be a case of: never publish a benchmark or comparison you don't expect to succeed at. That's especially true when you're funded mostly by private/VC investors. Time will tell.
I think they are trying to frame the narrative first, then succeed at it. Let's see. This helps justify OpenAI's efforts and validate them to investors/VCs. After all, IMO, without coding as a use case for LLMs, AI wouldn't have nearly the same hype/buzz it does now. Greed (profit) and fear (losing jobs) are great motivators for keeping investment hype and funds coming in.