Comment by izucken

5 months ago

So it's much worse than I assumed from paper and repo overview?

For further clarification: 1. See the issue example #14268 https://github.com/openai/SWELancer-Benchmark/tree/08b5d3dff.... It has a patch that is supposed to "reintroduce" the bug into the codebase (note the comments):

  +    // Intentionally use raw character count instead of HTML-converted length
  +    const validateCommentLength = (text: string) => {
  +        // This will only check raw character count, not HTML-converted length
  +        return text.length <= CONST.MAX_COMMENT_LENGTH;
  +    };

Also, the patch is supposedly applied over commit da2e6688c3f16e8db76d2bcf4b098be5990e8968 - much later than original fix, but also a year ago, not sure why, might be something to do with cut off dates.

2. Proceed to https://github.com/Expensify/App/issues/14268 to see the actual original issue thread.

3. Here is the actual merged solution at the time: https://github.com/Expensify/App/pull/15501/files#diff-63222... - as you can see the diff is quite different... Not only that, but the point to which the "bug" was reapplied is so far to the future that repo migrated to typescript even.

---

And they still had to add a whole another level of bullshit with "management" tasks on top of that, guess why =)

Prior "bench" analysis for reference: https://arxiv.org/html/2410.06992v1

(edit: code formatting)