
Comment by riku_iki

1 year ago

> If f(paper) -> code is the weakest part of the chain, it makes sense to target that.

My point is that LLMs may already have seen the solution on GitHub, so you can't use that benchmark as a metric unless there is some explanation of how that was handled.



kelseyfrog 1 year ago

How does that work with knowledge cutoff?

riku_iki 1 year ago
It could work with a knowledge cutoff if they can reliably guarantee it, and also make sure the LLMs are not searching GitHub under the surface.
- kelseyfrog 1 year ago
  
  What's the likelihood that the researchers have done this? It seems fairly easy.
  