Comment by kristianp

1 year ago

I tried it on a problem I was looking at recently: refactoring a small Rust crate to use one datatype instead of an enum, to help me understand the code better. o1-mini made a decent attempt but couldn't produce error-free code. o1-preview produced code that compiled and passed every test except the one that is expected to fail given the change I asked it to make.

This is the prompt I gave:

simplify this rust library by removing the different sized enums and only using the U8 size. For example MasksByByte is an enum, change it to be an alias for the U8 datatype. Also the u256 datatype isn't required, we only want U8, so remove all references to U256 as well.
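For readers who haven't looked at the crate, the shape of the requested change can be sketched roughly as below. This is a minimal illustration only: `MasksByByteSized`, `MasksByByteEnum`, the variant list, and the `len` method are simplified, hypothetical stand-ins, not the actual trie-hard definitions.

```rust
// Hypothetical, simplified stand-ins; NOT the real trie-hard types.

/// Per-byte masks stored at one integer width (illustrative only).
#[derive(Debug, Clone, PartialEq)]
struct MasksByByteSized<I>(Vec<I>);

/// "Before" shape: one enum variant per supported mask width.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum MasksByByteEnum {
    U8(MasksByByteSized<u8>),
    U16(MasksByByteSized<u16>),
    U32(MasksByByteSized<u32>),
}

impl MasksByByteEnum {
    /// Every method has to dispatch on the variant.
    fn len(&self) -> usize {
        match self {
            MasksByByteEnum::U8(m) => m.0.len(),
            MasksByByteEnum::U16(m) => m.0.len(),
            MasksByByteEnum::U32(m) => m.0.len(),
        }
    }
}

/// "After" shape: a plain alias for the u8-sized type, so no dispatch.
type MasksByByte = MasksByByteSized<u8>;

impl MasksByByteSized<u8> {
    fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    let before = MasksByByteEnum::U8(MasksByByteSized(vec![1, 2, 4]));
    let after: MasksByByte = MasksByByteSized(vec![1, 2, 4]);
    // Same data, but the alias version needs no `match` to reach it.
    assert_eq!(before.len(), after.len());
    println!("both hold {} masks", after.len());
}
```

The point of the alias is that every `match` over the width variants disappears, which is what makes the code easier to read, at the cost of dropping support for the wider masks.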

The original crate is trie-hard [1][2]. I forked it and put the models' attempts in the fork [3]. I also wrote it up quickly at [4].

[1] https://blog.cloudflare.com/pingora-saving-compute-1-percent...

[2] https://github.com/cloudflare/trie-hard

[3] https://github.com/kpm/trie-hard-simple/tree/main/attempts

[4] https://blog.reyem.dev/post/refactoring_rust_with_chatgpt-o1...

I've been having a weird timezone issue in my Rails application that I've had a hard time getting my head around. I gave o1-preview the relevant code and the context it needed, and its answers seemed to make sense, but it still wasn't able to resolve the bug or explain exactly what was going on.

So, it seems like anything that requires some actual thought and problem-solving is tough for it to answer.

I'm sure it's just a matter of time before devs are out of work but it seems like we'll be safe for another few years anyway.

  • I'm still not convinced that it's not doing approximate reasoning-chain retrieval, self-triggered to fetch more reasoning chains that will maximize its goal. I'm seeing a lot of comments from other SWEs using it for non-trivial tasks that it fails at while just trying harder to look like it's problem-solving. Even with more context and documentation, it fails to notice details an experienced SWE would pick up quickly.

I cannot tell from reading what you wrote whether you think it did a good job or not.

  • Thanks for the feedback. I do think it did a good job in the end. I haven't had time to have a good look at the final code o1-preview produced, and my understanding of Rust is pretty basic, which is why I didn't say more about the results. I think Rust is one of those languages where, if it compiles, you're most of the way there, because of the strong type system. Not as strong as Haskell or OCaml, perhaps.