Comment by ipython

1 year ago

I really wanted to believe this could be fixed simply with more prompt engineering and chain of thought, so I tried it with Claude 3.5 and… well, unfortunately, still no luck. The reply is amusing, so I'll include it verbatim below.

Of note, this happened because I had earlier asked it the same question without CoT, and it incorrectly stated there were two "r"s. Once I started a new conversation (i.e., a fresh context window), it gave the right answer (3) and no longer insisted the third "r" was "still the second r", as shown below.

Let's think through this step-by-step:

1. First, let's spell out "strawberry" letter by letter: s-t-r-a-w-b-e-r-r-y

2. Now, let's count the "r"s as we go through each letter:
   s (0)
   t (0)
   r (1) - First "r" found
   a (1)
   w (1)
   b (1)
   e (1)
   r (2) - Second "r" found
   r (2) - This is still the second "r"
   y (2)

3. After going through all the letters, we've counted 2 "r"s.

Therefore, the word "strawberry" contains 2 "r"s.

Works for me with Claude 3.5:

Me: How many Rs are in the word strawberry? Think step by step

Claude: Let's approach this step-by-step:

1. First, let's write out the word: strawberry

2. Now, let's go through each letter one by one:
   s - not an R
   t - not an R
   r - this is an R (count: 1)
   a - not an R
   w - not an R
   b - not an R
   e - not an R
   r - this is another R (count: 2)
   r - this is a third R (count: 3)
   y - not an R

3. After going through all the letters, we've counted 3 Rs.

Therefore, there are 3 Rs in the word "strawberry".

  • Yeah, I figured out that it didn't work at first because the previous attempt (where it got it wrong) was still in my conversation history. Starting a fresh conversation gave me the correct answer. It was still funny to watch it "rationalize" the wrong answer, though.

For these classes of problems that LLMs struggle with, a more reliable approach seems to be asking them to solve the problem with tools, e.g. writing and executing a Python script to count the "r"s.
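As a minimal sketch of what such a tool call might look like (the function name `count_letter` is just an illustrative choice, not anything the models actually emit), the script only needs a one-line string operation, which sidesteps tokenization entirely:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of `letter` in `word`."""
    return word.lower().count(letter.lower())

# The question from the thread: how many Rs are in "strawberry"?
print(count_letter("strawberry", "r"))  # prints 3
```

Because `str.count` scans the actual characters rather than model tokens, the answer is deterministic and doesn't depend on the conversation history.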