Comment by absal

2 years ago

I have some related work where we looked at how tipping (and other prompt variations) affects predictions and accuracy in classification tasks. We experimented with ChatGPT and the different sizes of Llama 2.

TLDR: We found similar results: tipping performs better on some tasks and worse on others, but overall it doesn't make a big difference. The one exception was Llama 2 7B, the smallest model we tested, where tipping beat every other prompt variation we tried by several percentage points. This suggests that the impact of tipping may diminish as model size grows.

https://arxiv.org/pdf/2401.03729.pdf