Comment by bugglebeetle
18 hours ago
The same way you check performance for any problem like this: create one or more manually labeled test datasets, randomly sampled from the target data, and look at the resulting precision, recall, F-scores, etc. LLMs change pretty much nothing about evaluation for most NLP tasks.
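As a rough sketch of that workflow, assuming a binary classification task: draw a random sample from the target data, label it by hand, then compare the model's predictions against those labels. The function names `sample_for_labeling` and `precision_recall_f1` and the toy labels are made up for illustration; nothing here is specific to LLMs.

```python
import random


def sample_for_labeling(items, k, seed=0):
    """Draw k items uniformly at random (without replacement) for manual labeling."""
    rng = random.Random(seed)  # fixed seed so the test set is reproducible
    return rng.sample(items, k)


def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall and F1 for one class from parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy data: hand-assigned gold labels vs. model output on the same sampled items.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_f1(gold, predicted))  # (0.75, 0.75, 0.75)
```

In practice the sample should be large enough that the scores are stable, and it should be kept separate from any data used to tune prompts or models.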