Comment by brokensegue

16 hours ago

there are models between 2-grams and 600M-param models that would be good options. i don't expect a 2-gram to do very well here. also, i'm not sure why this model isn't a fine choice if it solves their problem


throwa356262 14 hours ago

What would you suggest instead?

stephantul 11 hours ago
A non-autoregressive transformer trained with a classification objective.
- all2 3 hours ago
  
  These are absurdly effective for this kind of task. Training is fast and straightforward. Packaging for deployment as ONNX is pretty simple as well.
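
For anyone who wants something concrete, here is a minimal sketch of that idea in PyTorch: an encoder with no causal mask, so every token attends in both directions, mean-pooled into one vector and trained with cross-entropy. The class name `EncoderClassifier`, the layer sizes, the three-label setup, and the random batch are all illustrative assumptions, not anything taken from the task discussed in this thread.

```python
import torch
import torch.nn as nn


class EncoderClassifier(nn.Module):
    """Hypothetical small bidirectional transformer with a classification head."""

    def __init__(self, vocab_size=1000, d_model=64, n_heads=4,
                 n_layers=2, n_classes=3, max_len=128):
        super().__init__()
        # Token id 0 is assumed to be padding.
        self.tok = nn.Embedding(vocab_size, d_model, padding_idx=0)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # No causal mask is ever passed, so attention is bidirectional.
        self.encoder = nn.TransformerEncoder(
            layer, num_layers=n_layers, enable_nested_tensor=False
        )
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, ids):
        pad = ids.eq(0)  # True where the position is padding
        positions = torch.arange(ids.size(1), device=ids.device)
        x = self.tok(ids) + self.pos(positions)
        h = self.encoder(x, src_key_padding_mask=pad)
        # Mean-pool over real tokens only, then classify the whole sequence at once.
        keep = (~pad).unsqueeze(-1).float()
        pooled = (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.head(pooled)


torch.manual_seed(0)
model = EncoderClassifier()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Random stand-in batch: 8 sequences of 16 tokens, last 4 positions padded.
ids = torch.randint(1, 1000, (8, 16))
ids[:, 12:] = 0
labels = torch.randint(0, 3, (8,))

# One training step with a plain classification objective.
logits = model(ids)
loss = nn.functional.cross_entropy(logits, labels)
opt.zero_grad()
loss.backward()
opt.step()
```

The whole input is scored in a single forward pass, with no token-by-token decoding, which is a large part of why training and inference are cheap. A trained model of this shape could then be handed to `torch.onnx.export` for the packaging step mentioned above.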
  