Comment by brokensegue
16 hours ago
there are models between 2-grams and 600m param models that would be good options. i don't expect a 2-gram to do very well here. also i'm not sure why this model isn't a fine choice if it solves their problem
16 hours ago
there are models between 2-grams and 600m param models that would be good options. i don't expect a 2-gram to do very well here. also i'm not sure why this model isn't a fine choice if it solves their problem
What would you suggest instead?
A non-autoregressive transformer trained with a classification objective.
These are absurdly effective for this kind of task. Training is fast and straight forward. Packaging for deployment as ONNX is pretty simple as well.
1 reply →