Comment by bayindirh
9 days ago
Maybe you can pass the completed text through a simple, fast grammar model to improve it. You don't need a 40B-parameter, 200 GB, A100-demanding language model to fix these mistakes. That would be absurdly wasteful in every sense.
I'm sure there can be models that run on last-gen CPUs' AI accelerators and fix these kinds of mistakes faster than real time, and I'm sure Microsoft Word has already been doing it for some languages for quite some time.
Heck, even Apple has on-device models that can autocomplete words now, and even though their context window and completion size are not that big, they let me jump ahead with a simple tap of the Tab key.
I wonder if this is a case where you want an encoder-decoder model. It seems very much like a translation task, only one where training data is embarrassingly easy to synthesize by just grabbing sentences from a corpus and occasionally swapping, inserting, and deleting characters.
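A minimal sketch of that synthesis idea (the function name, noise rate, and corruption mix are my own illustrative choices, not anything from a real pipeline): grab clean sentences and corrupt them with random character swaps, insertions, and deletions to get (noisy, clean) training pairs for free.

```python
import random

def corrupt(sentence, p=0.05, rng=None):
    """Randomly swap, insert, or delete characters to mimic recognition noise."""
    rng = rng or random.Random(0)
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p and i + 1 < len(chars):   # swap this character with the next
            out.append(chars[i + 1])
            out.append(chars[i])
            i += 2
        elif r < 2 * p:                    # insert a random letter before it
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            out.append(chars[i])
            i += 1
        elif r < 3 * p:                    # delete it
            i += 1
        else:                              # keep it unchanged
            out.append(chars[i])
            i += 1
    return "".join(out)

clean = "the quick brown fox jumps over the lazy dog"
pair = (corrupt(clean), clean)  # (noisy input, clean target) training pair
```

Every sentence in a corpus yields as many distinct training pairs as you care to sample, which is what makes the data "embarrassingly easy" here.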
In terms of attention masking, it seems like you want the input to be unmasked, since the input is fixed for a given “translation”, and then for the output tokens to use causally masked self-attention plus cross-attention with the input.
I wonder if you could get away with a much smaller network this way because you’re not pointlessly masking input attention for a performance benefit that doesn’t matter.
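The three masks described above can be written down directly (a toy sketch with 1 meaning "may attend"; real frameworks differ in convention, e.g. additive masks with -inf):

```python
import numpy as np

def masks(src_len, tgt_len):
    """Attention masks for an encoder-decoder corrector (1 = may attend).

    Encoder self-attention is unmasked: the noisy input is fully visible.
    Decoder self-attention is causal: each output token sees only its past.
    Cross-attention is unmasked: every output token sees the whole input.
    """
    enc_self = np.ones((src_len, src_len), dtype=int)
    dec_self = np.tril(np.ones((tgt_len, tgt_len), dtype=int))
    cross = np.ones((tgt_len, src_len), dtype=int)
    return enc_self, dec_self, cross
```

Only the decoder pays the causal-masking cost; the encoder gets to read the full noisy input bidirectionally, which is the point being made about not masking the input.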
When I was reading your comment, I remembered an assignment in a Fuzzy Logic course:
"Number of different letters" is a great heuristic for a word guesser. In that method you are only told how many letters differ; you make some educated guesses to start from a semi-converged point (word frequencies are an easy way to pick them), brute-force your way from there, and the whole process finds the words in mere milliseconds.
You can extend the method to a 2-3 word window, since we care not about grammar but about misread words, and brute-force from there.
You may not even need a network to fix these kinds of misrecognitions. Add some SSE/AVX magic for faster processing and you have a potential winner on your hands.
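A tiny sketch of that guesser (the vocabulary, frequencies, and `max_diff` cutoff are invented for illustration): score dictionary words by a positionwise letter-difference heuristic and break ties with corpus frequency.

```python
def letter_diff(a, b):
    """Heuristic distance: number of positions where the letters differ,
    counting length mismatch as extra differences."""
    n = max(len(a), len(b))
    return sum(1 for i in range(n)
               if i >= len(a) or i >= len(b) or a[i] != b[i])

def guess(misread, vocab_freq, max_diff=2):
    """Rank candidate words by letter difference, then by frequency
    (more frequent first), discarding candidates that differ too much."""
    cands = [(letter_diff(misread, w), -f, w) for w, f in vocab_freq.items()]
    cands = [c for c in cands if c[0] <= max_diff]
    return [w for _, _, w in sorted(cands)]

# Toy vocabulary with made-up frequencies.
vocab = {"hello": 120, "help": 80, "hollow": 15, "yellow": 40}
print(guess("hellp", vocab))  # → ['hello', 'help']
```

With a precomputed frequency table, this is a handful of integer comparisons per candidate, which is why the brute-force search converges in milliseconds and vectorizes well.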
> Maybe you can pass the completed text through a simple, fast grammar model to improve it
Yes - but not really a "grammar model": what should fit is a statistical model of text with transformer-style attention (the core of LLMs), something that identifies whether the text it is fed contains statistical anomalies (which is what the glitches are).
Unfortunately, small chatbot LLMs do not follow instructions ("check the following text"); they just invent stories, and I am not aware of a specialized model that can be fed text to scan for anomalies. Some have mentioned a BERT variant, which, as I understand it, still does not have great accuracy.
It is a relatively small problem that probably does not have a specialized solution yet. A simple input-output box that did just "evaluate the statistical probability of each token" would already be enough: we would then check for spikes of anomaly. (For clarity: this is not plain spellchecking, as we want to identify anomalies in context.)
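As a toy illustration of that "probability of each token" box (using single characters as the tokens and a smoothed bigram model rather than a transformer; the corpus and smoothing constant are made up, but the spike-detection idea is the same):

```python
import math
from collections import defaultdict

def train_bigram(corpus):
    """Character-bigram counts from a reference corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return counts

def surprisals(text, counts, alpha=1.0):
    """Per-character surprisal (-log2 P) under the bigram model with
    add-alpha smoothing; spikes mark statistically anomalous spots."""
    vocab = 128  # crude smoothing denominator over ASCII, for illustration
    out = []
    for a, b in zip(text, text[1:]):
        total = sum(counts[a].values())
        p = (counts[a][b] + alpha) / (total + alpha * vocab)
        out.append((b, -math.log2(p)))
    return out

corpus = "the cat sat on the mat and the dog sat on the log " * 50
model = train_bigram(corpus)
scores = surprisals("the cat szt on the mat", model)
glitch = max(scores, key=lambda t: t[1])  # highest-surprisal character: 'z'
```

A real tool would use a language model's token-level log-probabilities instead of character bigrams, but the interface is exactly the "input-output box" described: text in, a probability per token out, flag the spikes.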
--
Edit: a check I have just done with an engine I had not yet used for this purpose turns up a number of solutions... but none of them is a good, specific tool, I am afraid.