Comment by krasi0
4 years ago
Great work! Is there a similar project for (local) text generation (NLP) on a CPU + lots of RAM? I mean something transformer-based and of similar quality to GPT-3 (i.e. better than GPT-2). I understand that each prompt would take almost forever to complete, but I'm still curious whether something like that exists.
Yes. Fabrice Bellard wrote a highly optimised library (libnc) [1] for training and inference of neural networks on the CPU (x86 with AVX2), and implemented GPT-2 inference (gpt2tc) with it [2]. Later he added a CUDA backend to libnc. You can try it out at his website TextSynth [3]; I see it now runs various newer GPT-based models too, but it seems he hasn't released the code for those. That doesn't surprise me, as he didn't release the code for libnc either, just the parts of gpt2tc excluding libnc (libnc is distributed as a free binary), so that someone could reimplement GPT-J and the other models themselves.
Incidentally, he's currently leading the Large Text Compression Benchmark with a neural-network-based compressor called nncp [4], which is based on this work. It learns its transformer-based model as it compresses, and the earlier versions didn't use a GPU.
[1] https://bellard.org/libnc/
[2] https://bellard.org/libnc/gpt2tc.html
[3] https://textsynth.com/
[4] http://www.mattmahoney.net/dc/text.html#1085
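The "learns the model as it goes" trick is what lets such a compressor ship no model at all: the encoder and decoder apply identical updates after every symbol, so they stay in sync. Here is a toy sketch of that idea with an adaptive order-0 count model standing in for nncp's transformer (this is purely illustrative, not Bellard's code; it only estimates the codelength an arithmetic coder would achieve):

```python
import math
from collections import defaultdict

def adaptive_codelength(data: bytes) -> float:
    """Estimate the compressed size (in bits) under an adaptive
    order-0 model: each byte costs -log2 p(byte), then the model's
    counts are updated. A decoder replaying the same updates stays
    in sync, so the model itself is never stored in the output."""
    counts = defaultdict(lambda: 1)  # Laplace smoothing: 1 pseudo-count each
    total = 256                      # sum of all pseudo-counts
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)
        counts[b] += 1               # learn as we go
        total += 1
    return bits

repetitive = b"ab" * 500
# Far below the 8 bits/byte of raw storage, despite sending no model:
print(adaptive_codelength(repetitive) / len(repetitive))
```

nncp replaces the count table with a transformer predicting the next-symbol distribution, which is why compression doubles as a language-modelling benchmark.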
Yet another gem by the genius Fabrice B!
I kind of understand why he would not release the source code. Perhaps he's finally decided to monetise some of his coding skills. Maybe in the future he'll start releasing some of those newer and bigger models to the public, given that other big corps like FB have already started doing so (GPT-NeoX and OPT, as mentioned in the sibling comment by infinityio).
Yes, TextSynth.com is a commercial service; see pricing [1]. If his code is faster than others' (I'd certainly believe it), then it's quite valuable, and he deserves to be able to monetise it. Edit: also, OpenAI is slashing the price of GPT-3 by 2-3x tomorrow because of "progress in making our models more efficient to run" [2].
Also, he was/is competing for the Hutter Prize with nncp, but he falls outside the prize's requirements: CPU time, RAM, and most especially the rule that submissions shouldn't require a modern CPU (with AVX2) or a GPU. Otherwise he could have won it. I suspect that's actually the biggest reason he initially implemented libnc without GPU support. He has asked for the rules to be changed to allow AVX2, and I believe they eventually will be. So he won't give away the source for nncp yet, but he will have to open-source it to receive the prize.
[1] https://textsynth.com/pricing.html
[2] https://help.openai.com/en/articles/6485334-openai-api-prici...
I've had success with GPT-J (6B) [0] and GPT-NeoX (20B) [1], but they probably aren't quite at the quality level you'll want.
On the other hand, Facebook has recently released the weights for a few sizes of their OPT model [2]. I haven't tried it, but it might be worth looking into, because they claim the model is comparable to Davinci.
Note that for CPU inference you'll need to avoid float16 datatypes; otherwise it may error out.
[0] https://huggingface.co/EleutherAI/gpt-j-6B [1] https://huggingface.co/EleutherAI/gpt-neox-20b [2] https://huggingface.co/facebook/opt-66b
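The float16 caveat comes down to the datatype itself: half precision tops out around 65504 (and many CPU kernels simply aren't implemented for it), so loading the weights as float32 is the safe default on CPU. A minimal numpy illustration of the overflow half of that story (a toy, not taken from any of the model repos above):

```python
import numpy as np

# float16's largest finite value is ~65504; anything beyond it
# overflows to inf, which is one way fp16 inference can blow up
# (the other being CPU kernels that lack half-precision support).
x = np.float16(60000.0)
with np.errstate(over="ignore"):         # silence the expected overflow warning
    print(x * np.float16(2.0))           # inf: overflowed half precision

# The standard fix for CPU inference: cast up to float32 first.
y = np.float32(x) * np.float32(2.0)
print(y)                                 # 120000.0: fine in single precision
```

With the Hugging Face models linked above, this typically means leaving `from_pretrained` at its default dtype (or passing `torch_dtype=torch.float32`) rather than requesting float16 weights when running on CPU.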