
Comment by cs702

6 years ago

...by the one and only Fabrice Bellard: "gpt2tc is a small program using the GPT-2 language model to complete and compress (English) texts. It has no external dependency, requires no GPU and is quite fast...The compression ratios are much higher than conventional compressors at the expense of speed and of a much larger decompressor. See the documentation to get results on text files from well known compression data sets."

A natural question I've pondered from time to time is whether Fabrice is really a time traveler from a more advanced civilization in the future, sent back in time to show us, mere mortals, what humankind will be capable of in the future.

If this sounds far-fetched, consider that he has created FFmpeg, QEMU, LibBF, SoftFP, BPG, TinyEMU, a software implementation of 4G/LTE, a PC emulator in JavaScript, the TCC compiler, TinyGL, LZEXE, and a tiny program for computing the largest known prime number.

And that's just a partial list of his successful projects, which now of course also include software for lossless compression with Transformer neural networks.

Any of these projects, on its own, would be considered a notable achievement for an ordinary human being.

Source: https://news.ycombinator.com/item?id=19591308 -- I never cease to be amazed by the guy.

This particular project is noteworthy mostly for its completeness and 'it just works' functionality. Dozens of researchers before him have used arithmetic coding on the outputs of various neural network models to losslessly compress text or images.
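The core idea is simple: an arithmetic coder narrows an interval of [0, 1) using whatever probabilities the model assigns to each next symbol, so a better predictor directly yields a shorter code. Here is a minimal, float-based sketch; `toy_model` is a hypothetical stand-in that returns a fixed distribution where a real compressor like gpt2tc would query GPT-2, and the float arithmetic only works for short inputs (real coders use fixed-point integer ranges with renormalization):

```python
def toy_model(context):
    # Hypothetical stand-in for a neural model: a real compressor would
    # condition on `context` and return the model's next-symbol probabilities.
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def encode(text):
    """Narrow [0, 1) once per symbol; return a number inside the final interval."""
    low, high = 0.0, 1.0
    for i, ch in enumerate(text):
        probs = toy_model(text[:i])       # model predicts the next symbol
        span = high - low
        cum = 0.0
        for sym in sorted(probs):         # fixed symbol order shared with decode
            if sym == ch:
                low, high = low + span * cum, low + span * (cum + probs[sym])
                break
            cum += probs[sym]
    return (low + high) / 2               # any number in the interval identifies the text

def decode(code, length):
    """Replay the same interval narrowing, picking the symbol whose slice holds `code`."""
    low, high = 0.0, 1.0
    out = []
    for _ in range(length):
        probs = toy_model("".join(out))   # same predictions as the encoder saw
        span = high - low
        cum = 0.0
        for sym in sorted(probs):
            sym_low = low + span * cum
            sym_high = low + span * (cum + probs[sym])
            if sym_low <= code < sym_high:
                out.append(sym)
                low, high = sym_low, sym_high
                break
            cum += probs[sym]
    return "".join(out)
```

Because the decoder runs the identical model on the identical context, both sides reconstruct the same intervals, which is why the (large) model itself effectively becomes part of the decompressor.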

Bellard's contributions are a packaged tool (as opposed to proof-of-concept code) and a demo webpage, plus the idea of emitting CJK characters rather than raw binary data (in today's world of JSON, binary data has fallen out of fashion).
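The CJK trick amounts to base-conversion: pack the compressed bitstream into fixed-width chunks and map each chunk to a code point in the large CJK Unified Ideographs block, so the output is copy-pastable text. A sketch of the idea (the 14-bit chunk width and base code point here are illustrative assumptions, not gpt2tc's actual encoding):

```python
CJK_BASE = 0x4E00   # start of the CJK Unified Ideographs block
BITS = 14           # assumption: 2**14 = 16384 code points fit within the block

def bytes_to_cjk(data):
    """Map compressed bytes to a string of CJK characters, BITS bits per character."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % BITS)     # zero-pad to a multiple of BITS
    return "".join(chr(CJK_BASE + int(bits[i:i + BITS], 2))
                   for i in range(0, len(bits), BITS))

def cjk_to_bytes(text, nbytes):
    """Invert the mapping; `nbytes` tells us where the padding starts."""
    bits = "".join(f"{ord(c) - CJK_BASE:0{BITS}b}" for c in text)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, nbytes * 8, 8))
```

At 14 bits per character this carries nearly two bytes of payload per visible glyph, which is why a strongly compressed text can look like a surprisingly short run of ideographs.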